turbot / steampipe-mod-aws-thrifty

Are you a Thrifty AWS dev? This mod checks your AWS accounts for unused and under-utilized resources using Powerpipe and Steampipe.
https://hub.powerpipe.io/mods/turbot/aws_thrifty
Apache License 2.0

vpc_nat_gateway_unused matches every NAT gateway #101

Closed: christianherweg0807 closed this issue 1 year ago

christianherweg0807 commented 2 years ago

This query tries to find unused NAT gateways. In words:

IMO, that's not correct, for two reasons:

  1. The subnet_id attached to a NAT gateway is always the public subnet attached exclusively to the gateway (isn't it?). This means the query will never find a row where nat.subnet_id = i.subnet_id.
  2. EC2 is not the only service that uses VPCs and NAT gateways. What about Lambda, etc.?

A solution could be to find NAT gateways that accrue running hours but no data transfer costs?

cbruno10 commented 2 years ago

Hey @christianherweg0807, thanks for raising this issue!

I agree, the current check we have in place doesn't look correct. I like your suggestion of checking running hours, but I'm not sure if the current aws_vpc_nat_gateway table has this information, as it's not returned by the NAT gateway APIs.

Do you know of any way, in the console or API, to check for a NAT gateway's running hours, or any other metric that can help indicate if it's in-use or not?

christianherweg0807 commented 2 years ago

CloudWatch, the AWS monitoring service, can be used to monitor a NAT gateway through the information it collects from that gateway. This information is processed into readable metrics at 1-minute intervals, and the metrics are stored for 15 months. We could use the BytesOutToDestination metric to determine whether a NAT gateway is unused.

For example, a NAT gateway is considered unused if the value of BytesOutToDestination is 0 for the last 7 days.
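The proposed rule can be sketched as a small check over metric datapoints. This is a hypothetical illustration, not Steampipe code: it assumes datapoints shaped like CloudWatch GetMetricStatistics results (dicts with "Timestamp" and "Sum"), and the function name `is_nat_gateway_unused` is made up for this example. It also assumes that an empty result (no datapoints at all) should count as zero traffic, which is a judgment call.

```python
from datetime import datetime, timedelta, timezone


def is_nat_gateway_unused(datapoints, now, lookback_days=7):
    """Hypothetical check: True if no bytes were sent in the lookback window.

    Assumes each datapoint is a dict with a timezone-aware "Timestamp" and a
    "Sum" of BytesOutToDestination, as in CloudWatch GetMetricStatistics output.
    """
    window_start = now - timedelta(days=lookback_days)
    # Only count traffic that falls inside the lookback window.
    total = sum(
        dp["Sum"]
        for dp in datapoints
        if window_start <= dp["Timestamp"] <= now
    )
    # Assumption: no datapoints at all is treated the same as zero traffic.
    return total == 0


now = datetime(2023, 7, 20, tzinfo=timezone.utc)
idle = [{"Timestamp": now - timedelta(days=d), "Sum": 0.0} for d in range(7)]
busy = idle + [{"Timestamp": now - timedelta(hours=3), "Sum": 1024.0}]
print(is_nat_gateway_unused(idle, now))  # True
print(is_nat_gateway_unused(busy, now))  # False
```

Traffic older than the window (for example, 10 days ago) is ignored, so a gateway that was busy last month but idle this week would still be flagged as unused.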

christianherweg0807 commented 2 years ago

@cbruno10: Is this a proper approach that we could implement in Steampipe?

wedwardbeck commented 1 year ago

I have a similar issue: the dashboard reports my NATG as stopped, but it is active and working. When querying directly, I see that the "state" from aws_vpc_nat_gateway is "available", and the join on aws_ec2_instance returns an instance_state of "stopped". The issue is that the NATG is used not by an EC2 instance but by Lambda. This may or may not be the right NATG use case or strategy, but it shows a discrepancy in the query. Please advise whether I should open a new issue or if it's OK to keep this here.

Edit: provided the link where AWS recommends a NATG for Lambda access to the internet

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.

bigdatasourav commented 1 year ago

@christianherweg0807 @wedwardbeck, I apologize for the lack of communication regarding this matter.

I have investigated the properties returned by various service APIs, such as NAT Gateway, Subnet, Route Table, and EC2, to determine whether they provide the necessary information. However, I could not find a way to map a NAT gateway to the resources in the associated private subnets that are actively using it.

Based on your suggestion, we can create a metric table for VPC NAT gateways. Reference: turbot/steampipe-plugin-aws on GitHub.

Question: BytesOutToDestination measures the outbound traffic from the NAT gateway to the destination. Should we consider a NAT gateway unused if there is no outbound traffic during a specific period, such as the last seven days?

@cbruno10 Could you please share your thoughts on the above?

cbruno10 commented 1 year ago

@bigdatasourav Would the BytesOutToDestination metric return > 0 even if the NAT gateway isn't in use, i.e., is a NAT gateway used in the background somehow even when it isn't in "active" use? If the metric doesn't reliably return 0 when the NAT gateway is not in use, it could be difficult to rely on.

wedwardbeck commented 1 year ago

In my case, I would assume that when the Lambda functions are not running, the metric would show no outbound traffic. Technically, the NATG is not in use when the Lambdas are not invoked, but the Lambdas are assigned to the NATG and depend on it being available.

bigdatasourav commented 1 year ago

@cbruno10, the BytesOutToDestination metric returns 0 when the NAT gateway is not in use. I tested the scenario with EC2:

When I use the instances, BytesOutToDestination is greater than 0; otherwise, it is 0.

Should we create a table for this metric?

cbruno10 commented 1 year ago

@bigdatasourav We have a similar request for adding a new metric in https://github.com/turbot/steampipe-plugin-aws/issues/1829#issuecomment-1632598504, though I'm not sure it would work, as there may be quals issues. If that is still a blocker for using our existing CloudWatch metric data point tables, then yes, let's go ahead and create the separate table for now.

bigdatasourav commented 1 year ago

https://github.com/turbot/steampipe-plugin-aws/pull/1842

bigdatasourav commented 1 year ago

Hey @christianherweg0807 and @wedwardbeck, we have updated the query of the vpc_nat_gateway_unused control. It now checks the BytesOutToDestination metric in 5-minute intervals over the most recent five days; if the total is 0, the NAT gateway is considered unused.
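For anyone who wants to cross-check the control outside Steampipe, the same window can be requested from CloudWatch directly. This is a minimal sketch, not the control's actual query (which is SQL against a Steampipe table): it builds keyword arguments for boto3's `get_metric_statistics`, using the `AWS/NATGateway` namespace and the `NatGatewayId` dimension. The helper names and the gateway ID `nat-0123456789abcdef0` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

PERIOD_SECONDS = 5 * 60        # 5-minute datapoints
LOOKBACK = timedelta(days=5)   # most recent five days


def build_nat_metric_request(nat_gateway_id, end):
    """Hypothetical helper: kwargs for CloudWatch GetMetricStatistics."""
    return {
        "Namespace": "AWS/NATGateway",
        "MetricName": "BytesOutToDestination",
        "Dimensions": [{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        "StartTime": end - LOOKBACK,
        "EndTime": end,
        "Period": PERIOD_SECONDS,
        "Statistics": ["Sum"],
    }


def expected_periods(request):
    """Number of 5-minute buckets covered by the request window."""
    span = request["EndTime"] - request["StartTime"]
    return int(span.total_seconds()) // request["Period"]


def total_bytes(datapoints):
    """Sum the per-period "Sum" values; 0 means no outbound traffic."""
    return sum(dp["Sum"] for dp in datapoints)


end = datetime(2023, 7, 20, tzinfo=timezone.utc)
req = build_nat_metric_request("nat-0123456789abcdef0", end)
print(expected_periods(req))  # 1440

# With boto3 installed and AWS credentials configured, one might then run:
#   import boto3
#   resp = boto3.client("cloudwatch").get_metric_statistics(**req)
#   unused = total_bytes(resp["Datapoints"]) == 0
```

Five days at a 5-minute period is 1,440 buckets, so the window fits in a single GetMetricStatistics call, which returns at most 1,440 datapoints per request.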

We will release the PR soon; in the meantime, please try the control and share your feedback. Here are the steps to follow: