Closed christianherweg0807 closed 1 year ago
Hey @christianherweg0807 , thanks for raising this issue!
I agree, the current check we have in place doesn't look correct. I like your suggestion of checking running hours, but I'm not sure if the current aws_vpc_nat_gateway table has this information, as it's not returned by the NAT gateway APIs.
Do you know of any way, in the console or API, to check for a NAT gateway's running hours, or any other metric that can help indicate if it's in-use or not?
CloudWatch, the AWS monitoring service, can be used to monitor a NAT gateway using the information it collects from that gateway. This information is presented as readable metrics at 1-minute intervals, and the metrics are stored for 15 months. We could use the BytesOutToDestination metric to determine whether a NAT gateway is unused.
e.g. A NAT gateway is considered unused if the value of BytesOutToDestination is 0 for the last 7 days.
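To make the proposed rule concrete, here is a minimal sketch of the "zero bytes out over the last 7 days" check. The `(timestamp, bytes_out)` tuple shape and the function name are hypothetical stand-ins for whatever the CloudWatch data would look like once fetched; treating "no datapoints at all" as unused is also an assumption of this sketch.

```python
from datetime import datetime, timedelta, timezone


def is_nat_gateway_unused(datapoints, now, window=timedelta(days=7)):
    """Return True if no bytes left the gateway inside the window.

    `datapoints` is a list of (timestamp, bytes_out) tuples. This shape is
    hypothetical and only illustrates the rule discussed above.
    """
    cutoff = now - window
    # Keep only the datapoints that fall inside the look-back window.
    recent = [value for ts, value in datapoints if ts >= cutoff]
    # Assumption: an empty window sums to 0 and is therefore reported as unused.
    return sum(recent) == 0
```

A gateway with traffic only before the window would be reported as unused, so the window length directly controls how aggressive the check is.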
@cbruno10: Is this a suitable approach that we could implement in Steampipe?
I have a similar issue where the dashboard is reporting my NATG as being stopped, but it is active and working. In querying directly, I see that the "state" from aws_vpc_nat_gateway is "available", and the join on aws_ec2_instance for instance_state is "stopped". The issue is the NATG is not being used by an EC2 instance but is used by Lambda. This may or may not be the right NATG use case or strategy, but it shows a discrepancy in the query. Please advise if I should open a new issue or if this is OK to keep here.
edit: provide the link where AWS recommends the NATG for Lambda access to internet
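The discrepancy described above can be reproduced with a simplified approximation of a subnet-based check. This is not the control's actual SQL; the dictionaries are made-up sample rows, and the `nat_gateway_id` key and the `nat-0example` / `subnet-a` values are assumptions for illustration. The `subnet_id` and `instance_state` names follow the columns mentioned in this thread.

```python
def unused_by_ec2_join(nat_gateways, ec2_instances):
    """Flag a NAT gateway when no running EC2 instance shares its subnet_id.

    Simplified stand-in for a subnet-based heuristic, not the real query.
    """
    running_subnets = {
        inst["subnet_id"]
        for inst in ec2_instances
        if inst["instance_state"] == "running"
    }
    return [
        gw["nat_gateway_id"]
        for gw in nat_gateways
        if gw["subnet_id"] not in running_subnets
    ]


# Sample data: the gateway serves Lambda functions, which never appear
# in the EC2 instance list, and the only EC2 instance is stopped.
gateways = [{"nat_gateway_id": "nat-0example", "subnet_id": "subnet-a"}]
instances = [{"subnet_id": "subnet-a", "instance_state": "stopped"}]

flagged = unused_by_ec2_join(gateways, instances)
```

Because Lambda traffic is invisible to an EC2-only join, the gateway ends up in `flagged` even though it is in active use.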
'This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days.'
@christianherweg0807 @wedwardbeck, I apologize for the lack of communication regarding this matter.
I have investigated various service properties we get from API responses, such as NAT Gateway, Subnet, Route table, and EC2, to determine if they can provide the necessary information. However, I could not find a way to map the resources in the associated private subnet that are actively using the NAT gateway.
We can create a metric table for VPC NAT Gateway based on your suggestion. Reference: GitHub - turbot/steampipe-plugin-aws.
Q. The value of BytesOutToDestination refers to the amount of outbound traffic from the NAT gateway to the destination. Should we consider the gateway unused if there is no outbound traffic during a specific period, such as the last seven days?
@cbruno10 Could you please share your thoughts on the above?
@bigdatasourav For the BytesOutToDestination metric, would this metric return > 0 even if the NAT gateway isn't in use, i.e., is a NAT gateway used in the background somehow even if not in "active" use? If it doesn't reliably return 0 when the NAT gateway isn't in use, it could be difficult to use reliably.
I would assume in my case that when the lambda is not running, it would show no outbound traffic. Technically the NATG is not in use when the lambdas are not invoked, but the lambdas are assigned to the NATG and depend on it being available.
@cbruno10, the BytesOutToDestination metric returns 0 when the gateway is not in use. I tested the scenario with EC2: when I use the instances, the BytesOutToDestination metric is > 0; otherwise, it is 0.
Should we create a table for this metric?
@bigdatasourav We have a similar request for adding a new metric in https://github.com/turbot/steampipe-plugin-aws/issues/1829#issuecomment-1632598504, though I'm not sure it would work as there may be quals issues. If that is still a blocker for using the CloudWatch metric data point tables we have, I think yes, let's go ahead and create the separate table for now.
Hey @christianherweg0807 and @wedwardbeck, we have updated the query of the vpc_nat_gateway_unused control; the query now checks the BytesOutToDestination metric value at 5-minute intervals over the most recent five days. If the total value is 0, we can say that the NAT gateway is unused.
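For anyone who wants to cross-check the result outside Steampipe, here is a rough Python sketch of the same idea against the CloudWatch API. It is not the SQL used in the control. It assumes a boto3-style CloudWatch client is passed in; the helper names `total_bytes_out` and `is_unused` are hypothetical, and pagination and error handling are left out.

```python
from datetime import datetime, timedelta, timezone


def total_bytes_out(cloudwatch, nat_gateway_id, end_time, days=5, period_seconds=300):
    """Sum BytesOutToDestination over the look-back window.

    `cloudwatch` is assumed to behave like boto3.client("cloudwatch").
    """
    response = cloudwatch.get_metric_statistics(
        Namespace="AWS/NATGateway",
        MetricName="BytesOutToDestination",
        Dimensions=[{"Name": "NatGatewayId", "Value": nat_gateway_id}],
        StartTime=end_time - timedelta(days=days),
        EndTime=end_time,
        Period=period_seconds,  # 5-minute buckets
        Statistics=["Sum"],
    )
    return sum(point["Sum"] for point in response["Datapoints"])


def is_unused(cloudwatch, nat_gateway_id, end_time):
    """Report the gateway as unused when the window total is exactly 0."""
    return total_bytes_out(cloudwatch, nat_gateway_id, end_time) == 0
```

With real credentials this would be called as `is_unused(boto3.client("cloudwatch"), "<nat-gateway-id>", datetime.now(timezone.utc))`. Passing the client in keeps the logic easy to exercise with a stub.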
We will release the PR soon; in the meantime, please try the control and share your feedback. Here are the steps you need to follow -
This query tries to find unused nat gateways. In words:
Every NAT Gateway with a subnet_id that is not used by any EC2 instance is unused.
See: https://github.com/turbot/steampipe-mod-aws-thrifty/blob/81ae62502277011cdb2b1d7b587f1a89c97bbc9b/query/vpc/vpc_nat_gateway_unused.sql#L20 https://github.com/turbot/steampipe-mod-aws-thrifty/blob/81ae62502277011cdb2b1d7b587f1a89c97bbc9b/query/vpc/vpc_nat_gateway_unused.sql#L29
That's imo not correct for multiple reasons:
A solution could be to find NAT gateways that have running hours but no transfer costs?