mlevit / aws-auto-cleanup

Programmatically delete AWS resources based on an allowlist and time to live (TTL) settings
MIT License

Security group attached to load balancer shows delete action in execution log when it is in use #90

Closed atqhg23 closed 2 years ago

atqhg23 commented 2 years ago

Describe the bug: The execution log shows that a security group will be deleted even though it is attached to a load balancer.

To Reproduce: Steps to reproduce the behavior:

  1. Create a new security group and attach it to a load balancer only
  2. Run the cleanup in dry_run mode
  3. Check the execution log entry for the security group
  4. The action will say delete when it should skip it because it is in use

Expected behavior: The action for the security group should say SKIP - IN USE, since the security group is in use by a load balancer.

atqhg23 commented 2 years ago

I see this is not a bug and is done intentionally. I will close this and submit a feature request.

mlevit commented 2 years ago

@atqhg23 the major problem with security group cleanup is they can be attached to multiple services such as EC2, RDS, Load Balancer and more. Right now I'm only checking for their attachment to EC2 instances.

I wonder if there's a way of listing all resources currently using the security group.

atqhg23 commented 2 years ago

I'm about to test this, but I'm assuming this would also be the case with elastic IPs (addresses) as well since they can be used by ECS?

mlevit commented 2 years ago

So I found a way. describe_network_interfaces, filtered by security group, will return every network interface that uses the security group, and therefore the resources that depend on it.

I can modify the code to check this before proceeding with the delete.
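
A minimal sketch of that check, assuming a boto3 EC2 client (for example `boto3.client("ec2")`) is passed in. The function name `security_group_in_use` is hypothetical and not taken from the project; `group-id` is the filter that `describe_network_interfaces` documents for matching a security group.

```python
def security_group_in_use(ec2_client, group_id):
    """Return True if any network interface references the security group.

    `ec2_client` is assumed to be a boto3 EC2 client. The helper name is
    hypothetical; it is not part of aws-auto-cleanup.
    """
    kwargs = {"Filters": [{"Name": "group-id", "Values": [group_id]}]}
    while True:
        response = ec2_client.describe_network_interfaces(**kwargs)
        # Any matching interface (EC2, load balancer, RDS, ...) means "in use".
        if response.get("NetworkInterfaces"):
            return True
        # Follow pagination until there are no more pages.
        token = response.get("NextToken")
        if not token:
            return False
        kwargs["NextToken"] = token
```

Because load balancers, RDS instances and EC2 instances all attach security groups through network interfaces, one call covers them without querying each service separately.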

atqhg23 commented 2 years ago

Ah, looks like this is it. Network interfaces can be used to check for security group use: https://aws.amazon.com/premiumsupport/knowledge-center/ec2-find-security-group-resources/

atqhg23 commented 2 years ago

One thing to point out, I think it's still detecting that they are in use, or it's not able to delete security groups in use.

I have a security group attached to a load balancer right now, and the exec log said it would delete it in dry-run mode. When I turned off dry-run mode and ran the cleanup, the action shows "skip - in use".

mlevit commented 2 years ago

Correct. Dry run mode has no way of knowing whether the resource would actually be deleted; the logged action is only an assumption. When the script attempts the delete, it'll catch a DependencyViolation exception and record the resource as SKIP - IN USE.
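
A rough sketch of that behaviour, not the project's actual code. The function name and the `"DELETE"` action string are assumptions; `SKIP - IN USE` and `DependencyViolation` come from this thread. With boto3 the error would normally be caught as `botocore.exceptions.ClientError`; the sketch inspects the error's `response` attribute instead so that it has no imports.

```python
def try_delete_security_group(ec2_client, group_id, dry_run=True):
    """Return the action to log for a security group (hypothetical helper)."""
    if dry_run:
        # Nothing is sent to AWS, so the outcome is only assumed.
        return "DELETE"
    try:
        ec2_client.delete_security_group(GroupId=group_id)
    except Exception as error:
        # boto3 raises ClientError, which carries the AWS error code
        # under response["Error"]["Code"].
        code = getattr(error, "response", {}).get("Error", {}).get("Code")
        if code == "DependencyViolation":
            return "SKIP - IN USE"
        raise  # anything else is a genuine failure
    return "DELETE"
```

This is why the dry run and the real run can disagree: only the real delete call gives AWS a chance to reject the request.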

However, I think it would be better to add this new logic using describe_network_interfaces.

atqhg23 commented 2 years ago

Ah, OK, I see. One other thing I just noticed is that when I add a security group to the whitelist, the action appears blank instead of saying "SKIP - WHITELIST".

mlevit commented 2 years ago

Can you raise that as a separate issue? Should be a quick fix.

atqhg23 commented 2 years ago

Yeah, will do.

atqhg23 commented 2 years ago

Question for you. I was testing edge cases like adding a volume attached to an EC2 instance to the whitelist, but keeping the instance off the whitelist to see if the volume would be destroyed when the instance was destroyed.

The execution log shows the volume was skipped because it was part of the whitelist, but it was actually deleted. I'm assuming this happened because the volume had the delete on termination setting on, so this would always happen if that was left on.

Would that be the case?

mlevit commented 2 years ago

Based on AWS's documentation:

"By default, Amazon EC2 deletes all EBS volumes that were attached when the instance launched. Volumes attached after instance launch continue running."

Can you confirm if yours was launched along with the instance?

atqhg23 commented 2 years ago

Yes, the volumes were launched with the instance and the DeleteOnTermination flag was set to true.

So a volume that was launched with an instance (with DeleteOnTermination set to true) will still be deleted when a non-whitelisted instance is terminated, regardless of whether the volume itself is in the whitelist.

This is kinda complex. I guess it’s a bug, but it’s working as intended, it’s just not displaying the right action.

There would need to be a check to see if the volume has the DeleteOnTermination flag set to true AND if the instance that it is attached to is being terminated to get the right action (I may be overthinking this).
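
The check described above could look roughly like this. It is a sketch only: the function name is hypothetical, and `volume` is assumed to be one entry from the `Volumes` list returned by `describe_volumes`, whose `Attachments` entries carry `InstanceId` and `DeleteOnTermination`.

```python
def deleted_with_instance(volume, terminating_instance_ids):
    """Return True if terminating one of the given instances also deletes the volume.

    `volume` is assumed to be a dict shaped like a describe_volumes entry.
    The helper name is hypothetical.
    """
    return any(
        attachment.get("DeleteOnTermination") is True
        and attachment.get("InstanceId") in terminating_instance_ids
        for attachment in volume.get("Attachments", [])
    )
```

A cleanup run could call this before logging SKIP - WHITELIST for a volume, so the log reflects that the volume will disappear with its instance.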

Alternatively, as part of the instance cleanup, it could modify the DeleteOnTermination flag to false for all volumes attached to the instance so that the cleanup of instances and volumes is separate, but not sure if this is a good idea.

mlevit commented 2 years ago

This is not something that is in my control. Terminating the EC2 instance will automatically delete any volumes launched alongside it that have DeleteOnTermination set. The app itself evaluates each AWS resource independently, which is why your volume was not removed by the app (because of the whitelist) but did get removed when the EC2 instance was terminated.

atqhg23 commented 2 years ago

I looked into this and I think ModifyInstanceAttribute can be used to control this.

The BlockDeviceMapping attribute can modify the DeleteOnTermination setting for the volumes.

BlockDeviceMapping.N: "Modifies the DeleteOnTermination attribute for volumes that are currently attached. The volume must be owned by the caller. If no value is specified for DeleteOnTermination, the default is true and the volume is deleted when the instance is terminated."
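
As an illustration only, here is how that could be done with a boto3 EC2 client before an instance is terminated. The function name and the idea of passing in the whitelisted volume IDs are assumptions, not part of the project; in boto3 the parameter is spelled `BlockDeviceMappings`.

```python
def keep_volumes_on_termination(ec2_client, instance_id, volume_ids):
    """Set DeleteOnTermination to False for the given attached volumes.

    `ec2_client` is assumed to be a boto3 EC2 client and `volume_ids` a set
    of volume IDs to preserve (e.g. whitelisted ones). Hypothetical helper.
    Returns the device names that were changed.
    """
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    mappings = []
    for reservation in response["Reservations"]:
        for instance in reservation["Instances"]:
            for mapping in instance.get("BlockDeviceMappings", []):
                ebs = mapping.get("Ebs", {})
                if ebs.get("VolumeId") in volume_ids and ebs.get("DeleteOnTermination"):
                    mappings.append({
                        "DeviceName": mapping["DeviceName"],
                        "Ebs": {"DeleteOnTermination": False},
                    })
    if mappings:
        # One call updates every selected device on the instance.
        ec2_client.modify_instance_attribute(
            InstanceId=instance_id, BlockDeviceMappings=mappings
        )
    return [m["DeviceName"] for m in mappings]
```

Limiting the change to whitelisted volumes keeps the default behaviour for everything else, so non-whitelisted volumes are still cleaned up along with their instance.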

Although, yeah you're right, the app is technically working as it should.