Just wanted to get your thoughts on dynamically protecting existing and new resources from the cleanup

atqhg23 commented 2 years ago

There are a large number of resources that we need to protect from the cleanup, and while the cleanup has the temp/permanent allowlist to protect resources, it can be cumbersome to add each resource manually through the website.

Here are some of the ideas we had to resolve this issue:

1. Leverage the existing allowlist process: set up a separate process that checks for resources with a specific tag, then formats that data into a CSV file that can be inserted as items into the allowlist DynamoDB table.
2. Determine a way to allow wildcards in the allowlist file. There are some issues with this approach, though. For resources like EC2 instances, the instance ID is unique per resource and is used as the resource ID in the allowlist. For resources where the name is used, like Lambda functions, not all of our resources that need to be protected follow the same naming convention. This leads me to think that tags may be the best thing to filter by. We could possibly add a tag value to the allowlist dataset that can be used to protect the resources, but I think this may also require wildcard support in the allowlist, so that the resource ID can be set to a wildcard and only the tag values are used to skip those resources.
3. Update each cleanup script to ignore resources with specific tags.
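As a rough illustration of the first idea, here is a minimal sketch of turning tagged resources into CSV rows. The tag key/value (`auto-cleanup` / `protect`), the `id` field, and the CSV column names are all hypothetical and not taken from the project; the real allowlist table schema would need to be checked before importing anything.

```python
import csv
import io


def has_tag(resource, key, value):
    """Return True if the resource carries the given tag key/value pair.

    Assumes tags are a list of {"Key": ..., "Value": ...} dicts, the shape
    boto3 returns for EC2 resources.
    """
    return any(
        tag.get("Key") == key and tag.get("Value") == value
        for tag in resource.get("Tags", [])
    )


def to_allowlist_csv(resources, key, value):
    """Build CSV text containing one row per resource that has the tag.

    The column names below are placeholders, not the real table schema.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["resource_id", "comment"])
    for resource in resources:
        if has_tag(resource, key, value):
            writer.writerow([resource["id"], f"protected by tag {key}={value}"])
    return buffer.getvalue()
```

A scheduled job could run something like this and load the result into the table, although the allowlist would then only be as fresh as the last run.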

One approach we’re currently using to handle this is SCPs and permission boundaries that prevent the cleanup role from deleting certain resources. There are a few issues with this approach, though:

Some resource types, such as Lambda and S3, do not support controlling access based on tags.
The cleanup dry run still shows the resources as being deleted.

What are your thoughts on this? I just wanted to bounce a few ideas off you to determine a good approach.

mlevit commented 2 years ago

That's an interesting problem.

1. You can easily add resources en masse by modifying the DynamoDB JSON file. There's no need to use any UIs, as you can grab the execution log and then reformat it into the JSON format required by DynamoDB. However, this isn't dynamic and requires constant updates.
2. Wildcards are something that can be added. This will require a change to every cleanup script where the comparison is made. It may have unintended consequences: if you have two resources, one named my-cluster and the second my-cluster-1, there is no way to target my-cluster and not my-cluster-1 without also specifying whether you're performing an exact compare or a wildcard compare (which just adds complexity).
3. Tags are a really interesting way to add this functionality. Say you allowlist a tag and a tag value. Then any resource with that combination automatically becomes allowlisted. I do like this as it keeps the allowlist clean; however, it requires significant rework to every cleanup script to include tag extraction and checking.

atqhg23 commented 2 years ago

Thanks for the response. One thing I’m thinking about that may be simpler is applying the tag filter directly in the cleanup scripts, so that they exclude resources that have the specified tag. This way the comparison / check won’t have to be done against the allowlist, since it’s “hard coded” in the cleanup scripts themselves, if I’m thinking about this correctly.

Would create some redundancy with the tag values needing to be specified in each cleanup script, but the rest of the cleanup functionality would remain the same.
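A minimal sketch of what such a hard-coded tag filter might look like, assuming a hypothetical `auto-cleanup=protect` tag. One wrinkle is that boto3 returns tags in different shapes depending on the service (a list of `Key`/`Value` dicts for EC2, a plain dict for Lambda), so the sketch normalises both before checking. The function names and the `Tags` field on each resource are illustrative, not part of the existing cleanup scripts.

```python
# Hypothetical tag that marks a resource as protected from cleanup.
PROTECTED_TAG_KEY = "auto-cleanup"
PROTECTED_TAG_VALUE = "protect"


def normalise_tags(tags):
    """Convert either tag shape into a plain {key: value} dict."""
    if not tags:
        return {}
    if isinstance(tags, dict):
        # Shape used by e.g. Lambda: {"key": "value", ...}
        return dict(tags)
    # Shape used by e.g. EC2: [{"Key": "key", "Value": "value"}, ...]
    return {tag["Key"]: tag["Value"] for tag in tags}


def is_protected(tags):
    """Return True if the protected tag is present with the expected value."""
    return normalise_tags(tags).get(PROTECTED_TAG_KEY) == PROTECTED_TAG_VALUE


def resources_to_delete(resources):
    """Keep only the resources that do not carry the protected tag."""
    return [r for r in resources if not is_protected(r.get("Tags"))]
```

Putting `normalise_tags` and `is_protected` in a shared helper module would limit the redundancy to a single import per cleanup script, though each script would still need its own call to fetch the tags.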

Will give this a go and post any updates here.

mlevit commented 2 years ago

@atqhg23 I was playing around with wildcard support this morning. I believe it's now working as intended. If you'd like to test it, you can grab it here: https://github.com/servian/aws-auto-cleanup/tree/wildcard-match

It's using Python's fnmatch function, which allows simple wildcarding per the table below:

Pattern	Meaning
*	matches everything
?	matches any single character
[seq]	matches any character in seq
[!seq]	matches any character not in seq
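To make the table concrete, here is a small sketch of how these patterns behave for the my-cluster / my-cluster-1 case raised earlier. It is not the code from the wildcard-match branch; `is_allowlisted` is a hypothetical helper, and it uses `fnmatchcase` (the case-sensitive variant in the same module) so the result does not depend on the operating system.

```python
from fnmatch import fnmatchcase


def is_allowlisted(resource_id, patterns):
    """Return True if the resource ID matches any allowlist pattern."""
    return any(fnmatchcase(resource_id, pattern) for pattern in patterns)


# A pattern with no wildcard characters behaves as an exact compare.
exact_only = is_allowlisted("my-cluster-1", ["my-cluster"])

# A trailing * protects both my-cluster and my-cluster-1.
star_match = is_allowlisted("my-cluster-1", ["my-cluster*"])

# ? matches exactly one character, so my-cluster-10 is not covered.
single_char = is_allowlisted("my-cluster-10", ["my-cluster-?"])

# [seq] and [!seq] restrict the character allowed at that position.
in_seq = is_allowlisted("my-cluster-2", ["my-cluster-[12]"])
not_in_seq = is_allowlisted("my-cluster-2", ["my-cluster-[!12]"])
```

Because a pattern without `*`, `?` or `[` only matches the identical string, existing exact entries in the allowlist should keep their meaning under this kind of matching.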

It would be great if you could test this out in your environment and let me know what you think.

atqhg23 commented 2 years ago

Thanks for adding this. I tested this successfully with some CloudWatch log groups.

mlevit / aws-auto-cleanup

Just wanted to get your thoughts on dynamically protecting existing and new resources from the cleanup #116