aws-nuke stuck waiting for CloudFormation stabilization because of an AWS bug

rebuy-de / aws-nuke

Nuke a whole AWS account and delete all its resources.

https://github.com/ekristen/aws-nuke

MIT License


aws-nuke stuck waiting for CloudFormation stabilization because of an AWS bug #476

Closed spellr closed 1 month ago

spellr commented 4 years ago

A new behavior introduced in v2.14.0 for cleaning up CloudFormation stacks makes aws-nuke wait for each stack to stabilize before continuing.

Because of an AWS bug, CloudFormation stack deletion can take a very long time to stabilize (see https://forum.serverless.com/t/very-long-delay-when-doing-sls-remove-of-lambda-in-a-vpc/2535 and https://github.com/serverless/serverless/issues/5008). As a result, aws-nuke gets stuck waiting for the stack to stabilize, which can take 30 minutes or more.

In v2.13.0, the behavior was to fail to delete the CloudFormation stack. Given the AWS bug, I feel that's a better outcome than having the program get stuck.

svenwltr commented 4 years ago

I guess this was introduced in #424. @tylersouthwick, this is intended behavior, right? I think both waiting and not waiting are valid behaviors, so we might want to add a feature flag for this.
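A feature flag could select between the two behaviors. The sketch below assumes a hypothetical `Options.WaitForStackDeletion` field; it is not an existing aws-nuke option, and `deleteFn`/`waitFn` are hypothetical stand-ins for the real `DeleteStack` call and a stabilization waiter.

```go
package main

import "fmt"

// Options holds a hypothetical feature flag; aws-nuke does not necessarily expose this name.
type Options struct {
	WaitForStackDeletion bool
}

// removeStack issues the delete and, only if the flag is set, blocks until waitFn returns.
func removeStack(opts Options, deleteFn func() error, waitFn func() error) error {
	if err := deleteFn(); err != nil {
		return err
	}
	if !opts.WaitForStackDeletion {
		// Non-blocking: return now and let a later iteration re-check the stack.
		return nil
	}
	// Blocking (v2.14.0 style): do not return until the stack has stabilized.
	return waitFn()
}

func main() {
	waited := false
	del := func() error { return nil }
	wait := func() error { waited = true; return nil }

	_ = removeStack(Options{WaitForStackDeletion: false}, del, wait)
	fmt.Println("waited without flag:", waited)

	_ = removeStack(Options{WaitForStackDeletion: true}, del, wait)
	fmt.Println("waited with flag:", waited)
}
```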

tylersouthwick commented 4 years ago

This bug has been fixed by AWS: each Lambda version now consumes only one ENI (regardless of how many invocations are made), and it is allocated when the function is created, not when it is used.

I added this since, from our perspective, we would rather aws-nuke wait until the resource is properly cleaned up before moving on.

This sounds like a good use case for a feature flag.

spellr commented 4 years ago

Normally aws-nuke doesn't wait for a resource to be cleaned up; it puts the resource in the "waiting" state instead.

tylersouthwick commented 4 years ago

Maybe that's the real issue to be discussed here: what is the intended experience? Most resources are deleted immediately; some are not. Are there other resources that go through an asynchronous delete process, like CloudFormation stacks? Maybe a general-purpose feature flag would be useful and could apply to all of these resource types.

We are using it as part of an automated workflow to clean sandbox accounts and want to make sure everything is cleaned when the process terminates.

spellr commented 4 years ago

A lot of resources are deleted asynchronously. They go into the "waiting" state, and aws-nuke checks on them in the following iterations until they're deleted.

tylersouthwick commented 4 years ago

Currently, the README says that aws-nuke is intended to retry deleting all resources until either all specified ones are deleted or only resources with errors are left.

I'm not sure if a resource being in a "deleting" state is really an error.

That's where the idea of a feature flag to enable or disable this behavior comes from.

spellr commented 4 years ago

When a resource is in the deleting state, aws-nuke continues to delete other resources and returns to the deleting resource in the next iteration.

jack1902 commented 4 years ago

Does anyone have a basic template for running this in Lambda to clean up a sandbox account?

I'm planning to use it for that exact use case, which I feel is a common one.

Are you running it from an account outside the sandbox by assuming a role?

tylersouthwick commented 4 years ago

@jack1902 We're running it as ECS Fargate tasks and have a role framework set up. We use aws-nuke as a library inside an application we have been building.

svenwltr commented 4 years ago

Sorry for the late response.

The basic idea is for aws-nuke to trigger the deletion asynchronously and keep the resource in the waiting state until it is actually gone.

But I understand that it is generally not desirable to wait a very long time until all resources are gone, especially with services like CloudFront, where deletion can take a really long time to finish.

spellr commented 4 years ago

So if I understand correctly, we want the CloudFormation stack to be deleted asynchronously while leaving it in the "Waiting" state, and check on it later. This was the old behavior (pre v2.14.0). Let's make the new CloudFormation code behave the same.
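For CloudFormation specifically, the non-blocking approach amounts to calling delete once and then translating the stack status into a state on each later iteration. A minimal sketch, assuming the state names from this discussion: `stackState` is hypothetical, while `DELETE_IN_PROGRESS`, `DELETE_COMPLETE`, and `DELETE_FAILED` are real CloudFormation stack statuses. The `found` flag models `DescribeStacks` reporting that the stack no longer exists.

```go
package main

import "fmt"

// stackState maps a CloudFormation stack status to the state an aws-nuke-style
// loop would report on its next iteration. The returned state names are illustrative.
func stackState(status string, found bool) string {
	if !found {
		// The stack no longer exists.
		return "finished"
	}
	switch status {
	case "DELETE_COMPLETE":
		return "finished"
	case "DELETE_IN_PROGRESS":
		// Deletion is still running: report waiting instead of blocking.
		return "waiting"
	case "DELETE_FAILED":
		// CloudFormation gave up; retry on a later iteration or report an error.
		return "failed"
	default:
		// Any other status means no deletion is in progress yet.
		return "new"
	}
}

func main() {
	for _, s := range []string{"CREATE_COMPLETE", "DELETE_IN_PROGRESS", "DELETE_FAILED", "DELETE_COMPLETE"} {
		fmt.Println(s, "->", stackState(s, true))
	}
	fmt.Println("(not found) ->", stackState("", false))
}
```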

tylersouthwick commented 4 years ago

Maybe this could be controlled via a feature flag. The way we use it, we expect every resource either to be deleted by the end of execution or to be reported as unable to be deleted.

ekristen commented 1 month ago

Closing due to age. Please test v3 at https://github.com/ekristen/aws-nuke, and if this is still an issue, open an issue there. Thank you.

CloudFormation stack handling has been massively reworked and now includes identifying and deleting orphans.

Please see the copy of the notice from the README about the deprecation of this project. Sven was kind enough to grant me access to help triage and close issues and pull requests that have already been addressed in the actively maintained fork. Additional information is available in the welcome issue.

CAUTION: This repository for aws-nuke is no longer being actively maintained. We recommend users to switch to the actively maintained fork of this project at ekristen/aws-nuke. We appreciate all the support and contributions we've received throughout the life of this project. We believe that the fork will continue to provide the functionality and support that you have come to expect from aws-nuke. Please note that this deprecation means we will not be addressing issues, accepting pull requests, or making future releases from this repository. Thank you for your understanding and support.