org-formation / org-formation-cli

Better than landingzones!
MIT License

OrgFormation deletes stack after update rollback failure #455

Closed rene84 closed 1 year ago

rene84 commented 1 year ago

Subject of the issue

If an update to a stack fails, CloudFormation automatically attempts a rollback, as configured in the failure options.

However, if the rollback fails, the stack ends up in the UPDATE_ROLLBACK_FAILED state. A stack in this state can no longer be updated, and any attempt to do so results in a 400 error:

"errorCode": "ValidationException",
"errorMessage": "Stack:arn:aws:cloudformation:***:***:stack/con-cloudwan/***-8bfc-11ed-bfee-0a8d0cd556ba is in UPDATE_ROLLBACK_FAILED state and can not be updated."
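
For illustration, a caller could recognize this specific failure before deciding how to react. The sketch below is not taken from org-formation. The `AwsLikeError` shape and the `isUpdateRollbackFailed` helper are hypothetical names, and accepting both `ValidationException` (as in the log above) and `ValidationError` is an assumption, since the reported code may differ depending on where the error is observed.

```typescript
// Hypothetical error shape: some SDK versions expose the error code as `code`,
// others as `name`. Both are accepted here as an assumption.
interface AwsLikeError {
  code?: string;
  name?: string;
  message?: string;
}

// Returns true when the error says the stack is stuck in UPDATE_ROLLBACK_FAILED.
function isUpdateRollbackFailed(err: AwsLikeError): boolean {
  const errorCode = err.code ?? err.name ?? "";
  // Assumption: either spelling of the validation error code may be reported.
  const isValidationError =
    errorCode === "ValidationException" || errorCode === "ValidationError";
  return isValidationError && (err.message ?? "").includes("UPDATE_ROLLBACK_FAILED");
}
```

Matching on the message text is brittle. Reading the stack status directly would likely be more robust where that is possible.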

To get the stack back into an updatable state, the rollback needs to be retried until it finishes, using continue-update-rollback. Rollback of specific resources can be skipped using resources-to-skip. OrgFormation is currently written to attempt a full recreation of the stack when it encounters a ValidationException with status UPDATE_ROLLBACK_FAILED (see code snippet here).

The expected behavior for org-formation would be to issue a continue-update-rollback command that skips the resources that failed to roll back, instead of deleting the stack. That command should in theory always succeed unless other resources fail as well. If that happens, the stack should be considered to be in a broken state that requires manual attention, rather than automatically attempting to recreate it.
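
A minimal sketch of that behavior, under stated assumptions, could look like the following. It is not org-formation's implementation. `RollbackClient` is a hypothetical, injected interface whose method and field names mirror the CloudFormation DescribeStackResources and ContinueUpdateRollback operations, and `continueRollbackSkippingFailed` is an illustrative name. Nested stacks and stacks with more resources than one DescribeStackResources response returns are not handled.

```typescript
// Hypothetical minimal client surface. It is an illustration, not the AWS SDK's own type.
interface StackResource {
  LogicalResourceId: string;
  ResourceStatus: string;
}

interface RollbackClient {
  describeStackResources(input: { StackName: string }): Promise<{ StackResources?: StackResource[] }>;
  continueUpdateRollback(input: { StackName: string; ResourcesToSkip?: string[] }): Promise<unknown>;
}

// Requests a continued rollback, skipping resources currently in UPDATE_FAILED.
// Returns the logical IDs that were skipped. It never deletes the stack.
async function continueRollbackSkippingFailed(
  client: RollbackClient,
  stackName: string,
): Promise<string[]> {
  const { StackResources = [] } = await client.describeStackResources({ StackName: stackName });

  // Only resources whose current status is UPDATE_FAILED are candidates for skipping.
  const toSkip = StackResources
    .filter((resource) => resource.ResourceStatus === "UPDATE_FAILED")
    .map((resource) => resource.LogicalResourceId);

  try {
    await client.continueUpdateRollback({
      StackName: stackName,
      // Omit ResourcesToSkip entirely when there is nothing to skip.
      ...(toSkip.length > 0 ? { ResourcesToSkip: toSkip } : {}),
    });
  } catch (err) {
    // Surface the failure for manual attention instead of recreating the stack.
    const reason = err instanceof Error ? err.message : String(err);
    throw new Error(
      `Rollback of ${stackName} could not be continued; manual attention required: ${reason}`,
    );
  }
  return toSkip;
}
```

The call only requests the rollback. A real caller would still have to wait for the stack to leave the rollback-in-progress state before retrying the update. Skipped resources may also no longer match the state recorded in the stack, so they would need to be reconciled afterwards.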