cyli opened this issue 9 years ago
Discussion from Slack:
We can't fix this by simply deleting the group regardless of whether convergence failed, because we still have to clean up whatever resources we can. For instance, if the CLB goes into an error state, we still need to delete the server. If we just delete the group when convergence fails, that server is left orphaned, with no convergence cycle left to delete it.
Right now, since draining is not enabled, we issue the CLB remove-node commands and the delete-server commands simultaneously. Once we enable draining, this may no longer be the case.
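To illustrate why deleting the group outright is not enough, here is a minimal Python sketch, with asyncio standing in for the real async framework. Every name in it (remove_node_from_clb, delete_server, StepResult, overall_status) is hypothetical. It assumes draining is disabled, so both steps are issued together: the CLB step fails, the server deletion still goes through, and the cycle as a whole is reported as FAILURE.

```python
import asyncio
from dataclasses import dataclass


class CLBError(Exception):
    """Hypothetical stand-in for an unrecoverable CLB error (e.g. CLB in ERROR state)."""


@dataclass
class StepResult:
    name: str
    ok: bool
    error: str = ""


async def remove_node_from_clb(server_id: str) -> None:
    # Hypothetical: simulate a CLB that is stuck in ERROR state.
    raise CLBError(f"CLB in ERROR state; cannot remove node for {server_id}")


async def delete_server(server_id: str) -> None:
    # Hypothetical: simulate a server deletion that succeeds.
    await asyncio.sleep(0)


async def run_steps(server_id: str) -> list:
    """Issue both steps at once (no draining) and keep going past failures."""
    names = ["remove_node", "delete_server"]
    outcomes = await asyncio.gather(
        remove_node_from_clb(server_id),
        delete_server(server_id),
        return_exceptions=True,  # a CLB failure must not cancel the server delete
    )
    results = []
    for name, outcome in zip(names, outcomes):
        failed = isinstance(outcome, Exception)
        results.append(StepResult(name, ok=not failed, error=str(outcome) if failed else ""))
    return results


def overall_status(results: list) -> str:
    """The cycle is a FAILURE if any step failed, even if others succeeded."""
    return "FAILURE" if any(not r.ok for r in results) else "SUCCESS"


if __name__ == "__main__":
    results = asyncio.run(run_steps("server-1"))
    for r in results:
        print(r)
    print(overall_status(results))
```

The point of the sketch is that "convergence failed" and "nothing was cleaned up" are different things: the server is gone even though the cycle reports FAILURE, which is why failure alone is not a safe signal for dropping the group.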
For example, say the user force-deletes a group, and convergence is working through cleaning up all of its servers.
If the CLB is in ERROR state, or some other unrecoverable error comes back from the CLB, convergence fails and the server can't be cleaned up.
The group is already marked DELETING, so it won't show up again in lists. Nothing can trigger convergence on it again.
What do we do in this case?
Also related: @manishtomar points out that this is one cause of NoSuchScalingGroup errors during convergence. The convergence cycle results in FAILURE, so the converger tries to write that state to the database. But the group is in DELETING state (modify state does not write to groups that are being deleted), so the write fails.
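A minimal Python sketch of that race, assuming an in-memory store. InMemoryGroupStore, GroupState, record_convergence_failure and this NoSuchScalingGroup class are hypothetical stand-ins, not the real code, and writing an "ERROR" status is an assumption about what "that state" looks like. The store's modify_state treats a DELETING group as if it did not exist, so recording the failed cycle raises NoSuchScalingGroup.

```python
from dataclasses import dataclass
from typing import Callable, Dict


class NoSuchScalingGroup(Exception):
    """Hypothetical stand-in for the error raised when a group cannot be found."""


@dataclass
class GroupState:
    group_id: str
    status: str  # e.g. "ACTIVE", "ERROR", "DELETING"


class InMemoryGroupStore:
    """Hypothetical store whose modify_state refuses to write to DELETING groups."""

    def __init__(self) -> None:
        self._groups: Dict[str, GroupState] = {}

    def add(self, state: GroupState) -> None:
        self._groups[state.group_id] = state

    def modify_state(
        self, group_id: str, modifier: Callable[[GroupState], GroupState]
    ) -> GroupState:
        state = self._groups.get(group_id)
        # Groups being deleted are treated as if they no longer exist.
        if state is None or state.status == "DELETING":
            raise NoSuchScalingGroup(group_id)
        new_state = modifier(state)
        self._groups[group_id] = new_state
        return new_state


def record_convergence_failure(store: InMemoryGroupStore, group_id: str) -> None:
    """After a FAILURE cycle, the converger tries to persist an error status (assumed ERROR)."""
    store.modify_state(group_id, lambda s: GroupState(s.group_id, "ERROR"))


if __name__ == "__main__":
    store = InMemoryGroupStore()
    store.add(GroupState("g1", "DELETING"))  # the user force-deleted this group
    try:
        record_convergence_failure(store, "g1")
    except NoSuchScalingGroup as e:
        print(f"NoSuchScalingGroup: {e}")
```

In this model the same guard that hides a DELETING group from normal updates is what turns a failed cleanup cycle into a NoSuchScalingGroup error, rather than into a recorded failure.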