litmuschaos / litmus-go

Apache License 2.0

Revert chaos when error during drain for node-drain experiments #668

Closed Calvinaud closed 11 months ago

Calvinaud commented 11 months ago

What this PR does / why we need it:

This PR adds a call that tries to revert the chaos in the node-drain experiment when draining the node fails. For example, this can happen when the node takes too long to drain and the timeout is reached; without an attempt to revert the chaos, the node stays cordoned.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): fixes #667

Special notes for your reviewer:

I didn't add a check that the node is actually cordoned before trying to uncordon it, since uncordoning a non-cordoned node does not produce an error when done with kubectl.
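The reasoning in this note relies on uncordon being idempotent. A toy model of that property, assuming a hypothetical `node` type and `uncordonNode` helper that are not part of litmus-go or the Kubernetes client, might look like this:

```go
package main

// node is a hypothetical stand-in for a Kubernetes node; only the
// scheduling flag is modelled here.
type node struct {
	name          string
	unschedulable bool
}

// uncordonNode marks the node as schedulable. It is idempotent: calling it
// on a node that is not cordoned leaves the node unchanged and returns no
// error, which is why a prior "is it cordoned?" check is not needed.
func uncordonNode(n *node) error {
	n.unschedulable = false
	return nil
}
```

With this behaviour, the revert path can call uncordon unconditionally, whether the drain failed before or after the node was cordoned.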

Checklist: