What would you like to be added:
Provide a way to abandon a plan.
Using kubectl kudo...
By editing the instance or using the Kubernetes API.
Why is this needed:
If a plan ends up with a step in an ERROR state, it is difficult to resolve. For example:
deploy:
  lastUpdatedTimestamp: "2021-01-27T18:37:56Z"
  name: deploy
  phases:
  - name: deploy-init
    status: FATAL_ERROR
    steps:
    - message: 'A transient error when executing task deploy.deploy-base-servers.deploy-appserver.appserver.
        Will retry. failed to patch a apps/v1, Kind=StatefulSet governance-im-dev/sdv-appserver:
        failed to execute patch: StatefulSet.apps "sdv-appserver" is invalid:
        spec: Forbidden: updates to statefulset spec for fields other than ''replicas'',
        ''template'', and ''updateStrategy'' are forbidden'
      name: deploy-database-init
      status: ERROR
In the case shown, recovery required upgrading to a new version of the operator. However, the instance could not be upgraded to the new version because the deploy plan was still in progress.
Additional details:
To get out of this situation, we edited the instance and changed spec.planExecution.status to FATAL_ERROR. We were then able to upgrade and resolve the problem.
It is unclear whether this was a valid approach, and I cannot find any documentation that describes what to do in this situation.
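For reference, the manual edit described above can be sketched roughly as follows. This only illustrates what we did by hand; it is not a documented or recommended procedure. The instance name and namespace are hypothetical placeholders, and I am assuming the instance is addressable as instances.kudo.dev and accepts a JSON merge patch. The sketch only builds and prints the kubectl command, it does not run it.

```python
import json
import shlex


def plan_status_patch(status: str = "FATAL_ERROR") -> str:
    """Return a JSON merge patch that sets spec.planExecution.status."""
    return json.dumps({"spec": {"planExecution": {"status": status}}})


def kubectl_patch_command(instance: str, namespace: str, patch: str) -> str:
    """Return (but do not run) a kubectl command that would apply the patch.

    Assumption: the KUDO instance is reachable as instances.kudo.dev.
    """
    args = [
        "kubectl", "patch", "instances.kudo.dev", instance,
        "--namespace", namespace,
        "--type", "merge",
        "--patch", patch,
    ]
    # Quote each argument so the printed command is safe to paste into a shell.
    return " ".join(shlex.quote(arg) for arg in args)


if __name__ == "__main__":
    # "my-instance" and "my-namespace" are hypothetical placeholders.
    print(kubectl_patch_command("my-instance", "my-namespace", plan_status_patch()))
```

A dedicated command or documented API for abandoning a plan would make this kind of hand-built patch unnecessary.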
What would happen with no intervention?
I am also unsure what would have happened to the failing step with no intervention.
Would it eventually have timed out, with the plan status becoming FATAL_ERROR?
If so, after what period of time?
Is that configurable?
UPDATES
Although setting spec.planExecution.status to FATAL_ERROR allows us to upgrade, the status change is not reflected in the plan itself.
Setting the status of the failing step to FATAL_ERROR by editing the instance has no effect; the change is not applied.