pulumi / pulumi-kubernetes-operator

A Kubernetes Operator that automates the deployment of Pulumi Stacks
Apache License 2.0

Stack will not be deleted when prerequisite stack is missing #751

Open beffe123 opened 1 week ago

beffe123 commented 1 week ago

What happened?

When you have two stacks where one (stack A) is a prerequisite of the other (stack B), stack B can't be deleted after stack A has been deleted. In other words, deletion does not respect the prerequisite order, and the operator does not tolerate a missing prerequisite.

Example

  1. create stack A (kubectl apply -f stack-a.yaml)
  2. create stack B with stack A as a prerequisite (kubectl apply -f stack-b.yaml)
  3. delete stack A (kubectl delete stack stack-a)
  4. delete stack B (kubectl delete stack stack-b)
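The issue does not include the contents of stack-a.yaml and stack-b.yaml. The following is a minimal sketch of what such manifests might look like, assuming the Git-based Stack source. The stack names, repository URL, and branch are placeholders, and credentials and other settings a real deployment needs are omitted. destroyOnFinalize is set to true because, as noted later in the thread, the dependent only blocks when it is enabled.

```yaml
# stack-a.yaml (hypothetical contents; not taken from the report)
apiVersion: pulumi.com/v1
kind: Stack
metadata:
  name: stack-a
spec:
  stack: my-org/project-a/dev                    # placeholder stack name
  projectRepo: https://github.com/example/infra  # placeholder repository
  branch: main                                   # placeholder branch
  destroyOnFinalize: true   # run `pulumi destroy` when the Stack object is deleted
---
# stack-b.yaml (hypothetical contents; not taken from the report)
apiVersion: pulumi.com/v1
kind: Stack
metadata:
  name: stack-b
spec:
  stack: my-org/project-b/dev
  projectRepo: https://github.com/example/infra
  branch: main
  destroyOnFinalize: true
  prerequisites:
    - name: stack-a         # stack-b waits for this Stack object
```

With manifests like these, step 3 removes stack-a, and the finalizer of stack-b then reports the PrerequisiteNotSatisfied condition shown below.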

Output from 'kubectl describe stack stack-b':

...
Status:
  Conditions:
    Last Transition Time:  2024-11-18T16:11:42Z
    Message:               reconciliation is in progress
    Reason:                NotReadyInProgress
    Status:                False
    Type:                  Ready
    Last Transition Time:  2024-11-18T16:11:42Z
    Message:               unable to fetch prerequisite "stack-a": Stack.pulumi.com "stack-a" not found
    Reason:                PrerequisiteNotSatisfied
    Status:                True
    Type:                  Reconciling
...

Output of pulumi about

n/a

Additional context

Pulumi Operator versions tested:

This is especially a problem when using GitOps tools like Flux and ArgoCD.

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

EronWright commented 6 days ago

Thanks for the report, your findings ring true. You're responsible for the lifecycle of the Stack objects, and nothing prevents you from deleting a prerequisite stack.

Would a possible solution be to use Argo CD sync waves? I believe that, by using two distinct waves, the ordering will be preserved during creation and deletion. Could you give that a try? For example:

apiVersion: pulumi.com/v1
kind: Stack
metadata:
  name: stack-a
  annotations:
    argocd.argoproj.io/sync-wave: "1" 
---
apiVersion: pulumi.com/v1
kind: Stack
metadata:
  name: stack-b
  annotations:
    argocd.argoproj.io/sync-wave: "2" 
spec:
  prerequisites:
    - name: stack-a

beffe123 commented 5 days ago

Hi Eron,

Thanks for your reply. I guess your Argo CD suggestion might work, but it would make things complicated. We actually use Flux. There we could solve it with dependencies, but that is even more complicated and an ugly workaround; handling multiple dependencies that way would make the configuration very confusing.
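For illustration, a Flux-based workaround would typically mean giving each Stack its own Flux Kustomization and expressing the ordering with spec.dependsOn. This is a sketch under assumptions: the directory layout (./stacks/stack-a, ./stacks/stack-b) and the GitRepository name infra are hypothetical. It shows apply ordering only; the thread does not establish how Flux orders deletion.

```yaml
# Hypothetical layout: one directory and one Flux Kustomization per Stack.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: stack-a
  namespace: flux-system
spec:
  interval: 10m
  path: ./stacks/stack-a      # hypothetical path containing stack-a.yaml
  prune: true
  sourceRef:
    kind: GitRepository
    name: infra               # hypothetical GitRepository source
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: stack-b
  namespace: flux-system
spec:
  interval: 10m
  path: ./stacks/stack-b      # hypothetical path containing stack-b.yaml
  prune: true
  sourceRef:
    kind: GitRepository
    name: infra
  dependsOn:
    - name: stack-a           # Flux reconciles stack-b only after stack-a is Ready
```

Every additional prerequisite adds another Kustomization and another dependsOn entry, duplicating ordering that the Stack prerequisites already describe. This is the complexity the comment objects to.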

As a user, I would expect that when I configure dependencies, the operator also respects their order when deleting objects. Currently, the Pulumi Operator respects the order only during creation. Why should I rely on another tool for deletion? In that case, I don't need prerequisites in the Pulumi Operator at all.

The question is why the Pulumi Operator checks for the existence of prerequisite stacks at all when destroying a stack, and aborts the process when prerequisites are missing. From my point of view, this is not necessary.

For now, we have removed the prerequisites completely. Dependent stacks are now destroyed even when the stack they depend on is gone, and on creation they are retried until that stack is up.

EronWright commented 5 hours ago

@beffe123 makes a good point:

The question is, why the Pulumi Operator does even check for existence of prerequisite stacks when destroying a stack and cancels the process when prerequisites are missing. From my point of view this is not necessary.

When a Stack is deleted, the behavior depends on the destroyOnFinalize field. When it is enabled, a pulumi destroy operation runs during finalization. The dependent stack blocks because the prerequisite stack may produce outputs or other effects that the dependent relies on. How about we add a flag to the prerequisites block to control whether a given prerequisite should block deletion?
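To make the proposal concrete, here is a sketch of how such a per-prerequisite flag might look. The field name blockDeletion is purely hypothetical and is not part of the current Stack API. Other required spec fields are omitted.

```yaml
apiVersion: pulumi.com/v1
kind: Stack
metadata:
  name: stack-b
spec:
  destroyOnFinalize: true
  prerequisites:
    - name: stack-a
      # HYPOTHETICAL field illustrating the proposal; not part of the Stack API.
      # false = a missing stack-a would not block finalization (pulumi destroy) of stack-b;
      # the prerequisite would still gate creation and updates.
      blockDeletion: false
```

Making the flag per prerequisite keeps today's safe default (block) for stacks whose destroy genuinely needs outputs from the prerequisite, while letting users opt out where it doesn't.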

To be clear, when destroyOnFinalize is disabled, the dependent doesn't block.
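For comparison, a dependent configured as follows would not block on a missing prerequisite during deletion, according to the comment above. Note that with destroyOnFinalize disabled, deleting the Stack object does not run pulumi destroy, so the stack's cloud resources are left in place. Other required spec fields are omitted.

```yaml
apiVersion: pulumi.com/v1
kind: Stack
metadata:
  name: stack-b
spec:
  destroyOnFinalize: false   # no `pulumi destroy` on deletion, so a missing stack-a does not block
  prerequisites:
    - name: stack-a          # still gates reconciliation while stack-b exists
```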