Open shousper opened 2 years ago
@viveklak Any idea if there's a workaround for this?
@shousper thanks for filing the issue. By default, a Helm release installs in wait mode, i.e. it waits for the underlying chart resources to be installed. You can skip this by setting the skipAwait flag (https://www.pulumi.com/registry/packages/kubernetes/api-docs/helm/v3/release/#skipawait_nodejs). It seems there is a class of issues with Helm itself where, if a blocking update/install is interrupted, the release may be left in an inconsistent state. To recover, you may have to consider a workaround like the one described here: https://github.com/helm/helm/issues/4558#issuecomment-648352068
@viveklak Thanks! So just using the helm CLI to perform the rollback? No worries, I'll give that a go and report back if there are any problems ✌🏻
So using the helm CLI to uninstall/rollback a release appears to unblock pulumi from taking further action 🎉 However, pulumi still reports this warning on subsequent updates, despite everything being okay:
warning: Attempting to deploy or update resources with 1 pending operations from previous deployment.
* urn:pulumi:staging-kafka-cruise-control::kafka-cruise-control::kubernetes:helm.sh/v3:Release::cruise-control-oauth-proxy, interrupted while creating
I ran a pulumi stack export and found there were still some pending_operations in the state. So I ran pulumi cancel against the stack and exported it again, but the pending operations were still there. Perhaps just a gap in the way pulumi cancel is handled for some resources?
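For anyone hitting the same leftover warning, here is a minimal sketch of clearing the pending operations from an exported stack file by hand. It assumes the export is JSON with a top-level deployment object holding a pending_operations list, each entry carrying a resource with a urn; the function name clear_pending_operations and the sample data are made up for illustration, so check them against your own export before relying on this.

```python
import json
from typing import Any, Dict, List, Tuple


def clear_pending_operations(exported: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return a copy of the exported stack without pending operations,
    plus the URNs of the operations that were dropped."""
    # Deep copy via a JSON round trip so the caller's data is left untouched.
    cleaned = json.loads(json.dumps(exported))
    deployment = cleaned.get("deployment") or {}
    pending = deployment.get("pending_operations") or []
    urns = [op.get("resource", {}).get("urn", "<unknown>") for op in pending]
    if "pending_operations" in deployment:
        deployment["pending_operations"] = []
    return cleaned, urns


# Hypothetical export, shaped like the state described above (assumed layout).
sample = {
    "version": 3,
    "deployment": {
        "resources": [],
        "pending_operations": [
            {
                "type": "creating",
                "resource": {
                    "urn": "urn:pulumi:staging-kafka-cruise-control::kafka-cruise-control::"
                           "kubernetes:helm.sh/v3:Release::cruise-control-oauth-proxy"
                },
            }
        ],
    },
}

cleaned, dropped = clear_pending_operations(sample)
print(dropped)
```

In practice the input would come from pulumi stack export --file state.json, and the cleaned result would be written back to a file and loaded with pulumi stack import --file state.json. Only do this once you have confirmed the resource is actually in the state you expect, since the pending entry is the only record that the operation was interrupted.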
I'm seeing this frequently when the host/pulumi process dies. The way I've been resetting it is to just delete the helm release secret via k9s. I think pulumi should add an option to rollback existing installs that may be stuck prior to installing new helm releases.
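A rough sketch of finding which release secrets are stuck, rather than hunting for them in k9s. It assumes Helm 3's default Secret storage, where each revision is a Secret labelled with owner=helm, name=<release> and a status such as pending-install or pending-upgrade; the function name pending_release_secrets and the sample data are hypothetical, and the input is assumed to be the parsed output of kubectl get secrets -l owner=helm -o json.

```python
from typing import Any, Dict, List


def pending_release_secrets(secret_list: Dict[str, Any], release: str) -> List[str]:
    """Return names of Helm release Secrets for `release` whose status label
    indicates an operation that never finished (pending-*)."""
    names: List[str] = []
    for item in secret_list.get("items", []):
        meta = item.get("metadata", {})
        labels = meta.get("labels") or {}
        is_release = labels.get("owner") == "helm" and labels.get("name") == release
        is_pending = str(labels.get("status", "")).startswith("pending-")
        if is_release and is_pending:
            names.append(meta.get("name", ""))
    return names


# Hypothetical listing with one healthy revision and one stuck upgrade.
listing = {
    "items": [
        {"metadata": {
            "name": "sh.helm.release.v1.cruise-control-oauth-proxy.v1",
            "labels": {"owner": "helm", "name": "cruise-control-oauth-proxy",
                       "status": "deployed", "version": "1"}}},
        {"metadata": {
            "name": "sh.helm.release.v1.cruise-control-oauth-proxy.v2",
            "labels": {"owner": "helm", "name": "cruise-control-oauth-proxy",
                       "status": "pending-upgrade", "version": "2"}}},
    ]
}

stuck = pending_release_secrets(listing, "cruise-control-oauth-proxy")
print(stuck)
```

This only lists candidates; whether to delete the secret or roll back with the helm CLI is still a manual decision.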
What happened?
After deploying the Release, everything was fine initially, but when the chart failed to deploy because of a configuration issue it reported:
I've been unable to figure out how to recover from this, and was forced to pulumi destroy the stack & re-create it. A pulumi cancel didn't seem to have any effect.
Steps to reproduce
Job called fail-airflow-run-airflow-migrations that will keep failing.
pulumi cancel if you like, it won't matter.
pulumi up: the preview will "look okay" but it'll fail when it attempts to apply the changes.
Expected Behavior
Should restore regular function upon running pulumi cancel and/or provide better instruction as to what action is required to resolve the stuck "operation in progress".
Actual Behavior
Stack becomes inoperable and must be destroyed.
Versions used
Additional context
Originally raised in Pulumi's Community slack: https://pulumi-community.slack.com/archives/CRFURDVQB/p1655436629559579
Contributing
Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).