pulumi / pulumi-kubernetes

A Pulumi resource provider for Kubernetes to manage API resources and workloads in running clusters
https://www.pulumi.com/docs/reference/clouds/kubernetes/
Apache License 2.0

helm.sh/v3:Release, unclear how to recover from "another operation (install/upgrade/rollback) is in progress" #2054

Open shousper opened 2 years ago

shousper commented 2 years ago

What happened?

After deploying the Release initially, everything was fine, but when the chart failed to deploy because of a configuration issue, it reported:

 ~  kubernetes:helm.sh/v3:Release airflow **updating failed** [diff: ~values]; error: another operation (install/upgrade/rollback) is in progress

I've been unable to figure out how to recover from this, and was forced to pulumi destroy the stack and re-create it. A pulumi cancel didn't seem to have any effect.

Steps to reproduce

Expected Behavior

Running pulumi cancel should restore normal operation, and/or the error should give clearer instructions about what action is required to resolve the stuck "operation in progress" state.

Actual Behavior

Stack becomes inoperable and must be destroyed.

Versions used

CLI
Version      3.33.2
Go Version   go1.17.10
Go Compiler  gc

Plugins
NAME        VERSION
aws         5.6.0
docker      3.2.0
kubernetes  3.19.3
nodejs      unknown
random      4.8.0

Host
OS       darwin
Version  12.4
Arch     x86_64

This project is written in nodejs (/Users/cmcgregor/.asdf/shims/node v14.19.2)

Current Stack: staging-data-airflow-dags

TYPE                                                URN
pulumi:pulumi:Stack                                 urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:pulumi:Stack::data-airflow-dags-staging-data-airflow-dags
pulumi:providers:aws                                urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:providers:aws::default_5_6_0
pulumi:providers:pulumi                             urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:providers:pulumi::default
aws:cloudwatch/logGroup:LogGroup                    urn:pulumi:staging-data-airflow-dags::data-airflow-dags::aws:cloudwatch/logGroup:LogGroup::workers
pulumi:providers:kubernetes                         urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:providers:kubernetes::default_3_19_3
pulumi:providers:random                             urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:providers:random::default_4_8_0
kubernetes:core/v1:Namespace                        urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace::airflow
random:index/randomPassword:RandomPassword          urn:pulumi:staging-data-airflow-dags::data-airflow-dags::random:index/randomPassword:RandomPassword::default-user-password
random:index/randomString:RandomString              urn:pulumi:staging-data-airflow-dags::data-airflow-dags::random:index/randomString:RandomString::webserver-secret-key
aws:ecr/repository:Repository                       urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$aws:ecr/repository:Repository::airflow
kubernetes:monitoring.coreos.com/v1:ServiceMonitor  urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:monitoring.coreos.com/v1:ServiceMonitor::airflow
kubernetes:core/v1:Secret                           urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::dags-ssh-key
kubernetes:core/v1:Secret                           urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::oauth
awsx:ecr:Repository                                 urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$awsx:ecr:Repository::airflow
kubernetes:core/v1:Secret                           urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::webserver
pulumi:pulumi:StackReference                        urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:pulumi:StackReference::shared-database
aws:ecr/lifecyclePolicy:LifecyclePolicy             urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$awsx:ecr:Repository$aws:ecr/lifecyclePolicy:LifecyclePolicy::airflow
pulumi:pulumi:StackReference                        urn:pulumi:staging-data-airflow-dags::data-airflow-dags::pulumi:pulumi:StackReference::infra
kubernetes:core/v1:Secret                           urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::data-metadata-connection
kubernetes:core/v1:Secret                           urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:Secret::connections
aws:iam/role:Role                                   urn:pulumi:staging-data-airflow-dags::data-airflow-dags::aws:iam/role:Role::worker
kubernetes:core/v1:ServiceAccount                   urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:core/v1:Namespace$kubernetes:core/v1:ServiceAccount::airflow-worker
kubernetes:helm.sh/v3:Release                       urn:pulumi:staging-data-airflow-dags::data-airflow-dags::kubernetes:helm.sh/v3:Release::airflow

Found no pending operations associated with staging-data-airflow-dags

Backend
Name           maelstrom.local
URL            s3://<redacted>
User           cmcgregor
Organizations

NAME                              VERSION
import-sort-parser-typescript     6.0.0
import-sort-style-module-scoped   1.0.3
@pulumi/kubernetes                3.19.3
@pulumi/pulumi                    3.34.1
@types/js-yaml                    4.0.5
@typescript-eslint/eslint-plugin  5.28.0
@typescript-eslint/parser         5.28.0
eslint-plugin-unused-imports      2.0.0
eslint-plugin-prettier            4.0.0
import-sort                       6.0.0
typescript                        4.7.3
@types/node                       14.18.21
eslint                            8.17.0
eslint-config-prettier            8.5.0
js-yaml                           4.1.0
prettier-plugin-import-sort       0.0.7
@pulumi/aws                       5.6.0
@pulumi/random                    4.8.0
prettier                          2.7.0

Pulumi locates its logs in /var/folders/p0/zkdhxd596q31sxqykz4shh6c0000gn/T/ by default

Additional context

Originally raised in Pulumi's Community slack: https://pulumi-community.slack.com/archives/CRFURDVQB/p1655436629559579

Contributing

Vote on this issue by adding a 👍 reaction. To contribute a fix for this issue, leave a comment (and link to your pull request, if you've opened one already).

lblackstone commented 2 years ago

@viveklak Any idea if there's a workaround for this?

viveklak commented 2 years ago

@shousper thanks for filing the issue. By default, a Helm Release installs in wait mode - i.e. it waits for the underlying chart resources to be installed. You can skip this by setting the https://www.pulumi.com/registry/packages/kubernetes/api-docs/helm/v3/release/#skipawait_nodejs flag. It seems there is a class of issues with Helm itself where, if a blocking update/install is interrupted, it might leave the release in an inconsistent state. You may have to use a workaround like the one described here to recover: https://github.com/helm/helm/issues/4558#issuecomment-648352068
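For reference, the linked helm/helm workaround amounts to rolling back (or uninstalling) the stuck release with the helm CLI. A sketch, using the release and namespace names from this issue; the revision number is a placeholder you'd take from helm history output:

```shell
# Find the last revision that reached "deployed" status:
helm history airflow -n airflow

# Roll back to that revision to clear the in-progress lock
# (replace 3 with the revision number helm history reported):
helm rollback airflow 3 -n airflow

# If no revision ever deployed successfully, uninstalling clears it instead:
# helm uninstall airflow -n airflow
```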

shousper commented 2 years ago

@viveklak Thanks! So just using the helm CLI to perform the rollback? No worries, I'll give that a go and report back if there are any problems ✌🏻

shousper commented 2 years ago

So using the helm CLI to uninstall/rollback a release appears to unblock pulumi from taking further action 🎉 However, pulumi still reports this warning upon subsequent update, despite everything being okay:

    warning: Attempting to deploy or update resources with 1 pending operations from previous deployment.
      * urn:pulumi:staging-kafka-cruise-control::kafka-cruise-control::kubernetes:helm.sh/v3:Release::cruise-control-oauth-proxy, interrupted while creating

I ran a pulumi stack export and found there were still some pending_operations in the state. So I ran pulumi cancel against the stack, and exported it again, but the pending operations were still there. Perhaps just a gap in the way pulumi cancel is handled for some resources?
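One way to clear the stale pending_operations manually (a sketch, not an official procedure; the stack name is from this thread, and the jq filter assumes the checkpoint shape that pulumi stack export produces, where pending_operations sits under .deployment) is to export the state, delete the entries, and import the edited state back:

```shell
# 1) Export the stack's checkpoint:
#      pulumi stack export --stack staging-data-airflow-dags > state.json
# A minimal stand-in checkpoint is used here so the jq step below is concrete:
cat > state.json <<'EOF'
{"version": 3, "deployment": {"pending_operations": [{"resource": {"urn": "urn:pulumi:...::kubernetes:helm.sh/v3:Release::airflow"}, "type": "creating"}], "resources": []}}
EOF

# 2) Drop the stale pending_operations list from the checkpoint
#    (back up state.json before doing this against a real stack):
jq 'del(.deployment.pending_operations)' state.json > state.clean.json

# 3) Import the edited checkpoint back into the stack:
#      pulumi stack import --stack staging-data-airflow-dags --file state.clean.json
```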

drew-altana commented 2 years ago

I'm seeing this frequently when the host/pulumi process dies. The way I've been resetting it is to just delete the helm release secret via k9s. I think pulumi should add an option to roll back existing installs that may be stuck prior to installing new helm releases.
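For anyone without k9s handy, the same reset can be done with kubectl (this assumes Helm's default Secret storage driver; the release/namespace names are from this issue and the revision number is hypothetical):

```shell
# Helm v3 stores each release revision in a Secret labeled owner=helm;
# a stuck operation shows up with a pending-* status label:
kubectl get secrets -n airflow -l owner=helm,name=airflow

# Delete the pending revision's Secret
# (name format: sh.helm.release.v1.<release>.v<revision>):
kubectl delete secret sh.helm.release.v1.airflow.v4 -n airflow
```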