kubernetes-retired / service-catalog

Consume services in Kubernetes using the Open Service Broker API
https://svc-cat.io
Apache License 2.0

Migration restore job is executed regardless of backup being triggered, creating controller downtime #2889

Closed · gberche-orange closed this issue 2 years ago

gberche-orange commented 3 years ago

Bug Report

What happened:

Each time a `helm upgrade` command is run, the migration job from Service Catalog 0.2.0 to 0.3.0 is executed again, creating controller downtime, as the following trace of the migration job output shows:

```
I0520 11:21:32.506933       1 hyperkube.go:192] Service Catalog version v0.3.1-dirty (built 2020-11-05T00:14:24Z)
I0520 11:21:33.071461       1 migration.go:125] Executing restore action
I0520 11:21:34.079337       1 migration.go:166] Webhook server is ready
I0520 11:21:34.079360       1 scale.go:48] Scaling down the controller
I0520 11:21:35.117659       1 migration.go:186] Applying 0 service brokers
I0520 11:21:35.117677       1 migration.go:205] Applying 0 cluster service brokers
I0520 11:21:35.117682       1 migration.go:223] Applying 0 service classes
I0520 11:21:35.117685       1 migration.go:239] Applying 0 cluster service classes
I0520 11:21:35.117689       1 migration.go:257] Applying 0 service plans
I0520 11:21:35.117693       1 migration.go:273] Applying 0 cluster service plans
I0520 11:21:35.117698       1 migration.go:290] Applying 0 service instances
I0520 11:21:35.117704       1 migration.go:331] Applying 0 service bindings
I0520 11:21:35.117709       1 migration.go:670] Removing owner referneces from secrets
I0520 11:21:35.122541       1 migration.go:690] ...done
I0520 11:21:35.122573       1 scale.go:54] Scaling up the controller
I0520 11:21:59.141915       1 volume.go:30] Deleting PersistentVolumeClaim
```

What you expected to happen:

The migration job should only trigger if the current installation is still running a 0.2.x version.

The Helm built-in values do not seem to provide the version of the currently installed release when Release.IsUpgrade=true:

https://helm.sh/docs/topics/charts/

- `Release.Name`: The name of the release (not the chart)
- `Release.Namespace`: The namespace the chart was released to.
- `Release.Service`: The service that conducted the release.
- `Release.IsUpgrade`: This is set to true if the current operation is an upgrade or rollback.
- `Release.IsInstall`: This is set to true if the current operation is an install.
- `Chart`: The contents of the Chart.yaml. Thus, the chart version is obtainable as `Chart.Version` and the maintainers are in `Chart.Maintainers`.
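To make the gap concrete, here is a minimal illustration (not from the chart; the field names are purely illustrative) of what a template can and cannot see during an upgrade:

```yaml
{{- /* Illustration only (not part of the chart): during `helm upgrade`,
       templates see the chart version being installed, but no built-in
       value exposes the version of the release currently running. */}}
{{- if .Release.IsUpgrade }}
incoming-chart-version: {{ .Chart.Version | quote }}  # the version being installed, e.g. "0.3.1"
# previously-installed-version: not available from built-in values
{{- end }}
```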

The restore job should therefore check whether it actually needs to run, such as by detecting that the currently deployed controller is still at a 0.2.x version (see the sketch below).
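Since the built-in values above do not expose the previously installed version, one possible approach (an assumption on my part, not something the chart does today) is to query the live cluster with Helm's `lookup` function (available since Helm 3.2) and only render the restore Job when the running controller image is still 0.2.x. The Deployment name used below is hypothetical:

```yaml
{{- /* Sketch only, not the chart's current code: render the restore Job only
       when the controller-manager already running in the cluster is a 0.2.x
       image. The Deployment name is an assumption; requires Helm >= 3.2 so
       that `lookup` can query the live cluster during `helm upgrade`. */}}
{{- $needsMigration := false }}
{{- if .Release.IsUpgrade }}
  {{- $deploy := lookup "apps/v1" "Deployment" .Release.Namespace (printf "%s-catalog-controller-manager" .Release.Name) }}
  {{- if $deploy }}
    {{- $image := (index $deploy.spec.template.spec.containers 0).image }}
    {{- if contains ":v0.2." $image }}
      {{- $needsMigration = true }}
    {{- end }}
  {{- end }}
{{- end }}
{{- if $needsMigration }}
apiVersion: batch/v1
kind: Job
# ... the existing migration-job.yaml body would follow here, unchanged ...
{{- end }}
```

Note that `lookup` returns an empty object when rendering with `helm template` or `--dry-run`, so this guard would skip the Job in those modes.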

Alternatively, a Helm chart migration opt-in or opt-out value would make it easy to disable the pre- and post-migration jobs rendered from https://github.com/kubernetes-sigs/service-catalog/blob/master/charts/catalog/templates/pre-migration-job.yaml and https://github.com/kubernetes-sigs/service-catalog/blob/master/charts/catalog/templates/migration-job.yaml, as sketched below.
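A minimal sketch of that opt-out, assuming a new `migration.enabled` value (the key name is hypothetical, not part of the current chart) defaulting to true in values.yaml; each job template would then be wrapped like this:

```yaml
{{- /* Hypothetical guard for templates/pre-migration-job.yaml and
       templates/migration-job.yaml; `migration.enabled` is an assumed
       values.yaml key, not an existing chart value. */}}
{{- if .Values.migration.enabled }}
apiVersion: batch/v1
kind: Job
# ... existing Job definition, unchanged ...
{{- end }}
```

Operators upgrading between two 0.3.x releases could then pass `--set migration.enabled=false` to skip the restore job, and the associated controller downtime, entirely.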

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

This relates to #2853

Environment:

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

gberche-orange commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/service-catalog/issues/2889#issuecomment-1014268862):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.