kubernetes-retired / service-catalog

Consume services in Kubernetes using the Open Service Broker API
https://svc-cat.io
Apache License 2.0

Migration restore job is executed regardless of backup being triggered, creating controller downtime #2889

Closed · gberche-orange closed this issue 2 years ago

gberche-orange commented 3 years ago

Bug Report

What happened:

Each time a `helm upgrade` command is run, the migration job from Service Catalog 0.2.0 to 0.3.0 is executed again, creating controller downtime, as the following trace of the migration job output shows:

```
I0520 11:21:32.506933       1 hyperkube.go:192] Service Catalog version v0.3.1-dirty (built 2020-11-05T00:14:24Z)
I0520 11:21:33.071461       1 migration.go:125] Executing restore action
I0520 11:21:34.079337       1 migration.go:166] Webhook server is ready
I0520 11:21:34.079360       1 scale.go:48] Scaling down the controller
I0520 11:21:35.117659       1 migration.go:186] Applying 0 service brokers
I0520 11:21:35.117677       1 migration.go:205] Applying 0 cluster service brokers
I0520 11:21:35.117682       1 migration.go:223] Applying 0 service classes
I0520 11:21:35.117685       1 migration.go:239] Applying 0 cluster service classes
I0520 11:21:35.117689       1 migration.go:257] Applying 0 service plans
I0520 11:21:35.117693       1 migration.go:273] Applying 0 cluster service plans
I0520 11:21:35.117698       1 migration.go:290] Applying 0 service instances
I0520 11:21:35.117704       1 migration.go:331] Applying 0 service bindings
I0520 11:21:35.117709       1 migration.go:670] Removing owner referneces from secrets
I0520 11:21:35.122541       1 migration.go:690] ...done
I0520 11:21:35.122573       1 scale.go:54] Scaling up the controller
I0520 11:21:59.141915       1 volume.go:30] Deleting PersistentVolumeClaim
```

What you expected to happen:

The migration job should only trigger if the current installation is still running a 0.2.x version.

The Helm built-in values do not seem to provide the version of the currently installed release when Release.IsUpgrade=true:

https://helm.sh/docs/topics/charts/

- `Release.Name`: The name of the release (not the chart)
- `Release.Namespace`: The namespace the chart was released to.
- `Release.Service`: The service that conducted the release.
- `Release.IsUpgrade`: This is set to true if the current operation is an upgrade or rollback.
- `Release.IsInstall`: This is set to true if the current operation is an install.
- `Chart`: The contents of the Chart.yaml. Thus, the chart version is obtainable as `Chart.Version` and the maintainers are in `Chart.Maintainers`.
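To make the gap concrete, here is a minimal illustration (not from the chart; the field names are purely illustrative) of what a template can and cannot see during an upgrade:

```yaml
{{- /* Illustration only (not part of the chart): during `helm upgrade`,
       templates see the chart version being installed, but no built-in
       value exposes the version of the release currently running. */}}
{{- if .Release.IsUpgrade }}
incoming-chart-version: {{ .Chart.Version | quote }}  # the version being installed, e.g. "0.3.1"
# previously-installed-version: not available from built-in values
{{- end }}
```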

The restore job should therefore check whether it actually needs to run, such as by detecting that the currently deployed controller is still at a 0.2.x version (see the sketch below).
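Since the built-in values above do not expose the previously installed version, one possible approach (an assumption on my part, not something the chart does today) is to query the live cluster with Helm's `lookup` function (available since Helm 3.2) and only render the restore Job when the running controller image is still 0.2.x. The Deployment name used below is hypothetical:

```yaml
{{- /* Sketch only, not the chart's current code: render the restore Job only
       when the controller-manager already running in the cluster is a 0.2.x
       image. The Deployment name is an assumption; requires Helm >= 3.2 so
       that `lookup` can query the live cluster during `helm upgrade`. */}}
{{- $needsMigration := false }}
{{- if .Release.IsUpgrade }}
  {{- $deploy := lookup "apps/v1" "Deployment" .Release.Namespace (printf "%s-catalog-controller-manager" .Release.Name) }}
  {{- if $deploy }}
    {{- $image := (index $deploy.spec.template.spec.containers 0).image }}
    {{- if contains ":v0.2." $image }}
      {{- $needsMigration = true }}
    {{- end }}
  {{- end }}
{{- end }}
{{- if $needsMigration }}
apiVersion: batch/v1
kind: Job
# ... the existing migration-job.yaml body would follow here, unchanged ...
{{- end }}
```

Note that `lookup` returns an empty object when rendering with `helm template` or `--dry-run`, so this guard would skip the Job in those modes.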

Alternatively, a Helm chart migration opt-in or opt-out value would make it easy to disable the pre- and post-migration jobs rendered from https://github.com/kubernetes-sigs/service-catalog/blob/master/charts/catalog/templates/pre-migration-job.yaml and https://github.com/kubernetes-sigs/service-catalog/blob/master/charts/catalog/templates/migration-job.yaml, as sketched below.
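A minimal sketch of that opt-out, assuming a new `migration.enabled` value (the key name is hypothetical, not part of the current chart) defaulting to true in values.yaml; each job template would then be wrapped like this:

```yaml
{{- /* Hypothetical guard for templates/pre-migration-job.yaml and
       templates/migration-job.yaml; `migration.enabled` is an assumed
       values.yaml key, not an existing chart value. */}}
{{- if .Values.migration.enabled }}
apiVersion: batch/v1
kind: Job
# ... existing Job definition, unchanged ...
{{- end }}
```

Operators upgrading between two 0.3.x releases could then pass `--set migration.enabled=false` to skip the restore job, and the associated controller downtime, entirely.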

How to reproduce it (as minimally and precisely as possible):

Anything else we need to know?:

This relates to #2853

Environment:

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

gberche-orange commented 2 years ago

/remove-lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

k8s-triage-robot commented 2 years ago

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

You can:

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

k8s-ci-robot commented 2 years ago

@k8s-triage-robot: Closing this issue.

In response to [this](https://github.com/kubernetes-sigs/service-catalog/issues/2889#issuecomment-1014268862):

> The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.
>
> This bot triages issues and PRs according to the following rules:
> - After 90d of inactivity, `lifecycle/stale` is applied
> - After 30d of inactivity since `lifecycle/stale` was applied, `lifecycle/rotten` is applied
> - After 30d of inactivity since `lifecycle/rotten` was applied, the issue is closed
>
> You can:
> - Reopen this issue or PR with `/reopen`
> - Mark this issue or PR as fresh with `/remove-lifecycle rotten`
> - Offer to help out with [Issue Triage][1]
>
> Please send feedback to sig-contributor-experience at [kubernetes/community](https://github.com/kubernetes/community).
>
> /close
>
> [1]: https://www.kubernetes.dev/docs/guide/issue-triage/

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes/test-infra](https://github.com/kubernetes/test-infra/issues/new?title=Prow%20issue:) repository.