openshift / oc

The OpenShift Command Line, part of OKD
https://www.openshift.org
Apache License 2.0

OCPBUGS-35994: pkg/cli/admin/upgrade/rollback: Drop this command #1806

Closed wking closed 4 hours ago

wking commented 2 weeks ago

Current CI/QE capacity can't support rollbacks, which require basically the same level of effort as roll-forward updates but would never be used by as many clusters as roll-forward updates are. In addition, paused MachineConfigPools, HyperShift's NodePools, and similar allow compute updates to be decoupled from control-plane updates. Control-plane updates have PodDisruptionBudgets and such, so they wedge gracefully if an issue that destabilizes the control plane comes up during an update. And if a control-plane update goes smoothly but its behavior changes destabilize cluster workloads, ClusterVersion overrides and similar give an emergency safety valve to patch things up while admins wait for new releases that address the newly discovered issues, while [conditional updates](https://github.com/openshift/enhancements/blob/0977611b030fcb667b22cdf36a86e3852ee54e84/enhancements/update/targeted-update-edge-blocking.md) allow the risk to be declared so additional clusters are warned before attempting an exposed update.
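As a rough illustration of the paused-MachineConfigPool mechanism mentioned above (not part of this PR), here is a minimal Python sketch. It builds the JSON merge patch that sets `spec.paused` and the `oc patch` argument list that would apply it. The pool name `worker` and the helper names `pause_patch` and `oc_patch_argv` are assumptions for illustration only. The sketch prints the command and never contacts a cluster.

```python
import json
from typing import List


def pause_patch(paused: bool) -> str:
    """Return a compact JSON merge patch that sets spec.paused on a MachineConfigPool."""
    return json.dumps({"spec": {"paused": paused}}, separators=(",", ":"))


def oc_patch_argv(pool: str, paused: bool) -> List[str]:
    """Build the argument list for an `oc patch` call against the named pool.

    Nothing is executed here; the caller decides whether to run the command.
    """
    return [
        "oc", "patch", "machineconfigpool", pool,
        "--type", "merge",
        "--patch", pause_patch(paused),
    ]


if __name__ == "__main__":
    # "worker" is an assumed pool name; substitute the pool you want to hold back.
    print(" ".join(oc_patch_argv("worker", True)))
```

Calling `oc_patch_argv("worker", False)` yields the matching command to unpause the pool once the control-plane update has settled.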

With the moderate operational benefit weighed against the substantial cost of CI/QE test coverage, this commit reverts the subcommand, which I'd initially added in 73074c32ba (#1642) and gated in 9b2842a0c9 (#1764).

openshift-ci-robot commented 2 weeks ago

@wking: This pull request references Jira Issue OCPBUGS-35994, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug:

* bug is open, matching expected state (open)
* bug target version (4.17.0) matches configured target version for branch (4.17.0)
* bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @jiajliu

The bug has been updated to refer to the pull request using the external bug tracker.

Instructions for interacting with me using PR comments are available [here](https://prow.ci.openshift.org/command-help?repo=openshift%2Foc). If you have questions or suggestions related to my behavior, please file an issue against the [openshift-eng/jira-lifecycle-plugin](https://github.com/openshift-eng/jira-lifecycle-plugin/issues/new) repository.

petr-muller commented 1 week ago

/cc

evakhoni commented 1 week ago

/cc

wking commented 5 days ago

Service Delivery isn't very chatty about this, and re-reverts aren't hard if it comes to that.

/hold cancel

evakhoni commented 1 day ago

Pre-merge verified.

/label qe-approved

openshift-ci-robot commented 1 day ago

@wking: This pull request references Jira Issue OCPBUGS-35994, which is valid.

3 validation(s) were run on this bug:

* bug is open, matching expected state (open)
* bug target version (4.17.0) matches configured target version for branch (4.17.0)
* bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact: /cc @evakhoni

petr-muller commented 7 hours ago

/lgtm

openshift-ci[bot] commented 7 hours ago

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: petr-muller, wking

Needs approval from an approver in each of these files:

- ~~[pkg/cli/admin/upgrade/OWNERS](https://github.com/openshift/oc/blob/master/pkg/cli/admin/upgrade/OWNERS)~~ [petr-muller,wking]

Approvers can indicate their approval by writing `/approve` in a comment.
Approvers can cancel approval by writing `/approve cancel` in a comment.

openshift-ci[bot] commented 4 hours ago

@wking: all tests passed!

Instructions for interacting with me using PR comments are available [here](https://git.k8s.io/community/contributors/guide/pull-requests.md). If you have questions or suggestions related to my behavior, please file an issue against the [kubernetes-sigs/prow](https://github.com/kubernetes-sigs/prow/issues/new?title=Prow%20issue:) repository. I understand the commands that are listed [here](https://go.k8s.io/bot-commands).
openshift-ci-robot commented 4 hours ago

@wking: Jira Issue OCPBUGS-35994: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-35994 has been moved to the MODIFIED state.
