okd-project / okd

The self-managing, auto-upgrading, Kubernetes distribution for everyone
https://okd.io
Apache License 2.0

Error with clusterversion object, when updating 4.14 -> 4.15 #1951

Closed falc83 closed 3 months ago

falc83 commented 4 months ago

Describe the bug

An update in a restricted-network (offline) environment from 4.14 to 4.15.0-0.okd-2024-03-10-010116 was started and reached the control plane stage. After a master node restarted, it remained unschedulable for more than 2 hours, and when I tried to connect via SSH I saw this error: "System is booting up. Unprivileged users are not permitted to log in yet. Please come back later."

After several unsuccessful attempts to continue the update, I tried to update to a newer version, but by mistake I set the old version, 4.14.0-0.okd-2024-01-26-175629. This triggered a rollback: all operators, masters, and workers returned to version 4.14.0-0.okd-2024-01-26-175629. However, a problem then appeared in clusterversion/version. As far as I can tell, the information that the update has completed cannot be written there. Any attempt to change the object fails, for example:

oc patch clusterversion version --type merge -p '{"spec":{"capabilities":{"baselineCapabilitySet":"v4.11"}}}'
The ClusterVersion "version" is invalid:
* status.capabilities.enabledCapabilities[2]: Unsupported value: "CloudCredential": supported values: "openshift-samples", "baremetal", "marketplace", "Console", "Insights", "Storage", "CSISnapshot", "NodeTuning", "MachineAPI", "Build", "DeploymentConfig", "ImageRegistry"
* status.capabilities.enabledCapabilities[9]: Unsupported value: "OperatorLifecycleManager": supported values: "openshift-samples", "baremetal", "marketplace", "Console", "Insights", "Storage", "CSISnapshot", "NodeTuning", "MachineAPI", "Build", "DeploymentConfig", "ImageRegistry"
* status.capabilities.knownCapabilities[2]: Unsupported value: "CloudCredential": supported values: "openshift-samples", "baremetal", "marketplace", "Console", "Insights", "Storage", "CSISnapshot", "NodeTuning", "MachineAPI", "Build", "DeploymentConfig", "ImageRegistry"
* status.capabilities.knownCapabilities[9]: Unsupported value: "OperatorLifecycleManager": supported values: "openshift-samples", "baremetal", "marketplace", "Console", "Insights", "Storage", "CSISnapshot", "NodeTuning", "MachineAPI", "Build", "DeploymentConfig", "ImageRegistry"

The ClusterVersion status still contains capabilities from version 4.15 that did not exist in version 4.14: OperatorLifecycleManager and CloudCredential. It is not possible to delete them from there.
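To make the mismatch easier to see, here is a minimal Python sketch that parses validation output of the shape shown above and lists which capability values were rejected for each status field. It only assumes the line format visible in this issue. The names `parse_unsupported`, `LINE_RE`, and `SAMPLE` are hypothetical helpers for illustration, not part of `oc` or any OKD tooling.

```python
import re

# One line of the validation error, in the shape shown in this issue:
# * <field>[<index>]: Unsupported value: "<value>": supported values: "a", "b", ...
LINE_RE = re.compile(
    r'^\*\s*(?P<field>[\w.]+)\[(?P<index>\d+)\]:\s*'
    r'Unsupported value:\s*"(?P<value>[^"]+)":\s*'
    r'supported values:\s*(?P<supported>.*)$'
)

# Two lines copied from the error output in this issue.
SAMPLE = (
    '* status.capabilities.enabledCapabilities[2]: Unsupported value: "CloudCredential": '
    'supported values: "openshift-samples", "baremetal", "marketplace", "Console", '
    '"Insights", "Storage", "CSISnapshot", "NodeTuning", "MachineAPI", "Build", '
    '"DeploymentConfig", "ImageRegistry"\n'
    '* status.capabilities.enabledCapabilities[9]: Unsupported value: "OperatorLifecycleManager": '
    'supported values: "openshift-samples", "baremetal", "marketplace", "Console", '
    '"Insights", "Storage", "CSISnapshot", "NodeTuning", "MachineAPI", "Build", '
    '"DeploymentConfig", "ImageRegistry"\n'
)


def parse_unsupported(error_text):
    """Return ({field: [rejected values]}, set of values the schema accepts)."""
    rejected = {}
    supported = set()
    for line in error_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not validation entries
        rejected.setdefault(match.group("field"), []).append(match.group("value"))
        supported.update(re.findall(r'"([^"]+)"', match.group("supported")))
    return rejected, supported


if __name__ == "__main__":
    rejected, supported = parse_unsupported(SAMPLE)
    for field, values in rejected.items():
        print(field, "->", ", ".join(values))
```

Running this on the full error text would report the same two values, CloudCredential and OperatorLifecycleManager, for both enabledCapabilities and knownCapabilities. That matches the observation that the 4.14 schema rejects capabilities written to the status during the 4.15 update.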

Version

4.14.0-0.okd-2024-01-26-175629
4.15.0-0.okd-2024-03-10-010116

How to reproduce

  1. Run oc adm upgrade with --force (restricted network) from 4.14 to 4.15, which produces several errors:
     oc adm upgrade --allow-explicit-upgrade --to-image ${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:4.15.0-0.okd-2024-03-10-010116 --force
  2. Run oc adm upgrade with --force (restricted network) back to 4.14:
     oc adm upgrade --allow-explicit-upgrade --to-image ${LOCAL_REGISTRY}/${LOCAL_REPOSITORY}:4.14.0-0.okd-2024-01-26-175629 --force
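
Because --force together with --allow-explicit-upgrade accepted the older image here, a small pre-flight check on the release tag could catch an accidental downgrade before the command is run. The following is a hedged Python sketch. The tag layout (`X.Y.Z-N.okd-YYYY-MM-DD-HHMMSS`) is inferred only from the two tags in this issue, and `parse_okd_tag` and `is_downgrade` are hypothetical helper names, not an existing tool.

```python
import re

# Tag layout inferred from the two release tags in this issue, for example
# 4.15.0-0.okd-2024-03-10-010116 (an assumption, not an official grammar).
TAG_RE = re.compile(
    r'^(\d+)\.(\d+)\.(\d+)-(\d+)\.okd-(\d{4})-(\d{2})-(\d{2})-(\d{6})$'
)


def parse_okd_tag(tag):
    """Turn an OKD release tag into a tuple of integers that sorts by version, then build time."""
    match = TAG_RE.match(tag)
    if not match:
        raise ValueError(f"unrecognized OKD release tag: {tag!r}")
    return tuple(int(part) for part in match.groups())


def is_downgrade(current_tag, target_tag):
    """Return True when the target release sorts before the current one."""
    return parse_okd_tag(target_tag) < parse_okd_tag(current_tag)


if __name__ == "__main__":
    current = "4.15.0-0.okd-2024-03-10-010116"
    target = "4.14.0-0.okd-2024-01-26-175629"
    if is_downgrade(current, target):
        print(f"Refusing to continue: {target} is older than {current}")
```

A wrapper script could run this check against the tag in the --to-image reference and stop before invoking oc adm upgrade when it returns True.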

JaimeMagiera commented 3 months ago

Hi,

We are not working on FCOS builds of OKD any more. Please see these documents...

https://okd.io/blog/2024/06/01/okd-future-statement
https://okd.io/blog/2024/07/30/okd-pre-release-testing

In terms of clusters that are older, you may be able to get help from community members. I'll convert this to a discussion to facilitate that.

Many thanks,

Jaime