rancher / elemental-operator

The Elemental operator is responsible for managing the OS versions and maintaining a machine inventory to assist with edge or baremetal installations.

Cannot reinstall/upgrade elemental-operator after a Rancher Manager migration #881

Open ldevulder opened 3 weeks ago

ldevulder commented 3 weeks ago

I deployed Rancher Manager (Stable, v2.9.3) with latest elemental-operator (Dev version) and backup-restore operator and I encountered some issues.

The first one is that the metadata Kind is not saved; I opened a PR for this.

The second one is that, after migrating/reinstalling the whole Rancher Manager from the backup, I'm no longer able to reinstall/upgrade elemental-operator with helm (the CRDs chart is OK). I get this error:

$ helm upgrade --install --devel elemental-operator \
      oci://registry.opensuse.org/isv/rancher/elemental/dev/charts/rancher/elemental-operator-chart \
      --namespace cattle-elemental-system \
      --create-namespace --wait --wait-for-jobs
Release "elemental-operator" does not exist. Installing it now.
Pulled: registry.opensuse.org/isv/rancher/elemental/dev/charts/rancher/elemental-operator-chart:1.8.0-dev-378.2
Digest: sha256:3930a3682c33f137b63633d5a53ba1639f6dd7da5093b426b1a848a9eec93f4a
W1031 18:02:56.474101   18035 warnings.go:70] unknown field "roleRef.namespace"
Error: cannot patch "sl-micro-6.0-baremetal" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "sl-micro-6.0-baremetal" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "sl-micro-6.0-base" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "sl-micro-6.0-base" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "sl-micro-6.0-kvm" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "sl-micro-6.0-kvm" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "sle-micro-6.0-rt" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "sle-micro-6.0-rt" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "unstable-testing-channel" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "unstable-testing-channel" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update
Error while deploying elemental-operator!

Please note that after the migration/reinstallation elemental-operator seems to work as expected, only the upgrade/reinstallation with helm is not working.

Maybe something is still missing in the backup of elemental-operator, but I'm not able to figure out what...

davidcassany commented 3 weeks ago

Gonna have a closer look next week. To me this smells like an issue related to the dynamic channels provided by helm 🤔 My bet is that this logic might not be fully compatible with the current backup and restore process: https://github.com/rancher/elemental-operator/blob/f300a43fd9777a187ae28d75a8596a847f6b678c/.obs/chartfile/elemental-operator-helm/templates/channels.yaml#L33-L46

ldevulder commented 3 weeks ago

I saw that the OS channels and the OS are created in the fleet-default namespace, while Metadata is in cattle-elemental-system. Could this have an impact?

davidcassany commented 2 weeks ago

> Please note that after the migration/reinstallation elemental-operator seems to work as expected, only the upgrade/reinstallation with helm is not working.

@ldevulder could you give me more detailed steps for the use case that is functional vs. the one that is not? I assume that installing a clean Rancher and Elemental, then making a backup and migrating it to another clean install, seems to work. But it doesn't work if the operator was upgraded before making the backup, is that correct?

davidcassany commented 2 weeks ago

I am also wondering if this will be solved by moving dev to 6.1 channels. This is probably also caused by changing some channel data in dev for the 6.0 channels, which shouldn't be there in any case.

ldevulder commented 2 weeks ago

@davidcassany I tested an upgrade from Stable operator to Dev one after a restore and it worked:

Release "elemental-operator" does not exist. Installing it now.
Pulled: registry.opensuse.org/isv/rancher/elemental/dev/charts/rancher/elemental-operator-chart:1.8.0-dev-380.1
Digest: sha256:259d888269e53758c1d63360904f51e22e8b7dee92318dd619e3094a6abd6590
W1112 12:45:28.407818   19392 warnings.go:70] unknown field "roleRef.namespace"
NAME: elemental-operator
LAST DEPLOYED: Tue Nov 12 12:45:27 2024
NAMESPACE: cattle-elemental-system
STATUS: deployed
REVISION: 1
TEST SUITE: None

The only remaining issue is this warning:

W1112 12:45:28.407818   19392 warnings.go:70] unknown field "roleRef.namespace"

I also still have the issue when I try to reinstall the Stable version of the operator, but that sounds logical, as nothing has changed on that side.

ldevulder commented 2 weeks ago

@davidcassany I just tested reinstalling the Dev version of the operator after a restore, with a Dev operator previously installed, and I get the same error:

Release "elemental-operator" does not exist. Installing it now.
Pulled: registry.opensuse.org/isv/rancher/elemental/dev/charts/rancher/elemental-operator-chart:1.8.0-dev-380.1
Digest: sha256:259d888269e53758c1d63360904f51e22e8b7dee92318dd619e3094a6abd6590
W1113 15:58:15.672527   32592 warnings.go:70] unknown field "roleRef.namespace"
Error: cannot patch "sl-micro-6.1-baremetal" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "sl-micro-6.1-baremetal" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "sl-micro-6.1-base" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "sl-micro-6.1-base" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "sl-micro-6.1-kvm" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "sl-micro-6.1-kvm" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "sle-micro-6.1-rt" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "sle-micro-6.1-rt" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && cannot patch "unstable-testing-channel" with kind ManagedOSVersionChannel: managedosversionchannels.elemental.cattle.io "unstable-testing-channel" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update

So an upgrade from Stable to Dev works but not a re-installation.
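
To make it easier to compare which resources fail between runs (for example the 6.0 channels in the first attempt vs. the 6.1 channels here), the long helm error can be split into one entry per resource. This is only a minimal sketch: the `failed_resources` helper and the regular expression are hypothetical and written against the message format seen in the logs above, not against any documented helm output format.

```python
import re

# Matches each 'cannot patch "<name>" with kind <Kind>' fragment.
# Assumption: the format is taken from the error text pasted in this issue.
PATCH_FAILURE = re.compile(r'cannot patch "([^"]+)" with kind (\w+)')


def failed_resources(helm_error: str) -> list[tuple[str, str]]:
    """Return (kind, name) pairs for every resource helm failed to patch."""
    return [(kind, name) for name, kind in PATCH_FAILURE.findall(helm_error)]


# Shortened sample built from two of the entries in the error above.
sample = (
    'Error: cannot patch "sl-micro-6.1-baremetal" with kind ManagedOSVersionChannel: '
    'managedosversionchannels.elemental.cattle.io "sl-micro-6.1-baremetal" is invalid: '
    'metadata.resourceVersion: Invalid value: 0x0: must be specified for an update && '
    'cannot patch "unstable-testing-channel" with kind ManagedOSVersionChannel: '
    'managedosversionchannels.elemental.cattle.io "unstable-testing-channel" is invalid: '
    'metadata.resourceVersion: Invalid value: 0x0: must be specified for an update'
)

for kind, name in failed_resources(sample):
    print(f"{kind}/{name}")
```

Running it on the full error from each attempt gives a short list that can be diffed directly.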