opendatahub-io / opendatahub-operator

Open Data Hub operator to manage ODH component integrations
https://opendatahub.io
Apache License 2.0

Upgrade from released odh 2.4 to latest main fails #788

Closed kornys closed 9 months ago

kornys commented 9 months ago

Describe the bug: During the upgrade from ODH 2.4 to latest main, the DSC fails and all conditions in the DSC are in a failed state.

Steps to reproduce:

  1. Install ODH 2.4
  2. Create a DSC
  3. Upgrade to latest
  4. Check the DSC status

Operator Log:

* Deployment.apps "odh-dashboard" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"odh-dashboard", "app.kubernetes.io/part-of":"dashboard", "app.opendatahub.io/dashboard":"true", "deployment":"odh-dashboard"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
* Deployment.apps "odh-notebook-controller-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"odh-notebook-controller", "app.kubernetes.io/part-of":"workbenches", "app.opendatahub.io/workbenches":"true", "component.opendatahub.io/name":"odh-notebook-controller", "kustomize.component":"odh-notebook-controller", "opendatahub.io/component":"true"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
* Deployment.apps "modelmesh-controller" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"model-mesh", "app.kubernetes.io/part-of":"model-mesh", "app.opendatahub.io/model-mesh":"true", "control-plane":"modelmesh-controller"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
* Deployment.apps "data-science-pipelines-operator-controller-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/name":"data-science-pipelines-operator", "app.kubernetes.io/part-of":"data-science-pipelines-operator", "app.opendatahub.io/data-science-pipelines-operator":"true"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
* Deployment.apps "kserve-controller-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/part-of":"kserve", "app.opendatahub.io/kserve":"true", "control-plane":"kserve-controller-manager", "controller-tools.k8s.io":"1.0"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
* Deployment.apps "codeflare-operator-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/name":"codeflare-operator", "app.kubernetes.io/part-of":"codeflare", "app.opendatahub.io/codeflare":"true"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
* Deployment.apps "kuberay-operator" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/component":"kuberay-operator", "app.kubernetes.io/name":"kuberay", "app.kubernetes.io/part-of":"ray", "app.opendatahub.io/ray":"true"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
* Deployment.apps "trustyai-service-operator-controller-manager" is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/part-of":"trustyai", "app.opendatahub.io/trustyai":"true", "control-plane":"controller-manager"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable
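When reading a log like the one above, it can help to pull out just the names of the Deployments that were rejected. The following is a minimal sketch, not part of the operator: the helper name and the regular expression are assumptions made for illustration, and the sample input is an abbreviated copy of two log lines (the label maps are shortened). It expects the unescaped log format shown above, one entry per line.

```python
import re

# Hypothetical helper (not part of the operator): find Deployments whose
# update was rejected because spec.selector is immutable.
IMMUTABLE_SELECTOR = re.compile(
    r'Deployment\.apps "(?P<name>[^"]+)" is invalid: spec\.selector: .*?field is immutable'
)


def deployments_with_immutable_selector(log_text):
    """Return Deployment names in order of first appearance, without duplicates."""
    seen = {}
    for match in IMMUTABLE_SELECTOR.finditer(log_text):
        seen.setdefault(match.group("name"), None)
    return list(seen)


# Abbreviated sample: the MatchLabels maps are shortened for readability.
sample = (
    '* Deployment.apps "odh-dashboard" is invalid: spec.selector: Invalid value: '
    'v1.LabelSelector{MatchLabels:map[string]string{"app":"odh-dashboard"}, '
    'MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n'
    '* Deployment.apps "kuberay-operator" is invalid: spec.selector: Invalid value: '
    'v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/name":"kuberay"}, '
    'MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n'
)

print(deployments_with_immutable_selector(sample))
# prints ['odh-dashboard', 'kuberay-operator']
```

The escaped form of the same message inside the DSC status (with `\"` around names) would need a different pattern; this sketch only covers the raw operator log.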

DSC:

apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  creationTimestamp: "2023-12-13T10:37:20Z"
  finalizers:
  - datasciencecluster.opendatahub.io/finalizer
  generation: 2
  name: test-notebooks-upgrade
  resourceVersion: "27443427"
  uid: f41b849d-f1bf-4a59-b7ed-8668726a645b
spec:
  components:
    codeflare:
      devFlags: {}
      managementState: Managed
    dashboard:
      devFlags: {}
      managementState: Managed
    datasciencepipelines:
      devFlags: {}
      managementState: Managed
    kserve:
      devFlags: {}
      managementState: Removed
    modelmeshserving:
      devFlags: {}
      managementState: Removed
    ray:
      devFlags: {}
    trustyai:
      devFlags: {}
    workbenches:
      devFlags: {}
      managementState: Managed
status:
  conditions:
  - lastHeartbeatTime: "2023-12-13T10:40:49Z"
    lastTransitionTime: "2023-12-13T10:37:52Z"
    message: "DataScienceCluster resource reconciled with component errors: 4 errors
      occurred:\n\t* Deployment.apps \"odh-dashboard\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-dashboard\",
      \"app.kubernetes.io/part-of\":\"dashboard\", \"app.opendatahub.io/dashboard\":\"true\",
      \"deployment\":\"odh-dashboard\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"odh-notebook-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-notebook-controller\",
      \"app.kubernetes.io/part-of\":\"workbenches\", \"app.opendatahub.io/workbenches\":\"true\",
      \"component.opendatahub.io/name\":\"odh-notebook-controller\", \"kustomize.component\":\"odh-notebook-controller\",
      \"opendatahub.io/component\":\"true\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"data-science-pipelines-operator-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"data-science-pipelines-operator\",
      \"app.kubernetes.io/part-of\":\"data-science-pipelines-operator\", \"app.opendatahub.io/data-science-pipelines-operator\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\t*
      Deployment.apps \"codeflare-operator-manager\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"codeflare-operator\",
      \"app.kubernetes.io/part-of\":\"codeflare\", \"app.opendatahub.io/codeflare\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\n"
    reason: ReconcileCompletedWithComponentErrors
    status: "True"
    type: ReconcileComplete
  - lastHeartbeatTime: "2023-12-13T10:40:49Z"
    lastTransitionTime: "2023-12-13T10:37:52Z"
    message: "DataScienceCluster resource reconciled with component errors: 4 errors
      occurred:\n\t* Deployment.apps \"odh-dashboard\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-dashboard\",
      \"app.kubernetes.io/part-of\":\"dashboard\", \"app.opendatahub.io/dashboard\":\"true\",
      \"deployment\":\"odh-dashboard\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"odh-notebook-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-notebook-controller\",
      \"app.kubernetes.io/part-of\":\"workbenches\", \"app.opendatahub.io/workbenches\":\"true\",
      \"component.opendatahub.io/name\":\"odh-notebook-controller\", \"kustomize.component\":\"odh-notebook-controller\",
      \"opendatahub.io/component\":\"true\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"data-science-pipelines-operator-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"data-science-pipelines-operator\",
      \"app.kubernetes.io/part-of\":\"data-science-pipelines-operator\", \"app.opendatahub.io/data-science-pipelines-operator\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\t*
      Deployment.apps \"codeflare-operator-manager\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"codeflare-operator\",
      \"app.kubernetes.io/part-of\":\"codeflare\", \"app.opendatahub.io/codeflare\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\n"
    reason: ReconcileCompletedWithComponentErrors
    status: "True"
    type: Available
  - lastHeartbeatTime: "2023-12-13T10:40:49Z"
    lastTransitionTime: "2023-12-13T10:37:52Z"
    message: "DataScienceCluster resource reconciled with component errors: 4 errors
      occurred:\n\t* Deployment.apps \"odh-dashboard\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-dashboard\",
      \"app.kubernetes.io/part-of\":\"dashboard\", \"app.opendatahub.io/dashboard\":\"true\",
      \"deployment\":\"odh-dashboard\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"odh-notebook-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-notebook-controller\",
      \"app.kubernetes.io/part-of\":\"workbenches\", \"app.opendatahub.io/workbenches\":\"true\",
      \"component.opendatahub.io/name\":\"odh-notebook-controller\", \"kustomize.component\":\"odh-notebook-controller\",
      \"opendatahub.io/component\":\"true\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"data-science-pipelines-operator-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"data-science-pipelines-operator\",
      \"app.kubernetes.io/part-of\":\"data-science-pipelines-operator\", \"app.opendatahub.io/data-science-pipelines-operator\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\t*
      Deployment.apps \"codeflare-operator-manager\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"codeflare-operator\",
      \"app.kubernetes.io/part-of\":\"codeflare\", \"app.opendatahub.io/codeflare\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\n"
    reason: ReconcileCompletedWithComponentErrors
    status: "False"
    type: Progressing
  - lastHeartbeatTime: "2023-12-13T10:40:49Z"
    lastTransitionTime: "2023-12-13T10:37:34Z"
    message: "DataScienceCluster resource reconciled with component errors: 4 errors
      occurred:\n\t* Deployment.apps \"odh-dashboard\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-dashboard\",
      \"app.kubernetes.io/part-of\":\"dashboard\", \"app.opendatahub.io/dashboard\":\"true\",
      \"deployment\":\"odh-dashboard\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"odh-notebook-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-notebook-controller\",
      \"app.kubernetes.io/part-of\":\"workbenches\", \"app.opendatahub.io/workbenches\":\"true\",
      \"component.opendatahub.io/name\":\"odh-notebook-controller\", \"kustomize.component\":\"odh-notebook-controller\",
      \"opendatahub.io/component\":\"true\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"data-science-pipelines-operator-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"data-science-pipelines-operator\",
      \"app.kubernetes.io/part-of\":\"data-science-pipelines-operator\", \"app.opendatahub.io/data-science-pipelines-operator\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\t*
      Deployment.apps \"codeflare-operator-manager\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"codeflare-operator\",
      \"app.kubernetes.io/part-of\":\"codeflare\", \"app.opendatahub.io/codeflare\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\n"
    reason: ReconcileCompletedWithComponentErrors
    status: "False"
    type: Degraded
  - lastHeartbeatTime: "2023-12-13T10:40:49Z"
    lastTransitionTime: "2023-12-13T10:37:52Z"
    message: "DataScienceCluster resource reconciled with component errors: 4 errors
      occurred:\n\t* Deployment.apps \"odh-dashboard\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-dashboard\",
      \"app.kubernetes.io/part-of\":\"dashboard\", \"app.opendatahub.io/dashboard\":\"true\",
      \"deployment\":\"odh-dashboard\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"odh-notebook-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app\":\"odh-notebook-controller\",
      \"app.kubernetes.io/part-of\":\"workbenches\", \"app.opendatahub.io/workbenches\":\"true\",
      \"component.opendatahub.io/name\":\"odh-notebook-controller\", \"kustomize.component\":\"odh-notebook-controller\",
      \"opendatahub.io/component\":\"true\"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable\n\t* Deployment.apps \"data-science-pipelines-operator-controller-manager\"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"data-science-pipelines-operator\",
      \"app.kubernetes.io/part-of\":\"data-science-pipelines-operator\", \"app.opendatahub.io/data-science-pipelines-operator\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\t*
      Deployment.apps \"codeflare-operator-manager\" is invalid: spec.selector:
      Invalid value: v1.LabelSelector{MatchLabels:map[string]string{\"app.kubernetes.io/name\":\"codeflare-operator\",
      \"app.kubernetes.io/part-of\":\"codeflare\", \"app.opendatahub.io/codeflare\":\"true\"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable\n\n"
    reason: ReconcileCompletedWithComponentErrors
    status: "True"
    type: Upgradeable
  - lastHeartbeatTime: "2023-12-13T10:40:50Z"
    lastTransitionTime: "2023-12-13T10:40:50Z"
    message: 'Component reconciliation failed: Deployment.apps "odh-dashboard" is
      invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"odh-dashboard",
      "app.kubernetes.io/part-of":"dashboard", "app.opendatahub.io/dashboard":"true",
      "deployment":"odh-dashboard"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable'
    reason: ReconcileFailed
    status: "False"
    type: dashboardReady
  - lastHeartbeatTime: "2023-12-13T10:40:50Z"
    lastTransitionTime: "2023-12-13T10:40:50Z"
    message: 'Component reconciliation failed: Deployment.apps "odh-notebook-controller-manager"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app":"odh-notebook-controller",
      "app.kubernetes.io/part-of":"workbenches", "app.opendatahub.io/workbenches":"true",
      "component.opendatahub.io/name":"odh-notebook-controller", "kustomize.component":"odh-notebook-controller",
      "opendatahub.io/component":"true"}, MatchExpressions:[]v1.LabelSelectorRequirement(nil)}:
      field is immutable'
    reason: ReconcileFailed
    status: "False"
    type: workbenchesReady
  - lastHeartbeatTime: "2023-12-13T10:40:52Z"
    lastTransitionTime: "2023-12-13T10:40:52Z"
    message: 'Component reconciliation failed: Deployment.apps "data-science-pipelines-operator-controller-manager"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/name":"data-science-pipelines-operator",
      "app.kubernetes.io/part-of":"data-science-pipelines-operator", "app.opendatahub.io/data-science-pipelines-operator":"true"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable'
    reason: ReconcileFailed
    status: "False"
    type: data-science-pipelines-operatorReady
  - lastHeartbeatTime: "2023-12-13T10:40:45Z"
    lastTransitionTime: "2023-12-13T10:40:45Z"
    message: 'Component reconciliation failed: Deployment.apps "codeflare-operator-manager"
      is invalid: spec.selector: Invalid value: v1.LabelSelector{MatchLabels:map[string]string{"app.kubernetes.io/name":"codeflare-operator",
      "app.kubernetes.io/part-of":"codeflare", "app.opendatahub.io/codeflare":"true"},
      MatchExpressions:[]v1.LabelSelectorRequirement(nil)}: field is immutable'
    reason: ReconcileFailed
    status: "False"
    type: codeflareReady
  - lastHeartbeatTime: "2023-12-13T10:40:52Z"
    lastTransitionTime: "2023-12-13T10:40:52Z"
    message: Component is disabled
    reason: ReconcileInit
    status: Unknown
    type: kserveReady
  installedComponents:
    codeflare: true
    dashboard: true
    data-science-pipelines-operator: true
    kserve: false
    model-mesh: false
    ray: false
    trustyai: false
    workbenches: true
  phase: Ready

zdtsw commented 9 months ago

What does "latest" refer to here?

Frawless commented 9 months ago

@zdtsw The latest available image with the incubation tag.

kornys commented 9 months ago

from: quay.io/opendatahub/opendatahub-operator:v2.4.0
to: quay.io/opendatahub/opendatahub-operator@sha256:2d569be2885ca4b5f83df6ddc1cf39052fb979ae11591f6be94863a9f9f9c789

zdtsw commented 9 months ago

I can confirm this is an issue. I will need to look into it.

zdtsw commented 9 months ago

The problem lies in the operator logic: we did not handle the upstream upgrade from 2.4 to a later version.

zdtsw commented 9 months ago

To capture the underlying problem: the selector change introduced in ODH v2.3 was meant to make the upgrade from v2.3 to v2.4 smooth. We then changed the selector again post-v2.4 to match the logic from downstream v2.4, so the gap is now between ODH v2.4 and post-v2.4, and that logic cannot easily be reused in ODH. We will need another way to perform the upgrade automatically, or we ask users to set all components to Removed before upgrading to post-2.4.
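To make the selector problem concrete: in apps/v1, a Deployment's spec.selector cannot be changed after creation, so applying manifests with a different selector is rejected with "field is immutable". The sketch below shows one way a reconciler could decide what to do in that case. It is an illustration only and not the operator's actual fix: plain dicts stand in for Deployment objects, `plan_action` is a hypothetical name, and the pre-upgrade selector is an assumed value because the thread does not show it.

```python
# Illustrative sketch only: plain dicts stand in for Deployment selectors,
# and plan_action is a hypothetical name, not an operator or Kubernetes API.

def plan_action(existing, desired):
    """Decide how to reconcile one Deployment.

    existing: matchLabels currently on the cluster, or None if the Deployment is absent.
    desired:  matchLabels from the manifests being applied.
    """
    if existing is None:
        return "create"
    if existing == desired:
        return "update"
    # apps/v1 rejects any change to spec.selector, so an in-place update
    # would fail with "field is immutable"; the object has to be replaced.
    return "recreate"


# Assumed pre-upgrade selector, for illustration only.
old_selector = {"app": "odh-dashboard"}
# Selector taken from the odh-dashboard error in the operator log.
new_selector = {
    "app": "odh-dashboard",
    "app.kubernetes.io/part-of": "dashboard",
    "app.opendatahub.io/dashboard": "true",
    "deployment": "odh-dashboard",
}

print(plan_action(old_selector, new_selector))
# prints recreate
```

Setting a component to Removed before the upgrade has a similar effect by hand: the old Deployment is gone by the time the new manifests are applied, so no selector change is attempted.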

kornys commented 9 months ago

I'm sure customers will not change the state to Removed, so from my POV this has to be fixed in the operator logic.

kornys commented 9 months ago

Just to add one more thing: this upgrade worked two days ago, so one of yesterday's commits broke it.

zdtsw commented 9 months ago

> Just to add one more thing: this upgrade worked two days ago, so one of yesterday's commits broke it.

We backported the code from downstream to upstream yesterday morning.

kornys commented 9 months ago

yup I noticed that

kornys commented 9 months ago

@zdtsw This should also be migrated to Jira, right?

zdtsw commented 9 months ago

> @zdtsw This should also be migrated to Jira, right?

Yes, please. We should use Jira for new issue creation.

kornys commented 9 months ago

@zdtsw created: https://issues.redhat.com/browse/RHOAIENG-965

AjayJagan commented 9 months ago

Since this has been migrated to Jira, I am closing it here :)