noobaa / noobaa-operator

Operator for NooBaa - object data service for hybrid and multi cloud environments :cloud: :wrench:
https://www.noobaa.io
Apache License 2.0

Manual steps to get noobaa updated from 5.13.2 to 5.15.2 #1388

Closed rijesh-purayil closed 2 months ago

rijesh-purayil commented 2 months ago

I want to know the manual steps for upgrading NooBaa running on OpenShift.

Current version: 5.13.2 (noobaa-operator:5.13.2, noobaa-core:5.13.2, postgresql-12-centos7)

Desired version: 5.15.2 (noobaa-operator:5.15.2, noobaa-core:5.15.2, postgresql-12-rhel8)

Observation: Once we update the CRDs and then update noobaa-operator to 5.15.2, the existing NooBaa goes into the Rejected state. We expect NooBaa to stay in a good state until the upgrade is complete. The document https://github.com/noobaa/noobaa-operator/blob/master/doc/noobaa-crd.md says: "In any case when using custom images, you will have to make sure the operator and core images are compatible with each other."

Is that because noobaa-operator:5.15.2 (new) is incompatible with noobaa-core:5.13.2 (existing)?

While NooBaa was in the Rejected state, we updated the NooBaa CR with a specific dbImage and noobaa-core image, but it still could not recover from the Rejected state. We observe that as soon as noobaa-operator is updated to 5.15.2, it detects a PostgreSQL upgrade and complains: Missing critical env variable for pg upgrade - NOOBAA_PSQL_12_IMAGE.

How to resolve this issue?

NooBaa CR that is currently running:

apiVersion: noobaa.io/v1alpha1
kind: NooBaa
metadata:
  generation: 2
  labels:
    app: noobaa
    app.kubernetes.io/instance: staging
    app.kubernetes.io/managed-by: ansible
    app.kubernetes.io/name: noobaa
    egress-label-noobaa: middleware-noobaa
  name: noobaa
  namespace: staging
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - preference:
          matchExpressions:
          - key: icp4data
            operator: NotIn
            values:
            - database-db2wh
        weight: 100
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/arch
            operator: In
            values:
            - amd64
  cleanupPolicy: {}
  coreResources:
    limits:
      cpu: 500m
      ephemeral-storage: 1Gi
      memory: 1Gi
    requests:
      cpu: 500m
      ephemeral-storage: 500Mi
      memory: 1Gi
  dbImage: xxxxxxxxxxxxxxxxxx/postgresql-12-centos7:69623db6c74ac2437a2f11c0733e38c4b8dbb6b1
  dbResources:
    limits:
      cpu: "1"
      ephemeral-storage: 1Gi
      memory: 2Gi
    requests:
      cpu: "1"
      ephemeral-storage: 1Gi
      memory: 2Gi
  dbStorageClass: rook-ceph-block
  dbType: postgres
  dbVolumeResources:
    requests:
      storage: 40Gi
  disableLoadBalancerService: true
  endpoints:
    maxCount: 4
    minCount: 2
    resources:
      limits:
        cpu: "1"
        ephemeral-storage: 1Gi
        memory: 2Gi
      requests:
        cpu: 500m
        ephemeral-storage: 100Mi
        memory: 500Mi
  image: xxxxxxxxxxxxxxxxxx/noobaa-core:5.13.2
  labels:
    core:
      app.kubernetes.io/instance: staging
      app.kubernetes.io/managed-by: ansible
      app.kubernetes.io/name: noobaa-cr
      egress-label-noobaa: middleware-noobaa
    db:
      app.kubernetes.io/instance: staging
      app.kubernetes.io/managed-by: ansible
      app.kubernetes.io/name: noobaa-cr
      egress-label-noobaa: middleware-noobaa
  loadBalancerSourceSubnets: {}
  manualDefaultBackingStore: true
  pvPoolDefaultStorageClass: rook-ceph-block
  security:
    kms: {}

Error

[root@api.giaas.cp.fyre.ibm.com ~]# oc get noobaa
NAME     S3-ENDPOINTS                                         STS-ENDPOINTS                                        SYSLOG-ENDPOINTS   IMAGE                                                                                               PHASE      AGE
noobaa   ["https://10.13.27.76:0/","https://10.13.27.227:0/"]   ["https://10.13.27.76:0/","https://10.13.27.227:0/"]                      xxxxxxxxxxxxxxxxxx/noobaa-core:5.15.2   Rejected   20h
[root@api.giaas.cp.fyre.ibm.com ~]# oc describe noobaa noobaa

...

..
  Upgrade Phase:  NoUpgrade
Events:
  Type     Reason              Age                From             Message
  ----     ------              ----               ----             -------
  Warning  MissingEnvVariable  13m (x3 over 31m)  noobaa-operator  Missing critical env variable for pg upgrade - NOOBAA_PSQL_12_IMAGE
[root@api.giaas.cp.fyre.ibm.com ~]# oc edit backingstore
Edit cancelled, no changes made.
[root@api.giaas.cp.fyre.ibm.com ~]#

Update steps followed.

  1. Applied the updated CRDs on the existing system where NooBaa 5.13.2 is running. After that, the NooBaa status was still Ready.
  2. Updated the noobaa-operator image to 5.15.2; the operator pod was replaced.
  3. Observed that NooBaa went into the Rejected state; describing it shows: Missing critical env variable for pg upgrade - NOOBAA_PSQL_12_IMAGE

liranmauda commented 2 months ago

Hi @rijesh-purayil, NooBaa does not support upgrading from n to n+2. Upgrading from 5.13.z to 5.15.z might (will) not work. You first need to upgrade from 5.13.z to 5.14.z, and then to 5.15.z. If you then encounter an issue, let us know.
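
As a rough illustration of the one-minor-at-a-time rule described above, the following shell sketch lists the intermediate minor releases between two versions. The helper name `upgrade_path` is hypothetical, and the sketch assumes both versions share the same major number and use a plain `major.minor.patch` layout.

```shell
#!/bin/sh
# Hypothetical helper: print every minor release that has to be visited,
# in order, when moving from version $1 to version $2.
# Assumes the same major version on both sides (e.g. 5.13.2 -> 5.15.2).
upgrade_path() {
  major=$(echo "$1" | cut -d. -f1)
  from_minor=$(echo "$1" | cut -d. -f2)
  to_minor=$(echo "$2" | cut -d. -f2)
  m=$((from_minor + 1))
  while [ "$m" -le "$to_minor" ]; do
    echo "${major}.${m}.z"
    m=$((m + 1))
  done
}

# Prints 5.14.z and then 5.15.z, one per line.
upgrade_path 5.13.2 5.15.2
```

Each printed line stands for one separate upgrade step that should reach a healthy state before the next one starts.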

rijesh-purayil commented 2 months ago

@liranmauda - We want to upgrade NooBaa from 5.13.2 (operator and core) while keeping the existing Postgres 12 version, but replacing postgresql-12-centos7 with postgresql-12-rhel8. We found a way to get this working. The issues we identified are:

  1. time="2024-07-08T13:50:20Z" level=error msg="UpgradePostgresDB: Missing critical env variable for upgrade - NOOBAA_PSQL_12_IMAGE" sys=staging/noobaa
     - We need to set NOOBAA_PSQL_12_IMAGE for the noobaa operator.
  2. horizontalpodautoscalers.autoscaling \"noobaa-endpoint\" is forbidden: User \"system:serviceaccount:staging:noobaa\" cannot delete resource \"horizontalpodautoscalers\" in API group \"autoscaling\" in the namespace \"staging\"\n"
     - We need to grant additional permissions or delete the HPA noobaa-endpoint.
  3. time="2024-07-05T10:16:41Z" level=info msg="UpgradePostgresDB: found ENV of pgsql version 12: xxxxxxxxxxxxxxx/postgresql-12-rhel8@sha256:457d3deab8a0e854c8d23d7a724a81c9f480b75197b6f515b9972290a3af13a9" sys=staging/noobaa
     time="2024-07-05T10:16:41Z" level=info msg="UpgradePostgresDB: NooBaa CR DB image: xxxxxxxxxxxxxxxxxx/postgresql-12-centos7:69623db6c74ac2437a2f11c0733e38c4b8dbb6b1 and operator DB image: are not the same, waiting..." sys=staging/noobaa

For 1, we added the NOOBAA_PSQL_12_IMAGE env variable by patching the noobaa operator. For 2, we added the additional permission. For 3, we patched the existing NooBaa to have postgresUpdatePhase: NoUpgrade.
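
The three workarounds above could look roughly like the following `oc` commands. This is a sketch under several assumptions, not the exact commands that were run: the deployment name `noobaa-operator`, the role name `noobaa-hpa-delete`, and the image reference are placeholders, and it assumes that `postgresUpdatePhase` lives under the CR's `status` and that the client is recent enough to support `--subresource`. By default the script only prints the commands; set `DRY_RUN=0` to execute them.

```shell
#!/bin/sh
# Sketch only: names below come from the logs in this thread or are placeholders.
NS=staging
# Placeholder image reference (assumption); use your own registry and tag/digest.
PG12_IMAGE="${PG12_IMAGE:-registry.example.com/postgresql-12-rhel8:TAG}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only, 0 = run them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Give the operator the Postgres 12 image it looks for during the upgrade.
#    (Deployment name noobaa-operator is an assumption.)
run oc set env deployment/noobaa-operator -n "$NS" "NOOBAA_PSQL_12_IMAGE=$PG12_IMAGE"

# 2. Allow the noobaa service account to delete HPAs in this namespace.
#    (Alternative: delete the noobaa-endpoint HPA by hand.)
run oc create role noobaa-hpa-delete -n "$NS" \
  --verb=delete --resource=horizontalpodautoscalers.autoscaling
run oc create rolebinding noobaa-hpa-delete -n "$NS" \
  --role=noobaa-hpa-delete --serviceaccount="$NS:noobaa"

# 3. Set postgresUpdatePhase to NoUpgrade on the NooBaa CR.
#    (Assumes the field is part of .status and the CRD exposes a status subresource.)
run oc patch noobaa noobaa -n "$NS" --type=merge --subresource=status \
  -p '{"status":{"postgresUpdatePhase":"NoUpgrade"}}'
```

Running it once in dry-run mode first makes it easy to review the exact commands against your own namespace and image names before anything is changed.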

liranmauda commented 2 months ago

@rijesh-purayil Why do you want to keep Postgres 12 and not move to Postgres 15? @dannyzaken @jackyalbo is Postgres 12 supported on 5.15?

rijesh-purayil commented 2 months ago

@liranmauda - We already use another PostgreSQL 12 deployment (Stolon) for a few microservices, and NooBaa has its own separate PostgreSQL 12 instance. Next, we want NooBaa to use the PostgreSQL v12 (Stolon) deployment as an external database. The Postgres version upgrade can then happen later, together with the services, once we plan it.

dannyzaken commented 2 months ago

For now we don't use any PostgreSQL 15-only features, so Postgres 12 is still supported. There is no guarantee that this will remain the case in the future, and we don't run any tests to verify compatibility with Postgres 12.

rijesh-purayil commented 2 months ago

@dannyzaken - Thanks.