ibm-mas / ansible-devops

Ansible collection supporting devops for IBM Maximo Application Suite
https://ibm-mas.github.io/ansible-devops/
Eclipse Public License 2.0

oneclick_update does not update cp4d and services #1018

Open aeisma opened 1 year ago

aeisma commented 1 year ago

When updating our environment from MAS 8.10.3 (Manage/IoT/Monitor/Predict) with CP4D 4.6.4 to MAS 8.10.4 using the oneclick_update playbook, CP4D is not updated. The playbook updates the IBM catalog, which results in updates to MAS and common services, but the CP4D zen, ccs, ws and wml components are not updated automatically.

I managed to update CP4D to 4.6.6, but had to use the cpd-cli to complete it successfully.

At first I tried to update CP4D to 4.6.6 by running the Ansible cp4d and cp4d_service roles for wsl and wml, which update the version numbers in the operand requests. The zen and ws updates reconciled successfully, but ccs and wml did not. For example, the wml-cr reconciliation fails with:

The conditional check '( etcd_statefulset.result.status.replicas == 3 )' failed. The error was: error while evaluating conditional (( etcd_statefulset.result.status.replicas == 3 )): 'dict object' has no attribute 'result'

The etcd statefulset that the operator is checking is running fine, and it is not clear why this check is failing.

To try to resolve this I ran cpd-cli manage apply-olm/apply-cr --release=${VERSION} --components=ccs,ws,wml ... --upgrade commands, and the reconciliations completed successfully. The cpd-cli seems to do a lot more than just change the version numbers in the operand requests.

andrercm commented 1 year ago

@aeisma the oneclick_update playbook only updates the catalog source. As you said, that triggers the update for ibm-common-services and other MAS dependencies, because most of them can be updated within the same subscription channel, so updating the catalog source is enough. For Cloud Pak for Data, however, the upgrade process is different and a bit more complex: the subscription channel must be changed and the new version must be set explicitly in each CPD resource instance. To simplify this, we abstracted the process into the cp4d and cp4d_service roles. You just set CPD_PRODUCT_VERSION to the version you want to upgrade to and rerun the cp4d playbook on top of your existing CPD deployment. That upgrades all the expected CPD-related subscription channels and sets the services to the desired version.

aeisma commented 1 year ago

Hi Andre,

Yes, the procedure you describe is what we followed, and in theory it should work. It does work for ZenService/lite-cr and WS/ws-cr, but not for CCS/ccs-cr and WmlBase/wml-cr. The ccs-cr and wml-cr get stuck in an error during the update from CP4D 4.6.4 to 4.6.6 and never reach Completed (see the conditional check failure for the wml-cr above). As I explained above, running cpd-cli manage apply-olm/apply-cr --release=${VERSION} --components=ccs,ws,wml clearly triggers the operators to run quite a few additional steps, after which ccs-cr and wml-cr reconcile successfully.

This confirms the statement in https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=eiu-updating-olm-objects-1 : It is strongly recommended that you use this approach to update the OLM objects to ensure that any required cleanup actions are performed. If you attempt to update the OLM objects for individual components, you might encounter errors when you upgrade the software.

andrercm commented 12 months ago

It seems you were having an issue with the CCS custom resource, which is a dependency for both WSL and WML. If you still have the environment with this problem, could you provide the output of each of the following commands?

oc get ccs ccs-cr -o yaml -n ibm-cpd
oc get ws ws-cr -o yaml -n ibm-cpd
oc get wmlbase wml-cr -o yaml -n ibm-cpd
oc get pods -n ibm-cpd

In any case, I'll try upgrading the CPD services from 4.6.4 to 4.6.6 using the automation and check whether any problems appear along the way.
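
The commands above dump the full YAML of each custom resource. When only the headline state is needed, a short summary can be easier to compare across environments. The following is a minimal sketch, not part of the ansible-devops collection: `cr_summary` is a hypothetical helper, the `wsStatus`, `wmlStatus` and `versions.reconciled` fields are the ones visible in the CR output later in this thread, and `ccsStatus` is assumed to follow the same naming pattern.

```shell
#!/usr/bin/env bash
# Print a one-line status summary for a CPD custom resource.
# Usage: cr_summary <kind> <name> <status-field> [namespace]
# cr_summary is a hypothetical helper name; the namespace defaults to ibm-cpd,
# the namespace used in this thread.
cr_summary() {
  local kind="$1" name="$2" field="$3" ns="${4:-ibm-cpd}"
  local status version

  # e.g. .status.wsStatus or .status.wmlStatus ("Completed" when healthy)
  status="$(oc get "$kind" "$name" -n "$ns" -o jsonpath="{.status.${field}}")"
  # Version the operator last reconciled successfully
  version="$(oc get "$kind" "$name" -n "$ns" -o jsonpath='{.status.versions.reconciled}')"

  printf '%s/%s status=%s reconciled=%s\n' \
    "$kind" "$name" "${status:-unknown}" "${version:-unknown}"
}
```

For example, `cr_summary ws ws-cr wsStatus`, `cr_summary wmlbase wml-cr wmlStatus` and, assuming the field name holds, `cr_summary ccs ccs-cr ccsStatus`.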

aeisma commented 12 months ago

After fixing the CCS problem we ran into the WML problem next. They did not appear to be related. We cannot afford any delays and we've upgraded all environments using the cpd-cli of CP4D 4.6.6, so I cannot share any of the CRs.

andrercm commented 12 months ago

> The zen and ws updates reconciled successfully, but ccs and wml did not.

Did you attempt to upgrade everything at once, or did you follow a particular sequence? I'll try to reproduce exactly what you did, because we have not been able to reproduce the issue you're seeing. Something seems out of order in your attempt: WS would not have upgraded successfully unless CCS had been upgraded first, because CCS is a dependency for both WS and WML. We recently upgraded the integration test environments using these roles. Since ccs is a dependency for ws, which in turn is a dependency for wml, we followed this order while upgrading:

  1. Upgrade CPD/Zen
  2. When step 1 has completed successfully, upgrade WS (this also upgrades CCS as part of the process)
  3. When step 2 has completed successfully, upgrade WML

aeisma commented 12 months ago

I ran the update in sequence as follows:

export COMMON_SERVICES_CHANNEL=v3.23
export CPD_PRODUCT_VERSION=4.6.6 # from 4.6.4
ROLE_NAME=cp4d ansible-playbook ibm.mas_devops.run_role # Upgrade of CPD/Zen completed successfully.
export CPD_SERVICE_NAME=wsl
ROLE_NAME=cp4d_service ansible-playbook ibm.mas_devops.run_role # Got stuck on CCS problem and timed out.
export CPD_SERVICE_NAME=wml
ROLE_NAME=cp4d_service ansible-playbook ibm.mas_devops.run_role # Got stuck on etcd problem.
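
The sequence above continues to the wml role even after the wsl role has timed out. A minimal sketch of the same sequence wrapped in a fail-fast function follows, so that a failed role stops the run before the next dependent service is attempted. The function name `upgrade_cpd_in_order` and the `PLAYBOOK_BIN` override are hypothetical and exist only for illustration and testing; the role names, variables and playbook are the ones used in this thread.

```shell
#!/usr/bin/env bash
# Run the CPD upgrade roles in dependency order (cp4d, then wsl, then wml),
# stopping at the first role that fails.
# PLAYBOOK_BIN is a hypothetical override; it defaults to ansible-playbook.
upgrade_cpd_in_order() {
  local version="$1"
  local runner="${PLAYBOOK_BIN:-ansible-playbook}"
  local svc

  export CPD_PRODUCT_VERSION="$version"

  # Step 1: CPD platform / Zen.
  if ! ROLE_NAME=cp4d "$runner" ibm.mas_devops.run_role; then
    echo "cp4d role failed; not upgrading any services" >&2
    return 1
  fi

  # Steps 2 and 3: wsl (which also upgrades CCS) must succeed before wml.
  for svc in wsl wml; do
    if ! ROLE_NAME=cp4d_service CPD_SERVICE_NAME="$svc" "$runner" ibm.mas_devops.run_role; then
      echo "cp4d_service role failed for $svc; stopping" >&2
      return 1
    fi
  done
}
```

Calling `upgrade_cpd_in_order 4.6.6` would then run the three roles in order and return non-zero as soon as one of them fails.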

andrercm commented 11 months ago

@aeisma have you exported the MAS_CATALOG_VERSION variable while running the CPD install/upgrade? I will try to reproduce your issue, but I need to know all the variables you used in order to mirror it.

lokesh-sreedhara commented 11 months ago

Was this role successful before running wsl and wml?

ROLE_NAME='cp4d' ansible-playbook playbooks/run_role.yml

aeisma commented 11 months ago

Yes, MAS_CATALOG_VERSION is exported. The cp4d role completed successfully, resulting in an updated Zen.

andrercm commented 11 months ago

@aeisma can you list all the variables you have exported, including the value of MAS_CATALOG_VERSION? Also, can you confirm your OpenShift version?

andrercm commented 11 months ago

@aeisma I just tested the upgrade from CPD 4.6.4 to CPD 4.6.6 in an IBM Cloud OpenShift cluster and it worked as expected. Here's what I did:

  1. Installed CPD 4.6.4 + WSL + WML, one by one:

    export CPD_PRODUCT_VERSION='4.6.4'
    ROLE_NAME='cp4d' ansible-playbook playbooks/run_role.yml
    ROLE_NAME='cp4d_service' CPD_SERVICE_NAME='wsl' ansible-playbook playbooks/run_role.yml
    ROLE_NAME='cp4d_service' CPD_SERVICE_NAME='wml' ansible-playbook playbooks/run_role.yml
  2. Updated MAS_CATALOG_VERSION to v8-230926-amd64. This is the static catalog source released in September, which introduces support for CPD 4.6.6 along with MAS 8.10.5 (MAS 8.10.4 is still supposed to use CPD 4.6.4; @sanju7216, please keep me honest on this). You can update your catalog using the oneclick_update playbook, or delete the existing ibm-operator-catalog catalog source and import the following YAML into your OpenShift cluster:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog
  namespace: openshift-marketplace
spec:
  description: Static Catalog Source for IBM Maximo Application Suite
  displayName: IBM Maximo Operators (v8-230926-amd64)
  image: >-
    icr.io/cpopen/ibm-maximo-operator-catalog@sha256:b3ad0d8d20eee9c7e48ba93b956a4f452e48ba0a648e76c39100c352f2cb6537
  priority: 90
  publisher: IBM
  sourceType: grpc

The update of the catalog source will also automatically update all operands used by CPD in ibm-common-services namespace, such as zen-operator and foundation services.

  3. Then, I ran the CPD upgrade exactly as you stated:
    export COMMON_SERVICES_CHANNEL=v3.23 # that's actually not needed, a default channel for common services will be set based on the installed ibm-operator-catalog catalog source in your cluster
    export CPD_PRODUCT_VERSION=4.6.6 # from 4.6.4
    ROLE_NAME=cp4d ansible-playbook ibm.mas_devops.run_role # Upgrade of CPD/Zen completed successfully.
    export CPD_SERVICE_NAME=wsl
    ROLE_NAME=cp4d_service ansible-playbook ibm.mas_devops.run_role # CCS/WSL upgraded successfully.
    export CPD_SERVICE_NAME=wml
    ROLE_NAME=cp4d_service ansible-playbook ibm.mas_devops.run_role # WML upgraded successfully.

Without further outputs or logs from the environment that had issues, it will not be possible to diagnose the root cause. However, I tend to believe the problem has something to do with an old or incompatible version of foundation services/common services for CPD 4.6.6. The cpd-cli command uses specific custom catalog sources (different from the ones supported by MAS), so I believe it forced your IBM common services and/or CPD subscriptions to be upgraded, and that is why it ended up fixing the problem.

Next time you install or upgrade CPD to 4.6.6, please make sure you have installed ibm-operator-catalog version v8-230926-amd64 or later before running the automation. If you still end up having issues, contact me on Slack (@andrercm) so we can investigate properly on the cluster.
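
One way to check the installed catalog before running the automation is to read the tag out of the catalog source's display name. This is a minimal sketch under two assumptions: that the display name embeds the tag in parentheses, as in the YAML above ("IBM Maximo Operators (v8-230926-amd64)"), and that the middle field of the tag is a YYMMDD date that can be compared numerically. The function names `catalog_tag` and `catalog_is_at_least` are hypothetical.

```shell
#!/usr/bin/env bash
# Print the catalog tag (e.g. v8-230926-amd64) parsed from the displayName of
# the ibm-operator-catalog catalog source. Prints nothing if no tag is found.
catalog_tag() {
  oc get catalogsource ibm-operator-catalog -n openshift-marketplace \
    -o jsonpath='{.spec.displayName}' |
    sed -n 's/.*(\(v[0-9]*-[0-9]*-[a-z0-9]*\)).*/\1/p'
}

# Succeed if the installed catalog's date field is >= the required tag's.
# Usage: catalog_is_at_least v8-230926-amd64
catalog_is_at_least() {
  local required="$1" current cur_date req_date

  current="$(catalog_tag)"
  if [ -z "$current" ]; then
    echo "could not determine the installed catalog tag" >&2
    return 2
  fi

  # Assumption: the tag format is <major>-<YYMMDD>-<arch>.
  cur_date="$(printf '%s' "$current" | cut -d- -f2)"
  req_date="$(printf '%s' "$required" | cut -d- -f2)"
  [ "$cur_date" -ge "$req_date" ]
}
```

For example, `catalog_is_at_least v8-230926-amd64 || echo "update the catalog first"` could be run ahead of the cp4d role.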

andrercm commented 11 months ago

Here is the output of the cp4d_service role while upgrading wsl and wml; both completed successfully:

cp4d-service-role-wml-output.log cp4d-service-role-wsl-output.log

wsl upgraded to 6.5.0 (corresponding to cpd 4.6.6 release):

andremarcelino@MacBook-Pro-de-Andre ~> oc get ws ws-cr -o yaml -n ibm-cpd

apiVersion: ws.cpd.ibm.com/v1beta1
kind: WS
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"ws.cpd.ibm.com/v1beta1","kind":"WS","metadata":{"name":"ws-cr","namespace":"ibm-cpd"},"spec":{"blockStorageClass":"ibmc-block-gold","ccs_operand_version":"6.5.0","datarefinery_operand_version":"6.5.0","ignoreForMaintenance":false,"license":{"accept":true,"license":"Standard"},"scaleConfig":"small","storageClass":"ibmc-file-gold-gid","version":"6.5.0","wsrt_operand_version":"6.5.0"}}'
  creationTimestamp: "2023-10-13T17:43:22Z"
  generation: 2
  name: ws-cr
  namespace: ibm-cpd
  resourceVersion: "600098"
  uid: 423082cb-cf2b-4459-aa3b-fbfa97157260
spec:
  blockStorageClass: ibmc-block-gold
  ccs_operand_version: 6.5.0
  datarefinery_operand_version: 6.5.0
  ignoreForMaintenance: false
  license:
    accept: true
    license: Standard
  scaleConfig: small
  storageClass: ibmc-file-gold-gid
  version: 6.5.0
  wsrt_operand_version: 6.5.0
status:
  conditions:
  - lastTransitionTime: "2023-10-13T18:35:23Z"
    message: ""
    reason: ""
    status: "False"
    type: Failure
  - ansibleResult:
      changed: 17
      completion: 2023-10-14T00:39:36.928659
      failures: 0
      ok: 119
      skipped: 58
    lastTransitionTime: "2023-10-13T18:35:23Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
  - lastTransitionTime: "2023-10-14T00:39:37Z"
    message: Last reconciliation succeeded
    reason: Successful
    status: "True"
    type: Successful
  type: Ready
  versions:
    reconciled: 6.5.0
  wsBuildNumber: 20
  wsStatus: Completed

wml upgraded to version 4.6.5 (corresponding to the CPD 4.6.6 release):

andremarcelino@MacBook-Pro-de-Andre ~> oc get wmlbase wml-cr -o yaml -n ibm-cpd

apiVersion: wml.cpd.ibm.com/v1beta1
kind: WmlBase
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"wml.cpd.ibm.com/v1beta1","kind":"WmlBase","metadata":{"name":"wml-cr","namespace":"ibm-cpd"},"spec":{"blockStorageClass":"ibmc-block-gold","ccs_operand_version":"6.5.0","ignoreForMaintenance":false,"license":{"accept":true,"license":"Standard"},"scaleConfig":"small","storageClass":"ibmc-file-gold-gid","version":"4.6.5"}}'
  creationTimestamp: "2023-10-13T20:26:31Z"
  finalizers:
  - wml.cpd.ibm.com/finalizer
  generation: 2
  name: wml-cr
  namespace: ibm-cpd
  resourceVersion: "652396"
  uid: 01f8cbe3-6564-4979-967c-065f33cedc96
spec:
  blockStorageClass: ibmc-block-gold
  ccs_operand_version: 6.5.0
  ignoreForMaintenance: false
  license:
    accept: true
    license: Standard
  scaleConfig: small
  storageClass: ibmc-file-gold-gid
  version: 4.6.5
status:
  buildNumber: 4.6.5-4816
  conditions:
  - ansibleResult:
      changed: 17
      completion: 2023-10-14T01:35:59.549616
      failures: 0
      ok: 205
      skipped: 48
    lastTransitionTime: "2023-10-14T01:10:10Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
  - lastTransitionTime: "2023-10-14T01:36:00Z"
    message: Last reconciliation succeeded
    reason: Successful
    status: "True"
    type: Successful
  - lastTransitionTime: "2023-10-14T01:10:10Z"
    message: ""
    reason: ""
    status: "False"
    type: Failure
  versions:
    reconciled: 4.6.5
  wmlStatus: Completed
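
Both outputs report the upgraded version under `status.versions.reconciled`. A minimal sketch of polling that field until it matches the expected version is shown below; `wait_for_reconciled` is a hypothetical helper, and the default of 60 attempts at 30-second intervals is an arbitrary assumption rather than a value taken from the roles.

```shell
#!/usr/bin/env bash
# Poll a CPD custom resource until .status.versions.reconciled equals the
# expected version, or give up after a number of attempts.
# Usage: wait_for_reconciled <kind> <name> <version> [namespace] [attempts] [delay-seconds]
wait_for_reconciled() {
  local kind="$1" name="$2" expected="$3" ns="${4:-ibm-cpd}"
  local attempts="${5:-60}" delay="${6:-30}"
  local current="" i=0

  while [ "$i" -lt "$attempts" ]; do
    current="$(oc get "$kind" "$name" -n "$ns" \
      -o jsonpath='{.status.versions.reconciled}' 2>/dev/null)"
    if [ "$current" = "$expected" ]; then
      echo "$kind/$name reconciled at $expected"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done

  echo "$kind/$name still at '${current}' after $attempts attempts (expected $expected)" >&2
  return 1
}
```

With the versions from this thread, that would be `wait_for_reconciled ws ws-cr 6.5.0` followed by `wait_for_reconciled wmlbase wml-cr 4.6.5`.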

andrercm commented 11 months ago

@aeisma any updates on this case?

aeisma commented 11 months ago

We followed the same steps you did, except that the MAS_CATALOG_VERSION available and used at the time of opening this case was v8-230829-amd64. The Ansible DevOps release available at that time already supported CPD_PRODUCT_VERSION=4.6.6. I am not sure if the problem had anything to do with the catalog version, but you've shown that it does work with v8-230926-amd64.