Open aeisma opened 1 year ago
@aeisma The oneclick_update playbook only updates the catalog source, which, as you said, triggers the update for ibm-common-services and other MAS dependencies. Most of these dependencies can be updated while staying on the same subscription channel, so for them the catalog source update is enough. For Cloud Pak for Data, however, the upgrade process is different and a bit more complex: it requires the subscription channel to be changed and the new versions to be set explicitly in each CPD resource instance. To make the CPD upgrade easier, we abstracted this complexity into the cp4d and cp4d_service roles, so you only need to set CPD_PRODUCT_VERSION to the version you want to upgrade to and rerun the cp4d playbook on top of your existing CPD deployment to trigger the upgrade. That takes care of upgrading all the expected CPD-related subscription channels and setting the services to the desired version.
Hi Andre,
Yes, the procedure you describe is what we followed, and in theory it should work. It does work for the ZenService/lite-cr and the WS/ws-cr, but it does not work for CCS/ccs-cr and WB/wml-cr. The ccs-cr and wml-cr get stuck in an error during the update from CP4D 4.6.4 to 4.6.6 and do not reach the Completed state (see the conditional check failure for the wml-cr above).
As I explained above, running cpd-cli manage apply-olm/apply-cr --release=${VERSION} --components=ccs,ws,wml clearly seems to trigger the operators to run quite a few additional steps, after which the ccs-cr and wml-cr reconcile successfully.
This confirms the statement in the documentation (https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=eiu-updating-olm-objects-1): "It is strongly recommended that you use this approach to update the OLM objects to ensure that any required cleanup actions are performed. If you attempt to update the OLM objects for individual components, you might encounter errors when you upgrade the software."
It seems you were having an issue with the CCS custom resource, which is a dependency for both WSL and WML.
If by any chance you still have the environment with this problem, would you be able to provide the output of each of the following commands?
oc get ccs ccs-cr -o yaml -n ibm-cpd
oc get ws ws-cr -o yaml -n ibm-cpd
oc get wmlbase wml-cr -o yaml -n ibm-cpd
oc get pods -n ibm-cpd
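The four commands above can be wrapped so that their output lands in files that are easy to attach to the issue. This is only a minimal sketch: the function names, the output directory and the file names are made up for illustration, and the namespace defaults to the ibm-cpd used in this thread. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical helper, not part of ibm.mas_devops.

# Run one oc command and write its output to a file ($1 = file, rest = oc args).
dump() {
  file="$1"; shift
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "oc $* > $file"
  else
    oc "$@" > "$file" 2>&1
  fi
}

# Collect the CR dumps and the pod list requested above.
collect_cpd_diagnostics() {
  ns="${1:-ibm-cpd}"            # namespace used in this thread
  out="${2:-cpd-diagnostics}"   # assumed output directory name
  [ "${DRY_RUN:-0}" = "1" ] || mkdir -p "$out"
  dump "$out/ccs-cr.yaml" get ccs ccs-cr -o yaml -n "$ns"
  dump "$out/ws-cr.yaml" get ws ws-cr -o yaml -n "$ns"
  dump "$out/wml-cr.yaml" get wmlbase wml-cr -o yaml -n "$ns"
  dump "$out/pods.txt" get pods -n "$ns"
}
```

With a logged-in oc session, calling collect_cpd_diagnostics with no arguments writes the four files into ./cpd-diagnostics.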
In any case, I'll try upgrading the CPD services from 4.6.4 to 4.6.6 using the automation and check whether any problems show up along the way.
After fixing the CCS problem we ran into the WML problem next. They did not appear to be related. We cannot afford any delays, and we've upgraded all environments using the cpd-cli of CP4D 4.6.6, so I cannot share any of the CRs.
The zen and ws updates reconciled successfully, but ccs and wml did not.
Did you attempt to upgrade everything at once, or did you follow a particular sequence? I'll try to reproduce exactly what you did, because we have not been able to reproduce the issue you're seeing. Something seems out of order in your attempt: WS would not have upgraded successfully if CCS had not been upgraded first, since CCS is a dependency for both WS and WML. We recently upgraded the integration test environments using these roles; in our case, since ccs is a dependency for ws, which is a dependency for wml, we followed this order while upgrading:
I ran the update in sequence as follows:
export COMMON_SERVICES_CHANNEL=v3.23
export CPD_PRODUCT_VERSION=4.6.6 # from 4.6.4
ROLE_NAME=cp4d ansible-playbook ibm.mas_devops.run_role # Upgrade of CPD/Zen completed successfully.
export CPD_SERVICE_NAME=wsl
ROLE_NAME=cp4d_service ansible-playbook ibm.mas_devops.run_role # Got stuck on CCS problem and timed out.
export CPD_SERVICE_NAME=wml
ROLE_NAME=cp4d_service ansible-playbook ibm.mas_devops.run_role # Got stuck on etcd problem.
@aeisma Have you exported the MAS_CATALOG_VERSION variable while running the CPD install/upgrade? I will try to reproduce your issue, but I need to understand all the variables you used in order to mirror it.
Was this role successful before running wsl and wml?
ROLE_NAME='cp4d' ansible-playbook playbooks/run_role.yml
Yes, MAS_CATALOG_VERSION is exported. The cp4d role completed successfully, resulting in an updated Zen.
@aeisma Can you list all the variables you have exported, including the value of MAS_CATALOG_VERSION? Also, can you confirm your OpenShift version?
@aeisma I just tested the upgrade from CPD 4.6.4 to CPD 4.6.6 in an IBM Cloud OpenShift cluster and it worked as expected. Here's what I did:
Installed CPD 4.6.4 + WSL + WML, one by one:
export CPD_PRODUCT_VERSION='4.6.4'
ROLE_NAME='cp4d' ansible-playbook playbooks/run_role.yml
ROLE_NAME='cp4d_service' CPD_SERVICE_NAME='wsl' ansible-playbook playbooks/run_role.yml
ROLE_NAME='cp4d_service' CPD_SERVICE_NAME='wml' ansible-playbook playbooks/run_role.yml
Updated the MAS_CATALOG_VERSION to v8-230926-amd64. This is the static catalog source released in September, which introduces support for CPD 4.6.6 along with MAS 8.10.5 (MAS 8.10.4 is supposed to still be using CPD 4.6.4; @sanju7216 keep me sane in my statement please). You can update your catalog using the oneclick_update playbook, or just delete the existing ibm-operator-catalog catalog source and import the following yaml in your OpenShift cluster:
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: ibm-operator-catalog
  namespace: openshift-marketplace
spec:
  description: Static Catalog Source for IBM Maximo Application Suite
  displayName: IBM Maximo Operators (v8-230926-amd64)
  image: >-
    icr.io/cpopen/ibm-maximo-operator-catalog@sha256:b3ad0d8d20eee9c7e48ba93b956a4f452e48ba0a648e76c39100c352f2cb6537
  priority: 90
  publisher: IBM
  sourceType: grpc
Updating the catalog source also automatically updates all operands used by CPD in the ibm-common-services namespace, such as zen-operator and foundation services.
export COMMON_SERVICES_CHANNEL=v3.23 # that's actually not needed, a default channel for common services will be set based on the installed ibm-operator-catalog catalog source in your cluster
export CPD_PRODUCT_VERSION=4.6.6 # from 4.6.4
ROLE_NAME=cp4d ansible-playbook ibm.mas_devops.run_role # Upgrade of CPD/Zen completed successfully.
export CPD_SERVICE_NAME=wsl
ROLE_NAME=cp4d_service ansible-playbook ibm.mas_devops.run_role # CCS/WSL upgraded successfully.
export CPD_SERVICE_NAME=wml
ROLE_NAME=cp4d_service ansible-playbook ibm.mas_devops.run_role # WML upgraded successfully.
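The same sequence can be wrapped so that it stops at the first role that fails, instead of moving on to wml after a wsl failure. This is a rough sketch, not part of the collection: run_role and upgrade_cpd are hypothetical names, and DRY_RUN=1 prints the commands rather than running ansible-playbook.

```shell
#!/bin/sh
# Hypothetical wrapper around the role invocations shown above.

# Run one role ($1 = role name, $2 = optional CPD service name).
run_role() {
  role="$1"; service="${2:-}"
  if [ -n "$service" ]; then
    CPD_SERVICE_NAME="$service"; export CPD_SERVICE_NAME
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "ROLE_NAME=$role ${service:+CPD_SERVICE_NAME=$service }ansible-playbook ibm.mas_devops.run_role"
  else
    ROLE_NAME="$role" ansible-playbook ibm.mas_devops.run_role
  fi
}

# Upgrade CPD to version $1: cp4d first, then wsl, then wml; stop on first failure.
upgrade_cpd() {
  CPD_PRODUCT_VERSION="$1"; export CPD_PRODUCT_VERSION
  run_role cp4d || return 1
  run_role cp4d_service wsl || return 1
  run_role cp4d_service wml || return 1
}
```

For example, upgrade_cpd 4.6.6 runs the three steps in the order used above; any other variables the roles need still have to be exported beforehand.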
Without further outputs or logs from the environment that had issues, it will not be possible to diagnose the root cause. However, I tend to believe that the problem you faced has to do with an old or incompatible version of foundation services/common services being used with CPD 4.6.6. The cpd-cli command uses its own custom catalog sources (different from the ones supported by MAS), so I believe running it forced your IBM common services and/or CPD subscriptions to be upgraded, which is why the problem ended up being fixed.
Next time you install or upgrade CPD to 4.6.6, please make sure you have installed ibm-operator-catalog version v8-230926-amd64 or later before running the automation. If you still run into issues after that, contact me on Slack (@andrercm) so we can check what's happening and properly investigate on the cluster.
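A quick pre-flight check for the "v8-230926-amd64 or later" requirement could look like the sketch below. It assumes catalog tags always follow the v8-YYMMDD-arch pattern seen in this thread and that the tag appears in parentheses in the catalog source displayName, as in the yaml above; the function names are invented for the example.

```shell
#!/bin/sh
# Sketch only: assumes catalog tags look like v8-YYMMDD-arch, as in this thread.

# Pull the tag out of a display name such as "IBM Maximo Operators (v8-230926-amd64)".
tag_from_display_name() {
  sed -n 's/.*(\(v[0-9]*-[0-9]*-[a-z0-9]*\)).*/\1/p'
}

# Extract the YYMMDD date stamp from a tag such as v8-230926-amd64.
catalog_date() {
  echo "$1" | cut -d- -f2
}

# Succeed when tag $1 is at least as new as tag $2.
catalog_at_least() {
  [ "$(catalog_date "$1")" -ge "$(catalog_date "$2")" ]
}

# Example (needs a logged-in oc session, so it is left commented out):
# name="$(oc get catalogsource ibm-operator-catalog -n openshift-marketplace -o jsonpath='{.spec.displayName}')"
# tag="$(echo "$name" | tag_from_display_name)"
# catalog_at_least "$tag" v8-230926-amd64 || echo "catalog $tag is older than v8-230926-amd64"
```

Comparing only the date stamp keeps the check simple, but it would need revisiting if the tag format ever changes.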
Here is the output of the cp4d_service role while upgrading wsl and wml; both completed successfully:
cp4d-service-role-wml-output.log cp4d-service-role-wsl-output.log
wsl upgraded to 6.5.0 (corresponding to cpd 4.6.6 release):
andremarcelino@MacBook-Pro-de-Andre ~> oc get ws ws-cr -o yaml -n ibm-cpd
apiVersion: ws.cpd.ibm.com/v1beta1
kind: WS
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"ws.cpd.ibm.com/v1beta1","kind":"WS","metadata":{"name":"ws-cr","namespace":"ibm-cpd"},"spec":{"blockStorageClass":"ibmc-block-gold","ccs_operand_version":"6.5.0","datarefinery_operand_version":"6.5.0","ignoreForMaintenance":false,"license":{"accept":true,"license":"Standard"},"scaleConfig":"small","storageClass":"ibmc-file-gold-gid","version":"6.5.0","wsrt_operand_version":"6.5.0"}}'
  creationTimestamp: "2023-10-13T17:43:22Z"
  generation: 2
  name: ws-cr
  namespace: ibm-cpd
  resourceVersion: "600098"
  uid: 423082cb-cf2b-4459-aa3b-fbfa97157260
spec:
  blockStorageClass: ibmc-block-gold
  ccs_operand_version: 6.5.0
  datarefinery_operand_version: 6.5.0
  ignoreForMaintenance: false
  license:
    accept: true
    license: Standard
  scaleConfig: small
  storageClass: ibmc-file-gold-gid
  version: 6.5.0
  wsrt_operand_version: 6.5.0
status:
  conditions:
  - lastTransitionTime: "2023-10-13T18:35:23Z"
    message: ""
    reason: ""
    status: "False"
    type: Failure
  - ansibleResult:
      changed: 17
      completion: 2023-10-14T00:39:36.928659
      failures: 0
      ok: 119
      skipped: 58
    lastTransitionTime: "2023-10-13T18:35:23Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
  - lastTransitionTime: "2023-10-14T00:39:37Z"
    message: Last reconciliation succeeded
    reason: Successful
    status: "True"
    type: Successful
type: Ready
  versions:
    reconciled: 6.5.0
  wsBuildNumber: 20
  wsStatus: Completed
wml upgraded to 4.6.5 version (corresponding to cpd 4.6.6 release)
andremarcelino@MacBook-Pro-de-Andre ~> oc get wmlbase wml-cr -o yaml -n ibm-cpd
apiVersion: wml.cpd.ibm.com/v1beta1
kind: WmlBase
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"wml.cpd.ibm.com/v1beta1","kind":"WmlBase","metadata":{"name":"wml-cr","namespace":"ibm-cpd"},"spec":{"blockStorageClass":"ibmc-block-gold","ccs_operand_version":"6.5.0","ignoreForMaintenance":false,"license":{"accept":true,"license":"Standard"},"scaleConfig":"small","storageClass":"ibmc-file-gold-gid","version":"4.6.5"}}'
  creationTimestamp: "2023-10-13T20:26:31Z"
  finalizers:
  - wml.cpd.ibm.com/finalizer
  generation: 2
  name: wml-cr
  namespace: ibm-cpd
  resourceVersion: "652396"
  uid: 01f8cbe3-6564-4979-967c-065f33cedc96
spec:
  blockStorageClass: ibmc-block-gold
  ccs_operand_version: 6.5.0
  ignoreForMaintenance: false
  license:
    accept: true
    license: Standard
  scaleConfig: small
  storageClass: ibmc-file-gold-gid
  version: 4.6.5
status:
  buildNumber: 4.6.5-4816
  conditions:
  - ansibleResult:
      changed: 17
      completion: 2023-10-14T01:35:59.549616
      failures: 0
      ok: 205
      skipped: 48
    lastTransitionTime: "2023-10-14T01:10:10Z"
    message: Awaiting next reconciliation
    reason: Successful
    status: "True"
    type: Running
  - lastTransitionTime: "2023-10-14T01:36:00Z"
    message: Last reconciliation succeeded
    reason: Successful
    status: "True"
    type: Successful
  - lastTransitionTime: "2023-10-14T01:10:10Z"
    message: ""
    reason: ""
    status: "False"
    type: Failure
  versions:
    reconciled: 4.6.5
  wmlStatus: Completed
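Rather than reading the full dumps, the two fields that matter here (the wsStatus/wmlStatus value and versions.reconciled) can be pulled out of the oc output. A small sketch, assuming the status layout shown in the two CRs above; the function names are made up for the example.

```shell
#!/bin/sh
# Sketch only: relies on the status fields visible in the CR dumps above.

# Read CR YAML on stdin; succeed when a "<name>Status: Completed" field is present
# (wsStatus and wmlStatus in the dumps above).
cr_completed() {
  grep -Eq '^[[:space:]]*[A-Za-z]+Status:[[:space:]]*Completed[[:space:]]*$'
}

# Read CR YAML on stdin; print the value of the first "reconciled:" field.
cr_reconciled_version() {
  sed -n 's/^[[:space:]]*reconciled:[[:space:]]*//p' | head -n 1
}

# Example (needs a logged-in oc session, so it is left commented out):
# oc get ws ws-cr -o yaml -n ibm-cpd | cr_completed && echo "ws-cr completed"
# oc get wmlbase wml-cr -o yaml -n ibm-cpd | cr_reconciled_version
```

With jsonpath the same values should be reachable directly, for example oc get ws ws-cr -n ibm-cpd -o jsonpath='{.status.wsStatus}', since wsStatus sits under status in the output above.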
@aeisma any updates on this case?
We followed the same steps you did, except that the MAS_CATALOG_VERSION available and used at the time of opening this case was v8-230829-amd64. The Ansible DevOps release available at that time already supported CPD_PRODUCT_VERSION=4.6.6. I am not sure if the problem had anything to do with the catalog version, but you've shown that it does work with v8-230926-amd64.
When updating our environment from MAS 8.10.3 Manage/IoT/Monitor/Predict and CP4D 4.6.4 to MAS 8.10.4 using oneclick_update, CP4D is not updated. oneclick_update updates the IBM catalog, resulting in updates to MAS and common services, but CP4D zen, ccs, ws and wml do not get updated automatically.
I managed to update CP4D to 4.6.6, but had to use the cpd-cli to complete it successfully.
At first I tried to update CP4D to 4.6.6 by running the ansible cp4d and cp4d_service roles for wsl and wml, which update the version numbers in the operand requests. The zen and ws updates reconciled successfully, but ccs and wml did not. For example, the wml-cr reconciliation fails with: The conditional check ‘( etcd_statefulset.result.status.replicas == 3 )’ failed. The error was: error while evaluating conditional (( etcd_statefulset.result.status.replicas == 3 )): ‘dict object’ has no attribute ‘result’. The etcd statefulset the operator is checking is running fine, and it is not clear why this check is failing.
To try to resolve this I ran the cpd-cli manage apply-olm/apply-cr --release=${VERSION} --components=ccs,ws,wml ... --upgrade commands, and the reconciliations completed successfully. The cpd-cli seems to do a lot more than just change the version numbers in the operand requests.