Closed lokeshs09 closed 2 years ago
Comments from Rahul
I think this cluster has resource issues
ignoreForMaintenance: true
zenCoreMetaDbStorageClass: ibmc-block-gold
Link to Slack conversation: https://ibm-watson-iot.slack.com/archives/C0195MVCEUD/p1648836704013219
I fixed the zenmetastoredb storage class to be ibmc-block-gold
but what actually seems to have resolved the problem was boosting the default zen-metastoredb statefulset to use more memory/CPU:
https://github.com/ibm-mas/ansible-devops/blob/master/ibm/mas_devops/roles/cp4d_install/tasks/install/cpd40.yml#L123
I still have open questions regarding the need to set the CPD installs to manual instead of automatic upgrades. This will likely cause trouble in more places in the ansible collection, because if we set CPD to manual upgrades, all subscriptions under ibm-common-services will be forced to be manually managed as well.
Auto-upgrade does not affect the cp4d product version (4.x); only the operator versions are affected by OLM subscriptions, so we can close this based on the work @andrercm has already performed.
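For reference, the manual-versus-automatic question above comes down to the `installPlanApproval` field on an OLM Subscription. The fragment below is only a sketch: the subscription name, package name, and channel are assumptions for illustration, not values taken from the playbooks. Because OLM can bundle several operators from the same namespace into one install plan, a single `Manual` subscription in ibm-common-services can hold up the others, which is the concern raised above.

```yaml
# Illustrative only: metadata.name, spec.name, and spec.channel are assumptions.
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cpd-operator                  # hypothetical subscription name
  namespace: ibm-common-services
spec:
  name: cpd-platform-operator         # assumed package name
  channel: "<channel>"                # placeholder; not specified in this issue
  source: ibm-operator-catalog
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic      # "Manual" would require approving each install plan
```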
Opening an issue here to make the necessary fixes or enhancements to the cp4d playbooks in ansible-devops. The recent breakdowns and blockers on IVT10 and IVT11 were caused by 2 main factors.
Details of the issue are documented here: https://github.ibm.com/PrivateCloud-analytics/CPD-Quality/issues/2326
Copying Sriram's recommendations here:
Summary of the issues:
1) Performance issue: suspected storage IOPS (with ibmc-file-gold-gid) or perhaps even connection leaks. While one cluster performs better after scale out/up, the second cluster still has problems (@rahul-shinge & @kvstumph reviewing the second cluster)
2) Issue with "ibm-operator-catalog" latest in use: this can cause uncontrolled automatic upgrades and will be hard for CPD to support if there is an arbitrary mix-and-match of versions (including the Cloud Pak Foundational Services version)
(future) Action Items:
1) Right storage selection (especially on IBM Cloud) to improve reliability
When provisioning the Ibmcpd CR, add zenCoreMetadbStorageClass to point to a block storage class
2) Validating performance of the available storage classes
CPD now also has tools (published to open source) to measure/benchmark target storage. See: https://github.com/IBM/k8s-storage-perf
3) Freeze the CP4D version for stability, so it does not get randomly upgraded whenever a refresh happens; it is also important to be on a “validated” version combination
using fixed catalog sources (image digests) instead of ibm-operator-catalog https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=ccs-creating-catalog-sources-that-pull-specific-versions-images-from-entitled-registry
CPD is introducing additional automation to reduce complexity of installs and upgrades while pinning versions: https://github.ibm.com/PrivateCloud/olm-utils
4) Use LDAP/AD even for testing environments (or Cloud Pak IAM) to mimic “enterprise” security — the out-of-the-box placeholder is not secure enough or recommended for use. Once Authentication is configured, most customers turn off even the "admin" user: https://www.ibm.com/docs/en/cloud-paks/cp-data/4.0?topic=users-disabling-default-admin-user (using the out-of-the-box usermgmt is ok only for dev/test purposes)
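As a rough illustration of action items 1 and 3, the two fragments below sketch what the storage-class setting and a digest-pinned catalog source might look like. Apart from `zenCoreMetadbStorageClass`, `ibmc-block-gold`, and `ibmc-file-gold-gid`, which come from this issue, every name, namespace, and value here is an assumption or a placeholder, and the catalog image reference is deliberately left unfilled. The linked IBM documentation remains the authority for the real values.

```yaml
# Action item 1 (sketch): Ibmcpd CR with a block storage class for the Zen metastore.
# CR name, namespace, and license values are illustrative assumptions.
apiVersion: cpd.ibm.com/v1
kind: Ibmcpd
metadata:
  name: ibmcpd-cr                              # hypothetical CR name
  namespace: cpd-instance                      # hypothetical namespace
spec:
  license:
    accept: true
    license: Enterprise                        # assumption; use the license you hold
  storageClass: ibmc-file-gold-gid             # file storage class named in the summary
  zenCoreMetadbStorageClass: ibmc-block-gold   # block storage for zen-metastoredb
---
# Action item 3 (sketch): catalog source pinned to an image digest
# instead of following the latest ibm-operator-catalog.
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: cpd-platform                           # hypothetical catalog source name
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  displayName: "Cloud Pak for Data (pinned)"   # hypothetical display name
  publisher: IBM
  image: "<registry>/<catalog-image>@sha256:<digest>"   # placeholder; take the digest from the IBM docs
```

Pinning by digest means a catalog refresh cannot silently move the operators to a newer build; an upgrade then becomes an explicit change to the `image` field.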