ibm-mas / ansible-devops

Ansible collection supporting devops for IBM Maximo Application Suite
https://ibm-mas.github.io/ansible-devops/
Eclipse Public License 2.0

MAS Manage Install not completing on Fusion HCI #877

Open viveksharan opened 1 year ago

viveksharan commented 1 year ago

The MAS install using the Ansible scripts is not completing. Refer to the attached document for details.

Fusion HCI issue.docx

#!/usr/bin/bash
#----------------------------------------------
#▁ ▂ ▄ ▅ ▆ ▇ █ Fill in the blanks █ ▇ ▆ ▅ ▄ ▂ ▁
#----------------------------------------------
export IBM_ENTITLEMENT_KEY=<hidden>
export UDS_CONTACT_EMAIL=vivek.sharan@ibm.com
export UDS_CONTACT_FIRSTNAME=Vivek
export UDS_CONTACT_LASTNAME=Sharan
export SLS_LICENSE_ID=<hidden>
export MAS_APP_SETTINGS_JMS_QUEUE_PVC_STORAGE_CLASS=ibm-spectrum-scale-sc
export MAS_APP_SETTINGS_BIM_PVC_STORAGE_CLASS=ibm-spectrum-scale-sc
export MAS_APP_SETTINGS_DOCLINKS_PVC_STORAGE_CLASS=ibm-spectrum-scale-sc
#-------------------------------------------
#▁ ▂ ▄ ▅ ▆ ▇ █ Defaults / Core █ ▇ ▆ ▅ ▄ ▂ ▁
#-------------------------------------------
export MAS_CONFIG_DIR=/scripts
export SLS_MONGODB_CFG_FILE=/scripts
export MAS_INSTANCE_ID=maslab
export MAS_WORKSPACE_ID=masdemo
export SLS_DOMAIN=svc.cluster.local
export SLS_LICENSE_FILE=/scripts/license.dat
export MAS_ANNOTATIONS=mas.ibm.com/operationalMode=nonproduction
# export MONGODB_REPLICAS=1
#---------------------------------------------
#▁ ▂ ▄ ▅ ▆ ▇ █ Defaults / Manage █ ▇ ▆ ▅ ▄ ▂ ▁
#---------------------------------------------
export MAS_APPWS_COMPONENTS="base=latest,health=latest"
export MAS_APP_SETTINGS_DEMODATA=false
export MAS_APP_SETTINGS_PERSISTENT_VOLUMES_FLAG=true
export MAS_APP_SETTINGS_SERVER_BUNDLES_SIZE=jms

# export MAS_APP_SETTINGS_SERVER_BUNDLES_SIZE=dev
# export DB2_MEMORY_REQUESTS=28Gi
# export DB2_MEMORY_LIMITS=26Gi
# export MAS_APP_SETTINGS_DOCLINKS_PVC_STORAGE_CLASS=ocs-storagecluster-cephfs
# export MAS_APP_SETTINGS_BIM_PVC_STORAGE_CLASS=ocs-storagecluster-cephfs
# export MAS_APP_SETTINGS_SECONDARY_LANGS='FR,IT,DE,ZH-TW'
#----------------------------------------------
#▁ ▂ ▄ ▅ ▆ ▇ █ Defaults / Monitor █ ▇ ▆ ▅ ▄ ▂ ▁
#----------------------------------------------
export MAS_APP_SETTINGS_MONITOR_DEPLOYMENT_SIZE=dev

viveksharan commented 1 year ago

Fusion HCI issue.docx

andrercm commented 1 year ago

@viveksharan hello, two places you can check:

ManageWorkspace custom resource:

Go to Administration > Custom Resource Definition > ManageWorkspace > Instances and find your Manage instance. Then check its status to see what it is complaining about. If possible, provide its YAML file so we can debug.

Another place to look is the Manage pods, to see whether they are all running. You can screenshot the list of pods in the Manage namespace/project, sorted by most recently created.
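The same two checks can also be run from the `oc` CLI. This is a minimal sketch, assuming the Manage namespace follows the pattern `mas-<instance>-manage` and that the custom resource can be addressed as `manageworkspace`; neither is confirmed in this thread, so verify both with `oc get crd | grep -i manageworkspace` and `oc get projects`. The helper names `manage_namespace` and `check_manage` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers. The namespace pattern below is an assumption,
# not something stated in this thread.

# Derive the Manage namespace from the MAS instance ID
# (assumed pattern: mas-<instance>-manage).
manage_namespace() {
  printf 'mas-%s-manage\n' "$1"
}

# Dump the ManageWorkspace custom resource and list the Manage pods.
check_manage() {
  local ns
  ns="$(manage_namespace "$1")"
  # Full YAML of the custom resource, including its status section.
  oc get manageworkspace -n "$ns" -o yaml
  # Pods sorted by creation time, ascending: the newest pods are at the bottom.
  oc get pods -n "$ns" --sort-by=.metadata.creationTimestamp
}

# Only talk to the cluster when oc is installed and MAS_INSTANCE_ID is set.
if command -v oc >/dev/null 2>&1 && [ -n "${MAS_INSTANCE_ID:-}" ]; then
  check_manage "$MAS_INSTANCE_ID"
fi
```

With `MAS_INSTANCE_ID=maslab` exported as in the script posted earlier, this would query the `mas-maslab-manage` namespace. Redirecting the first command to a file gives a YAML that can be attached to the issue.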

andrercm commented 1 year ago

Closing since no feedback, we can reopen if still needed.

viveksharan commented 1 year ago

TS013628780 https://www.ibm.com/mysupport/s/case/5003p00002ocHP5AAM/db2warehouse-not-getting-installed-on-fusion-hci-red-hat-open-shift-cluster?openCase=true&language=en_US

I am a CSM at IBM. I am working with the Fusion HCI team to install DB2 Warehouse on a Fusion HCI cluster.

Refer to the attached files for more info.

The attached YAML is used to create the cluster, but it is stuck in Processing status with the following information in the db2diag logs on the pod.

It may be that the OOB Ansible scripts are calculating the memory too low, and that is why the value of SHMALL is being set too low.

HOSTNAME: c-db2w-shared-db2u-0
EDUID : 1 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, routine_infrastructure, sqlerInitFmpHeap, probe:60
MESSAGE : ZRC=0x850F0005=-2062614523=SQLO_NOSEG "No Storage Available for allocation"
          DIA8305C Memory allocation failure occurred.
DATA #1 : String, 284 bytes
Failed to create the memory segment used for communication with fenced routines. If re-starting db2, ensure no db2fmp processes were on the instance prior to start. Otherwise, you can adjust this value through DB2_FMP_COMM_HEAPSZ db2set value, or by decreasing your ASLHEAPSZ setting.
DATA #2 : unsigned integer, 8 bytes
127146307584

2023-07-10-11.57.22.867774+000 E151638E754 LEVEL: Warning
PID : 2726395 TID : 139879648323328 PROC : db2sysc 0
INSTANCE: db2inst1 NODE : 000
HOSTNAME: c-db2w-shared-db2u-0
EDUID : 1 EDUNAME: db2sysc 0
FUNCTION: DB2 UDB, routine_infrastructure, sqlerInitFmpHeap, probe:60
MESSAGE : ADM11003E DB2 failed to create the memory segment used for communication with fenced routines. If restarting DB2, ensure that no db2fmp processes are active on the instance prior to start. Otherwise, you can adjust the value through the DB2_FMP_COMM_HEAPSZ registry variable, or you can decrease the value of ASLHEAPSZ in the database manager configuration.
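The log names two Db2 settings (DB2_FMP_COMM_HEAPSZ and ASLHEAPSZ), and the current values can be read from inside the pod. The following is only a sketch: the pod and instance names come from the log, the `db2u` namespace comes from the Db2uCluster YAML posted in this thread, and it assumes that `oc exec` lands in a container where `su - db2inst1` is permitted, which may not hold for every Db2U release. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Names copied from the log and the Db2uCluster YAML in this thread.
DB2_NAMESPACE="db2u"
DB2_POD="c-db2w-shared-db2u-0"
DB2_INSTANCE_USER="db2inst1"

# Commands to run as the instance owner. All of them only read settings.
db2_inspect_script() {
  cat <<'EOF'
db2set DB2_FMP_COMM_HEAPSZ
db2 get dbm cfg | grep -i ASLHEAPSZ
cat /proc/sys/kernel/shmall
ipcs -l
EOF
}

# Run the read-only checks inside the Db2 pod.
# Assumption: the exec user is allowed to switch to the instance owner.
inspect_db2_memory() {
  oc exec -n "$DB2_NAMESPACE" "$DB2_POD" -- \
    su - "$DB2_INSTANCE_USER" -c "$(db2_inspect_script)"
}
```

Calling `inspect_db2_memory` from a shell that is logged in to the cluster prints the registry variable, the database manager setting, and the kernel shared memory limits as seen from the container.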

viveksharan commented 1 year ago

apiVersion: db2u.databases.ibm.com/v1
kind: Db2uCluster
metadata:
  annotations:
    db2u/certs-api-cert: '[secure]'
    db2u/certs-api-key: '[secure]'
    db2u/certs-wv-rest: '[secure]'
    db2u/license: '[secure]'
    db2u/sshkeys-db2instusr: '[secure]'
    db2u/sshkeys-db2uadm: '[secure]'
    db2u/sshkeys-db2uhausr: '[secure]'
    kubectl.kubernetes.io/last-applied-configuration: '{"apiVersion":"db2u.databases.ibm.com/v1","kind":"Db2uCluster","metadata":{"name":"db2w-shared","namespace":"db2u"},"spec":{"account":{"imagePullSecrets":["ibm-registry"],"privileged":true},"addOns":{"graph":{"enabled":false},"rest":{"enabled":false}},"environment":{"database":{"dbConfig":{"APPLHEAPSZ":"8192 AUTOMATIC"},"name":"BLUDB","settings":{"dftTableOrg":"ROW"},"ssl":{"certLabel":"CN=db2u","secretName":"db2u-certificate"}},"dbType":"db2wh","instance":{"dbmConfig":{"INSTANCE_MEMORY":"AUTOMATIC"},"registry":{"DB2AUTH":"OSAUTHDB,ALLOW_LOCAL_FALLBACK,PLUGIN_AUTO_RELOAD","DB2_4K_DEVICE_SUPPORT":"ON","DB2_FMP_RUN_AS_CONNECTED_USER":"NO","DB2_WORKLOAD":"ANALYTICS"}},"mln":{"total":1}},"license":{"accept":true},"podConfig":{"db2u":{"resource":{"db2u":{"limits":{"cpu":"3000m","memory":"12Gi"},"requests":{"cpu":"3000m","memory":"12Gi"}}}}},"size":1,"storage":[{"name":"meta","spec":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":"ibm-spectrum-scale-sc"},"type":"create"},{"name":"data","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"100Gi"}},"storageClassName":"ibm-spectrum-scale-sc"},"type":"template"},{"name":"backup","spec":{"accessModes":["ReadWriteMany"],"resources":{"requests":{"storage":"100Gi"}},"storageClassName":"ibm-spectrum-scale-sc"},"type":"create"},{"name":"activelogs","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"100Gi"}},"storageClassName":"ibm-spectrum-scale-sc"},"type":"template"},{"name":"tempts","spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"100Gi"}},"storageClassName":"ibm-spectrum-scale-sc"},"type":"template"}],"version":"s11.5.8.0"}}'
  creationTimestamp: "2023-06-13T21:48:23Z"
  generation: 25
  labels:
    formation_id: db2w-shared
  name: db2w-shared
  namespace: db2u
  resourceVersion: "73522162"
  uid: 1779d44e-d768-41db-b83d-a104b38e5bf6
spec:
  account:
    imagePullSecrets:
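The last-applied configuration in this YAML sets the Db2 container to a 12Gi memory limit. To confirm what the live resource currently carries, the value can be read back with a JSONPath query. This is a sketch: the resource name and namespace are taken from the YAML, the JSONPath follows the structure of the last-applied configuration and is assumed to match the live `spec` (which is cut off in the paste), and the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Names taken from the Db2uCluster YAML in this thread.
DB2_NAMESPACE="db2u"
DB2_CLUSTER="db2w-shared"

# JSONPath to the memory limit, mirroring the last-applied configuration.
# Assumption: the live spec uses the same structure.
MEMORY_LIMIT_PATH='{.spec.podConfig.db2u.resource.db2u.limits.memory}'

# Print the memory limit currently set on the Db2uCluster resource.
db2u_memory_limit() {
  oc get db2ucluster "$DB2_CLUSTER" -n "$DB2_NAMESPACE" \
    -o jsonpath="$MEMORY_LIMIT_PATH"
}

# Convert a whole-number Gi quantity such as "12Gi" to bytes.
gi_to_bytes() {
  local gi=${1%Gi}
  echo $(( gi * 1024 * 1024 * 1024 ))
}
```

On a logged-in session, `gi_to_bytes "$(db2u_memory_limit)"` gives the limit in bytes, which makes it easier to compare with the byte counts that appear in the db2diag output.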

viveksharan commented 1 year ago

SHMALL limits the total amount of virtual shared memory that can be allocated on a system. In your system you have:

db2supp_system.zip_unpack/OSCONFIG/sysctl_a.out:kernel.shmall = 8388608

It should be set to 2 * <size of RAM in the default system.

For some reason the Ansible scripts are calculating too little memory; as a result, this parameter is too small.
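Because the Linux kernel counts `kernel.shmall` in pages rather than bytes, the reported value has to be multiplied by the page size before it can be compared with RAM. The sketch below does that arithmetic and applies the 2 x RAM rule quoted above; it assumes the common 4096-byte page size for the value from `sysctl_a.out`, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# kernel.shmall is expressed in pages, so convert before comparing with RAM.

# Bytes of shared memory permitted by a given kernel.shmall value.
shmall_to_bytes() {
  local pages=$1 page_size=$2
  echo $(( pages * page_size ))
}

# 2 x RAM expressed in pages, following the rule quoted above.
recommended_shmall() {
  local ram_bytes=$1 page_size=$2
  echo $(( 2 * ram_bytes / page_size ))
}

# Value reported in sysctl_a.out, assuming 4096-byte pages.
shmall_to_bytes 8388608 4096   # prints 34359738368 (32 GiB)

# On a Linux host, compare the live setting with the rule.
if [ -r /proc/meminfo ] && [ -r /proc/sys/kernel/shmall ]; then
  page_size=$(getconf PAGE_SIZE)
  ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  echo "current:     $(cat /proc/sys/kernel/shmall) pages"
  echo "recommended: $(recommended_shmall $(( ram_kb * 1024 )) "$page_size") pages"
fi
```

Under that page-size assumption, 8388608 pages allows 32 GiB of shared memory in total, so by the quoted rule the setting would only be adequate for a node with 16 GiB of RAM or less.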

andrercm commented 1 year ago

This seems to be related to DB2 enhancements already being worked on as part of another issue we have. There is an open pull request in which Manage-related improvements to the DB2 database will be made, following consultation with the SRE DBAs: https://github.com/ibm-mas/ansible-devops/pull/917/files

@CAIOFCP @csschen can you add any further comments? This might be a duplicate issue.

csschen commented 1 year ago

@andrercm The parameter DB2_FMP_COMM_HEAPSZ mentioned in comment https://github.com/ibm-mas/ansible-devops/issues/877#issuecomment-1677540081 was covered by the reviewed DB2 parameters, but I'm not sure the value is suitable for this case. My suggestion is that we test after PR 917 is merged; we may need DBA help if the problem still exists.

andrercm commented 1 year ago

I agree @csschen, we may want to revalidate this issue once this other one, which is already making significant improvements to db2 for Manage, is complete.

@viveksharan would you agree with the proposal?

There are changes to be implemented in https://github.com/ibm-mas/ansible-devops/pull/917 that may improve your experience (#607 has been opened to update the db2 settings for Manage as well). I would suggest that we close this issue to avoid potential duplicates in our backlog; if the proposed fix does not work for you, we can reopen this ticket for further investigation and possible involvement of the DBA team.

viveksharan commented 1 year ago

I am fine waiting for the new release to see whether setting that parameter resolves the issue.

One thing to highlight: when I talked to the DB2 team, they told me that the parameter to change was on the OS kernel side:

SHMALL limits the total amount of virtual shared memory that can be allocated on a system. In your system you have:

db2supp_system.zip_unpack/OSCONFIG/sysctl_a.out:kernel.shmall = 8388608

It should be set to 2 * <size of RAM in the default system.