ibm-mas / ansible-devops

Ansible collection supporting devops for IBM Maximo Application Suite
https://ibm-mas.github.io/ansible-devops/
Eclipse Public License 2.0

MongoDB role to upgrade ver. 5.0.x #1152

Closed: puchakayalals closed this issue 9 months ago

puchakayalals commented 10 months ago

Hello, we have MAS installed on our clusters with MongoDB Community 4.2.x and 4.4.x. While trying to upgrade to version 5.0.x, the Ansible role reports successful completion, but the Mongo CRD yaml still shows version 4.4.21. Let us know if we are doing something wrong or need to add additional parameters.

These are the Ansible vars being passed:

script: |
  export MAS_CONFIG_DIR=$MAS_CONFIG_DIR
  mkdir -p $MAS_CONFIG_DIR
  echo $MAS_CONFIG_DIR
  export MAS_INSTANCE_ID=$MAS_INSTANCE_ID
  export MONGO_V5_UPGRADE=true
  export MAS_CATALOG_VERSION=$MAS_CATALOG_VERSION
  export MONGODB_V5_UPGRADE=true
  oc login $MAS_OPENSHIFT_CLUSTER_HOST:$MAS_OPENSHIFT_CLUSTER_PORT --token=$SA_MAS_IAC_TOKEN --insecure-skip-tls-verify
  ansible-playbook ibm.mas_devops.deploy_mongodb
when: manual

Attached the output log of the Ansible role for MongoDB: job#205904.txt

Here is the MongoDBCommunity CRD yaml content:

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >-
      {"apiVersion":"mongodbcommunity.mongodb.com/v1","kind":"MongoDBCommunity","metadata":{"name":"mas-mongo-ce","namespace":"mongoce"},"spec":{"additionalMongodConfig":{"net.tls.allowInvalidCertificates":true,"net.tls.allowInvalidHostnames":true,"storage.wiredTiger.engineConfig.journalCompressor":"snappy"},"members":3,"prometheus":{"passwordSecretRef":{"name":"mas-mongo-ce-metrics-endpoint-secret"},"username":"metrics-endpoint-user"},"security":{"authentication":{"modes":["SCRAM-SHA-256","SCRAM-SHA-1"]},"tls":{"caConfigMapRef":{"name":"mas-mongo-ce-cert-map"},"certificateKeySecretRef":{"name":"mongo-server-cert"},"enabled":true}},"statefulSet":{"spec":{"selector":{},"serviceName":"mas-mongo-ce-svc","template":{"spec":{"containers":[{"image":"quay.io/ibmmas/mongo@sha256:e1b43604ed1b54804f15c421de666e8be6c6d1ebf57e5825dff22493f9bd5f1e","name":"mongod","resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"500m","memory":"1Gi"}}}]}},"volumeClaimTemplates":[{"metadata":{"name":"data-volume"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":"azure-disk-prm-lrs-csi"}},{"metadata":{"name":"logs-volume"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":"azure-disk-prm-lrs-csi"}}]}},"type":"ReplicaSet","users":[{"db":"admin","name":"admin","passwordSecretRef":{"name":"mas-mongo-ce-admin-password"},"roles":[{"db":"admin","name":"clusterAdmin"},{"db":"admin","name":"userAdminAnyDatabase"},{"db":"admin","name":"dbOwner"},{"db":"admin","name":"readWriteAnyDatabase"}],"scramCredentialsSecretName":"mas-mongo-ce-scram"}],"version":"4.4.21"}}
    mongodb.com/v1.lastAppliedMongoDBVersion: 4.4.21
    mongodb.com/v1.lastSuccessfulConfiguration: >-
      {"members":3,"type":"ReplicaSet","version":"4.4.21","arbiters":0,"security":{"authentication":{"modes":["SCRAM-SHA-256","SCRAM-SHA-1"],"ignoreUnknownUsers":true},"tls":{"enabled":true,"optional":false,"certificateKeySecretRef":{"name":"mongo-server-cert"},"caConfigMapRef":{"name":"mas-mongo-ce-cert-map"}}},"users":[{"name":"admin","db":"admin","passwordSecretRef":{"name":"mas-mongo-ce-admin-password","key":""},"roles":[{"db":"admin","name":"clusterAdmin"},{"db":"admin","name":"userAdminAnyDatabase"},{"db":"admin","name":"dbOwner"},{"db":"admin","name":"readWriteAnyDatabase"}],"scramCredentialsSecretName":"mas-mongo-ce-scram","connectionStringSecretName":""}],"statefulSet":{"spec":{}},"agent":{"logLevel":"","maxLogFileDurationHours":0},"additionalMongodConfig":{},"prometheus":{"username":"metrics-endpoint-user","passwordSecretRef":{"name":"mas-mongo-ce-metrics-endpoint-secret","key":""},"tlsSecretKeyRef":{"name":"","key":""}}}
  resourceVersion: '220512238'
  name: mas-mongo-ce
  uid: 83ca95ee-5500-47bb-8a75-725cec84997c
  creationTimestamp: '2023-12-07T18:49:33Z'
  generation: 1
  managedFields:
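For reference, the spec version and current phase can also be read straight off the CR with a one-liner like this (a minimal sketch, assuming the mongoce namespace and instance name shown above):

    oc get mongodbcommunity mas-mongo-ce -n mongoce -o jsonpath='{.spec.version} {.status.phase}{"\n"}'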

andrercm commented 9 months ago

Hi @puchakayalals, from your logs it seems you are using MAS_CATALOG_VERSION=v8-231031-amd64:

TASK [ibm.mas_devops.mongodb : Catalog Version] ********************************
ok: [localhost] => {
    "msg": [
        "Catalog Version ............................ v8-231031-amd64"
    ]
}

ok: [localhost] => {
    "msg": [
        "Mongo Version ............................ 4.4.21"
    ]
}

The default mongodb_version used for this catalog version is 4.4.21 as defined here: https://github.com/ibm-mas/ansible-devops/blob/master/ibm/mas_devops/common_vars/casebundles/v8-231031-amd64.yml#L79

So in order to upgrade MongoDB, you have two options. You can either update your MAS_CATALOG_VERSION to v8-231128-amd64, which uses mongodb_version=5.0.21 as defined here: https://github.com/ibm-mas/ansible-devops/blob/master/ibm/mas_devops/common_vars/casebundles/v8-231128-amd64.yml#L79

This option would be recommended if you plan to update not only MongoDB but also other MAS-related components, such as MAS core or MAS apps, to a newer patch version.

Or you can add export MONGODB_VERSION=5.0.21, which tells the automation that you explicitly want to install/upgrade to this specific MongoDB version regardless of the catalog version you have. This would be recommended if you only care about upgrading MongoDB and don't want to update other MAS components/dependencies.
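As a rough sketch, either option slots into the same script shared earlier in this issue (keeping the other exports such as MAS_CONFIG_DIR and MAS_INSTANCE_ID as they are):

    # Option 1: move to the newer catalog, which pins mongodb_version=5.0.21
    export MAS_CATALOG_VERSION=v8-231128-amd64
    ansible-playbook ibm.mas_devops.deploy_mongodb

    # Option 2: keep the current catalog and override only the MongoDB version
    export MONGODB_VERSION=5.0.21
    ansible-playbook ibm.mas_devops.deploy_mongodb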

puchakayalals commented 9 months ago

Hello @andrercm, setting MAS_CATALOG_VERSION to v8-231128-amd64 kicked off the MongoDB role upgrade to 5.0.21, but the StatefulSet pods have been stuck at 1 of 3 for more than 80 minutes.

The MongoDBCommunity instance status of mas-mongo-ce is Pending.

Attached the mongo role output log: job#206541.txt

And here is the CRD yaml:

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: >-
      {"apiVersion":"mongodbcommunity.mongodb.com/v1","kind":"MongoDBCommunity","metadata":{"name":"mas-mongo-ce","namespace":"mongoce"},"spec":{"additionalMongodConfig":{"net.tls.allowInvalidCertificates":true,"net.tls.allowInvalidHostnames":true,"storage.wiredTiger.engineConfig.journalCompressor":"snappy"},"featureCompatibilityVersion":"5.0","members":3,"prometheus":{"passwordSecretRef":{"name":"mas-mongo-ce-metrics-endpoint-secret"},"username":"metrics-endpoint-user"},"security":{"authentication":{"modes":["SCRAM-SHA-256","SCRAM-SHA-1"]},"tls":{"caConfigMapRef":{"name":"mas-mongo-ce-cert-map"},"certificateKeySecretRef":{"name":"mongo-server-cert"},"enabled":true}},"statefulSet":{"spec":{"selector":{},"serviceName":"mas-mongo-ce-svc","template":{"spec":{"containers":[{"image":"quay.io/ibmmas/mongo@sha256:3e55e5012c20309e73ea1e78993d8d94f1f43e6c784ab51795a8e1f495c5de60","name":"mongod","resources":{"limits":{"cpu":"1","memory":"1Gi"},"requests":{"cpu":"500m","memory":"1Gi"}}}]}},"volumeClaimTemplates":[{"metadata":{"name":"data-volume"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":"azure-disk-prm-lrs-csi"}},{"metadata":{"name":"logs-volume"},"spec":{"accessModes":["ReadWriteOnce"],"resources":{"requests":{"storage":"20Gi"}},"storageClassName":"azure-disk-prm-lrs-csi"}}]}},"type":"ReplicaSet","users":[{"db":"admin","name":"admin","passwordSecretRef":{"name":"mas-mongo-ce-admin-password"},"roles":[{"db":"admin","name":"clusterAdmin"},{"db":"admin","name":"userAdminAnyDatabase"},{"db":"admin","name":"dbOwner"},{"db":"admin","name":"readWriteAnyDatabase"}],"scramCredentialsSecretName":"mas-mongo-ce-scram"}],"version":"5.0.21"}}
    mongodb.com/v1.lastAppliedMongoDBVersion: 4.4.21
    mongodb.com/v1.lastSuccessfulConfiguration: >-
      {"members":3,"type":"ReplicaSet","version":"4.4.21","arbiters":0,"security":{"authentication":{"modes":["SCRAM-SHA-256","SCRAM-SHA-1"],"ignoreUnknownUsers":true},"tls":{"enabled":true,"optional":false,"certificateKeySecretRef":{"name":"mongo-server-cert"},"caConfigMapRef":{"name":"mas-mongo-ce-cert-map"}}},"users":[{"name":"admin","db":"admin","passwordSecretRef":{"name":"mas-mongo-ce-admin-password","key":""},"roles":[{"db":"admin","name":"clusterAdmin"},{"db":"admin","name":"userAdminAnyDatabase"},{"db":"admin","name":"dbOwner"},{"db":"admin","name":"readWriteAnyDatabase"}],"scramCredentialsSecretName":"mas-mongo-ce-scram","connectionStringSecretName":""}],"statefulSet":{"spec":{}},"agent":{"logLevel":"","maxLogFileDurationHours":0},"additionalMongodConfig":{},"prometheus":{"username":"metrics-endpoint-user","passwordSecretRef":{"name":"mas-mongo-ce-metrics-endpoint-secret","key":""},"tlsSecretKeyRef":{"name":"","key":""}}}
  resourceVersion: '233840522'
  name: mas-mongo-ce
  uid: 83ca95ee-5500-47bb-8a75-725cec84997c
  creationTimestamp: '2023-12-07T18:49:33Z'
  generation: 2
  managedFields:

andrercm commented 9 months ago

@puchakayalals would you please provide logs for the following:

  1. Mongo deployment status - to check if all pods are running:
    oc get pods -n mongoce

which should return something like:

NAME                                           READY   STATUS    RESTARTS   AGE
mas-mongo-ce-0                                 2/2     Running   0          23h
mas-mongo-ce-1                                 2/2     Running   0          23h
mas-mongo-ce-2                                 2/2     Running   0          23h
mongodb-kubernetes-operator-5cd9b97dbb-tmkf5   1/1     Running   0          23h
  2. Mongo operator logs - to check if something is wrong with the mongo operator:
    oc logs mongodb-kubernetes-operator-5cd9b97dbb-tmkf5

puchakayalals commented 9 months ago

@andrercm here is the required information:

  1. Mongo deployment status:
    oc get pods
    NAME                                           READY   STATUS    RESTARTS     AGE
    mas-mongo-ce-0                                 1/2     Running   1 (4s ago)   24s
    mas-mongo-ce-1                                 2/2     Running   0            11d
    mas-mongo-ce-2                                 2/2     Running   0            11d
    mongodb-kubernetes-operator-69d77d5798-srbkl   1/1     Running   0            11d
  2. Mongo Operator logs:
    2024-01-03T10:19:41.397Z    INFO    controllers/replica_set_controller.go:134   Reconciling MongoDB {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.397Z    DEBUG   controllers/replica_set_controller.go:136   Validating MongoDB.Spec {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.397Z    DEBUG   controllers/replica_set_controller.go:146   Ensuring the service exists {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.398Z    DEBUG   agent/agent_readiness.go:106    The Pod 'mas-mongo-ce-0' doesn't have annotation 'agent.mongodb.com/version' yet    {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.398Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-1' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.398Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-2' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.398Z    DEBUG   agent/replica_set_port_manager.go:122   No port change required {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.407Z    INFO    controllers/replica_set_controller.go:472   Create/Update operation succeeded   {"ReplicaSet": "mongoce/mas-mongo-ce", "operation": "updated"}
    2024-01-03T10:19:41.407Z    INFO    controllers/mongodb_tls.go:43   Ensuring TLS is correctly configured    {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.407Z    INFO    controllers/mongodb_tls.go:86   Successfully validated TLS config   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.407Z    INFO    controllers/replica_set_controller.go:297   TLS is enabled, creating/updating CA secret {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.413Z    INFO    controllers/replica_set_controller.go:301   TLS is enabled, creating/updating TLS secret    {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.422Z    DEBUG   controllers/replica_set_controller.go:404   Enabling TLS on a deployment with a StatefulSet that is not Ready, the Automation Config must be updated first  {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.422Z    INFO    controllers/replica_set_controller.go:364   Creating/Updating AutomationConfig  {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.438Z    DEBUG   scram/scram.go:101  Credentials have not changed, using credentials stored in: secret/mas-mongo-ce-scram-scram-credentials
    2024-01-03T10:19:41.438Z    DEBUG   agent/agent_readiness.go:106    The Pod 'mas-mongo-ce-0' doesn't have annotation 'agent.mongodb.com/version' yet    {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.438Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-1' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.438Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-2' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.438Z    DEBUG   agent/replica_set_port_manager.go:122   No port change required {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.439Z    DEBUG   agent/replica_set_port_manager.go:40    Calculated process port map: map[mas-mongo-ce-0:27017 mas-mongo-ce-1:27017 mas-mongo-ce-2:27017]    {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.439Z    DEBUG   controllers/replica_set_controller.go:539   AutomationConfigMembersThisReconciliation   {"mdb.AutomationConfigMembersThisReconciliation()": 3}
    2024-01-03T10:19:41.439Z    DEBUG   controllers/replica_set_controller.go:387   Waiting for agents to reach version 13  {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.439Z    DEBUG   agent/agent_readiness.go:106    The Pod 'mas-mongo-ce-0' doesn't have annotation 'agent.mongodb.com/version' yet    {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.439Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-1' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.439Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-2' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:41.439Z    INFO    controllers/mongodb_status_options.go:110   ReplicaSet is not yet ready, retrying in 10 seconds
    2024-01-03T10:19:51.452Z    INFO    controllers/replica_set_controller.go:134   Reconciling MongoDB {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.452Z    DEBUG   controllers/replica_set_controller.go:136   Validating MongoDB.Spec {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.452Z    DEBUG   controllers/replica_set_controller.go:146   Ensuring the service exists {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.452Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-0' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.452Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-1' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.452Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-2' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.452Z    DEBUG   agent/replica_set_port_manager.go:122   No port change required {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.460Z    INFO    controllers/replica_set_controller.go:472   Create/Update operation succeeded   {"ReplicaSet": "mongoce/mas-mongo-ce", "operation": "updated"}
    2024-01-03T10:19:51.460Z    INFO    controllers/mongodb_tls.go:43   Ensuring TLS is correctly configured    {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.460Z    INFO    controllers/mongodb_tls.go:86   Successfully validated TLS config   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.460Z    INFO    controllers/replica_set_controller.go:297   TLS is enabled, creating/updating CA secret {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.466Z    INFO    controllers/replica_set_controller.go:301   TLS is enabled, creating/updating TLS secret    {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.476Z    DEBUG   controllers/replica_set_controller.go:404   Enabling TLS on a deployment with a StatefulSet that is not Ready, the Automation Config must be updated first  {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.476Z    INFO    controllers/replica_set_controller.go:364   Creating/Updating AutomationConfig  {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.496Z    DEBUG   scram/scram.go:101  Credentials have not changed, using credentials stored in: secret/mas-mongo-ce-scram-scram-credentials
    2024-01-03T10:19:51.496Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-0' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.496Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-1' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.496Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-2' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.497Z    DEBUG   agent/replica_set_port_manager.go:122   No port change required {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.497Z    DEBUG   agent/replica_set_port_manager.go:40    Calculated process port map: map[mas-mongo-ce-0:27017 mas-mongo-ce-1:27017 mas-mongo-ce-2:27017]    {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.497Z    DEBUG   controllers/replica_set_controller.go:539   AutomationConfigMembersThisReconciliation   {"mdb.AutomationConfigMembersThisReconciliation()": 3}
    2024-01-03T10:19:51.497Z    DEBUG   controllers/replica_set_controller.go:387   Waiting for agents to reach version 13  {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.497Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-0' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.498Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-1' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.498Z    DEBUG   agent/agent_readiness.go:110    The Agent in the Pod 'mas-mongo-ce-2' hasn't reached the goal state yet (goal: 13, agent: -1)   {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:19:51.498Z    INFO    controllers/mongodb_status_options.go:110   ReplicaSet is not yet ready, retrying in 10 seconds
    2024-01-03T10:20:01.520Z    INFO    controllers/replica_set_controller.go:134   Reconciling MongoDB {"ReplicaSet": "mongoce/mas-mongo-ce"}
    2024-01-03T10:20:01.520Z    DEBUG   controllers/replica_set_controller.go:136   Validating MongoDB.Spec {"ReplicaSet": 
andrercm commented 9 months ago

@puchakayalals the mongo operator logs don't show any obvious errors... but I noticed that one of the mongo pods was recently restarted. I see:

mas-mongo-ce-0                                 1/2     Running   1 (4s ago)   24s

while the other two seem healthy for quite a while:

mas-mongo-ce-1                                 2/2     Running   0            11d
mas-mongo-ce-2                                 2/2     Running   0            11d

Can you check whether there are any logs or events from mas-mongo-ce-0 that might explain what is preventing it from running successfully? We expect to see 2/2 containers ready for all three mongo-ce pods.
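For example, something like this should surface the relevant events and container logs (a minimal sketch, assuming the mongoce namespace):

    oc describe pod mas-mongo-ce-0 -n mongoce
    oc get events -n mongoce --field-selector involvedObject.name=mas-mongo-ce-0
    oc logs mas-mongo-ce-0 -c mongod -n mongoce
    oc logs mas-mongo-ce-0 -c mongodb-agent -n mongoce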

puchakayalals commented 9 months ago

@andrercm I see this error in the events of mas-mongo-ce-0:

oc get events | grep mas-mongo-ce-0

99s         Normal    Pulled                   pod/mas-mongo-ce-0         Successfully pulled image "quay.io/mongodb/mongodb-agent@sha256:00dc01552af1eac8a457d7214ae4f913dd2a14409c0125dead25808e40cc0d62" in 675.409798ms (675.429398ms including waiting)
99s         Normal    Created                  pod/mas-mongo-ce-0         Created container mongodb-agent
99s         Normal    Started                  pod/mas-mongo-ce-0         Started container mongodb-agent
86s         Warning   Unhealthy                pod/mas-mongo-ce-0         Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory...
71s         Warning   Unhealthy                pod/mas-mongo-ce-0         Readiness probe failed:
71s         Warning   BackOff                  pod/mas-mongo-ce-0         Back-off restarting failed container
21s         Normal    Scheduled                pod/mas-mongo-ce-0         Successfully assigned mongoce/mas-mongo-ce-0 to ewamaro-poc-7m4jc-worker-centralus1-zjfkq
12s         Normal    AddedInterface           pod/mas-mongo-ce-0         Add eth0 [10.181.3.217/23] from openshift-sdn
12s         Normal    Pulling                  pod/mas-mongo-ce-0         Pulling image "quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook@sha256:5c1483fa22cfb772f186a30180fcfa4af91ff3a638b28a37cc9f997f8ac046f9"
11s         Normal    Pulled                   pod/mas-mongo-ce-0         Successfully pulled image "quay.io/mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook@sha256:5c1483fa22cfb772f186a30180fcfa4af91ff3a638b28a37cc9f997f8ac046f9" in 709.085133ms (709.102834ms including waiting)
11s         Normal    Created                  pod/mas-mongo-ce-0         Created container mongod-posthook
11s         Normal    Started                  pod/mas-mongo-ce-0         Started container mongod-posthook
11s         Normal    Pulling                  pod/mas-mongo-ce-0         Pulling image "quay.io/mongodb/mongodb-kubernetes-readinessprobe@sha256:419924de6a1bee566a4fb92656c6be79b0adf36a875a2845699f64e3b536186e"
10s         Normal    Pulled                   pod/mas-mongo-ce-0         Successfully pulled image "quay.io/mongodb/mongodb-kubernetes-readinessprobe@sha256:419924de6a1bee566a4fb92656c6be79b0adf36a875a2845699f64e3b536186e" in 676.129955ms (676.148156ms including waiting)
10s         Normal    Created                  pod/mas-mongo-ce-0         Created container mongodb-agent-readinessprobe
10s         Normal    Started                  pod/mas-mongo-ce-0         Started container mongodb-agent-readinessprobe
10s         Normal    Pulled                   pod/mas-mongo-ce-0         Container image "quay.io/ibmmas/mongo@sha256:e1b43604ed1b54804f15c421de666e8be6c6d1ebf57e5825dff22493f9bd5f1e" already present on machine
10s         Normal    Created                  pod/mas-mongo-ce-0         Created container mongod
10s         Normal    Started                  pod/mas-mongo-ce-0         Started container mongod
10s         Normal    Pulling                  pod/mas-mongo-ce-0         Pulling image "quay.io/mongodb/mongodb-agent@sha256:00dc01552af1eac8a457d7214ae4f913dd2a14409c0125dead25808e40cc0d62"
9s          Normal    Pulled                   pod/mas-mongo-ce-0         Successfully pulled image "quay.io/mongodb/mongodb-agent@sha256:00dc01552af1eac8a457d7214ae4f913dd2a14409c0125dead25808e40cc0d62" in 717.819264ms (717.835964ms including waiting)
9s          Normal    Created                  pod/mas-mongo-ce-0         Created container mongodb-agent
9s          Normal    Started                  pod/mas-mongo-ce-0         Started container mongodb-agent
4s          Warning   Unhealthy                pod/mas-mongo-ce-0         Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory...

And this is what I see in the logs of the mongod container of mas-mongo-ce-0 (mas-mongo-ce-0-mongod.log):

2024-01-03T21:19:27.680Z    INFO    versionhook/main.go:32  Running version change post-start hook
2024-01-03T21:19:27.680Z    INFO    versionhook/main.go:39  Waiting for agent health status...
2024-01-03T21:19:28.683Z    INFO    versionhook/main.go:59  Pod should be deleted
I0103 21:19:29.885659       7 request.go:677] Waited for 1.181975052s due to client-side throttling, not priority and fairness, request: GET:https://10.182.0.1:443/apis/console.openshift.io/v1alpha1?timeout=32s
2024-01-03T21:19:34.136Z    INFO    versionhook/main.go:72  Pod killed itself, waiting...

Log from mongodb-agent container of mas-mongo-ce-0:

cat: /mongodb-automation/agent-api-key/agentApiKey: No such file or directory
[2024-01-03T21:20:22.265+0000] [.debug] [util/distros/distros.go:LinuxFlavorAndVersionUncached:142] Detected linux flavor ubuntu version 20.4
andrercm commented 9 months ago

@puchakayalals have you tried deleting the mas-mongo-ce-0 and mongo operator pods and waiting to see if that self-heals? It's interesting that the other two replicas seem to be running fine, but not this one, mas-mongo-ce-0...
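For example (a sketch, assuming the mongoce namespace; the StatefulSet and the operator Deployment will recreate the deleted pods):

    oc delete pod mas-mongo-ce-0 -n mongoce
    # operator pod name taken from your earlier 'oc get pods' output
    oc delete pod mongodb-kubernetes-operator-69d77d5798-srbkl -n mongoce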

@sanju7216 @terenceq any idea what might be happening?

puchakayalals commented 9 months ago

@andrercm Yes, I tried oc delete pods --all -n mongoce, but the behavior of the mas-mongo-ce-0 pod is still the same. The weird thing is that the pod terminates itself after two restarts; it is basically trying to self-heal, but that doesn't seem to be working.

andrercm commented 9 months ago

@puchakayalals Maybe you're facing the same issue as another one that was recently closed: https://github.com/ibm-mas/ansible-devops/issues/1134

I'm not really sure that's the problem, but from the logs it seems that the mongodb-kubernetes-operator is not using the expected image digest:

Current Mongo operator image ......... quay.io/ibmmas/mongodb-kubernetes-operator@sha256:2d5339dab49c3b9523ca97b53580db0f3a291a671a60f5882c5ed18a66073520

For the MongoDB operator with 5.0.21, I'd expect to see quay.io/ibmmas/mongodb-kubernetes-operator@sha256:de50b6f6b56bd25a99b9148b2d82be427799ec304843b552eca544c56379de00 being used, which comes from here: https://github.com/ibm-mas/ansible-devops/blob/master/ibm/mas_devops/roles/mirror_extras_prepare/vars/mongoce_5.0.21.yml#L6
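You can check which image the operator Deployment is actually running with something like this (a sketch; the Deployment name is assumed from the operator pod name in your earlier output):

    oc get deployment mongodb-kubernetes-operator -n mongoce -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'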

How about trying the December catalog, since it's now available? That should upgrade you to MongoDB 5.0.23: https://github.com/ibm-mas/ansible-devops/blob/master/ibm/mas_devops/common_vars/casebundles/v8-231228-amd64.yml#L79

puchakayalals commented 9 months ago

@andrercm After applying the December catalog, I saw the MongoDBCommunity CRD instance status change to 5.0.21 and Pending, while the mas-mongo-ce-0 pod was still self-terminating. Deleting the operator pod and all StatefulSet pods three times did finally upgrade MongoDB to 5.0.21, and the StatefulSet pods are now up and running.

Thank you for the support and help.