mongodb / mongodb-kubernetes-operator

MongoDB Community Kubernetes Operator

Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory #1527

Open prachiwaghulkar opened 6 months ago

prachiwaghulkar commented 6 months ago

What did you do to encounter the bug? Applied the MongoDB CR using MongoDB image 5.0.26. The mongodb pod is in CrashLoopBackOff and the mongodbcommunity resource is in the Pending state.

mongodb-kubernetes-operator-54c9d54fbc-mch6k          1/1     Running            0             8m49s
staging-mongodb-0                                     0/2     CrashLoopBackOff   1 (3s ago)    26s
prachiwaghulkar@Prachis-MacBook-Pro ~ % oc get mongodbcommunity
NAME              PHASE     VERSION
staging-mongodb   Pending   

Pod logs give the following error:

oc logs -p staging-mongodb-0
Defaulted container "mongod" out of: mongod, mongodb-agent, mongod-posthook (init), mongodb-agent-readinessprobe (init)
exec /bin/sh: exec format error

Describing the pod gives the below error in events:

Warning  BackOff                 21s (x2 over 22s)  kubelet                  Back-off restarting failed container
  Warning  Unhealthy               15s                kubelet                  Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory

goroutine 1 [running]:
main.main()
           /workspace/cmd/readiness/main.go:226 +0x191

What did you expect? /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json should exist and this error should not occur.

What happened instead? The /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json file doesn't exist and the error above is thrown. The mongodb pod is in CrashLoopBackOff.
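A quick way to confirm whether the agent ever wrote its health file is to look inside the mongodb-agent container directly; this is a minimal sketch using the pod and path from this report (the readiness probe binary panics when the file is missing):

kubectl exec staging-mongodb-0 -c mongodb-agent -- ls -l /var/log/mongodb-mms-automation/healthstatus/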


laiminhtrung1997 commented 6 months ago
  1. You need to configure the users field in the mongodbcommunity custom resource.
  2. You need to set readinessProbe.initialDelaySeconds to 10 on the mongod container (a way to verify the override is sketched below).
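To confirm the override actually reached the running pod, the effective probe can be read back from the pod spec; a minimal sketch assuming the pod name used elsewhere in this thread:

kubectl get pod staging-mongodb-0 -o jsonpath='{.spec.containers[?(@.name=="mongod")].readinessProbe}'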
prachiwaghulkar commented 6 months ago

@laiminhtrung1997 Unfortunately, the readinessProbe still fails and the pod goes into CrashLoopBackOff. I have set readinessProbe.initialDelaySeconds to 10 on the mongod container, and the users field was already configured in the mongodbcommunity CR.

Normal   Created         6m40s (x3 over 6m54s)  kubelet            Created container mongod
  Warning  Unhealthy       6m40s (x2 over 6m40s)  kubelet            Readiness probe failed:
  Normal   Started         6m39s (x3 over 6m54s)  kubelet            Started container mongod
  Warning  BackOff         111s (x25 over 6m51s)  kubelet            Back-off restarting failed container
laiminhtrung1997 commented 6 months ago

Dear @prachiwaghulkar, could you please provide the manifest of your mdbc CR?

prachiwaghulkar commented 6 months ago

@laiminhtrung1997 Please find below the mdbc CR manifest.

apiVersion: v1
items:
- apiVersion: mongodbcommunity.mongodb.com/v1
  kind: MongoDBCommunity
  metadata:
    name: staging-mongodb
    namespace: staging
  spec:
    additionalMongodConfig:
      net.maxIncomingConnections: 900
    featureCompatibilityVersion: "5.0"
    members: 1
    security:
      authentication:
        ignoreUnknownUsers: true
        modes:
        - SCRAM
      tls:
        caConfigMapRef:
          name: staging-mongodb-cert-ca-cm
        certificateKeySecretRef:
          name: staging-mongodb-cert
        enabled: true
    statefulSet:
      spec:
        template:
          spec:
            containers:
            - image: docker-na-public.artifactory.swg-devops.com/sec-guardium-next-gen-docker-local/mongo:5.0.26
              name: mongod
              readinessProbe:
                initialDelaySeconds: 10
              resources:
                limits:
                  cpu: "4"
                  ephemeral-storage: 5Gi
                  memory: 10Gi
                requests:
                  cpu: "1"
                  ephemeral-storage: 1Gi
                  memory: 2Gi
            imagePullSecrets:
            - name: ibm-entitlement-key
            initContainers:
            - name: mongodb-agent-readinessprobe
              resources:
                limits:
                  cpu: 100m
                  memory: 500Mi
                requests:
                  cpu: 6m
                  memory: 6Mi
            - name: mongod-posthook
              resources:
                limits:
                  cpu: 100m
                  memory: 500Mi
                requests:
                  cpu: 6m
                  memory: 6Mi
        volumeClaimTemplates:
        - apiVersion: v1
          kind: PersistentVolumeClaim
          metadata:
            name: data-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: rook-ceph-block
            volumeMode: Filesystem
        - apiVersion: v1
          kind: PersistentVolumeClaim
          metadata:
            name: logs-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: rook-ceph-block
            volumeMode: Filesystem
    type: ReplicaSet
    users:
    - db: admin
      name: root
      passwordSecretRef:
        key: mongodbRootPassword
        name: ibm-mongodb-authsecret
      roles:
      - db: admin
        name: clusterAdmin
      - db: admin
        name: userAdminAnyDatabase
      - db: admin
        name: readWriteAnyDatabase
      scramCredentialsSecretName: root-scram2
    - db: tnt_mbr_meta
      name: metadata
      passwordSecretRef:
        key: mongodbMetadataPassword
        name: ibm-mongodb-authsecret
      roles:
      - db: tnt_mbr_meta
        name: dbOwner
      scramCredentialsSecretName: metadata-scram2
    version: 5.0.26
laiminhtrung1997 commented 6 months ago

The log of the mongodb-agent container in staging-mongodb-0 too, please.

prachiwaghulkar commented 6 months ago

The log of mongodb-agent:

prachiwaghulkar@Prachis-MacBook-Pro cert-request % oc logs pod/staging-mongodb-0 -c mongodb-agent 
cat: /mongodb-automation/agent-api-key/agentApiKey: No such file or directory
[2024-04-19T05:31:54.604+0000] [.debug] [util/distros/distros.go:LinuxFlavorAndVersionUncached:144] Detected linux flavor ubuntu version 20.4
laiminhtrung1997 commented 6 months ago

Hmmmm. My mdbc does not configure TLS, and MongoDB started without any errors. I have no idea; sorry I cannot help you.

prachiwaghulkar commented 6 months ago

@irajdeep Can anybody from the community take a look and assist here? It is important for us to move to 5.0.26.

nammn commented 6 months ago

@prachiwaghulkar can you please provide the agent logs and health logs as described here? https://github.com/mongodb/mongodb-kubernetes-operator/blob/master/.github/ISSUE_TEMPLATE/bug_report.md

Having said that, exec /bin/sh: exec format error looks like an architecture mismatch. Are you running an arm image on amd64, or an amd64 image on arm? I suggest changing to the matching architecture and testing again.
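A minimal way to compare the node and image architectures; the commands below are a sketch assuming docker or skopeo is available on the workstation, and use the upstream mongo:5.0.26 tag (substitute the mirrored image reference from the CR above):

kubectl get nodes -o custom-columns=NAME:.metadata.name,ARCH:.status.nodeInfo.architecture
docker manifest inspect mongo:5.0.26 | grep -B1 '"architecture"'
skopeo inspect --raw docker://docker.io/library/mongo:5.0.26 | grep -o '"architecture":"[^"]*"'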

prachiwaghulkar commented 6 months ago

@nammn I have used the following image: sha256:0172fb2a286d3dc9823f0e377587c0a545022bd330c817ed6b8bc231ea0643ad, which is linux/amd64. We are updating from 5.0.24 to 5.0.26. 5.0.24 on amd64 worked fine for us.

PFB the logs:

Agent logs:

(venv) prachiwaghulkar@Prachis-MacBook-Pro ~ % kubectl exec -it staging-mongodb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/automation-agent.log        
[2024-04-22T06:23:30.847+0000] [header.info] [::0]        GitCommitId = 956e3386ad456471db1776d79637a38f182a6088
[2024-04-22T06:23:30.847+0000] [header.info] [::0]  AutomationVersion = 107.0.0.8465
[2024-04-22T06:23:30.847+0000] [header.info] [::0]          localhost = staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local
[2024-04-22T06:23:30.847+0000] [header.info] [::0] ErrorStateSleepTime = 10s
[2024-04-22T06:23:30.847+0000] [header.info] [::0] GoalStateSleepTime = 10s
[2024-04-22T06:23:30.847+0000] [header.info] [::0] NotGoalStateSleepTime = 1s
[2024-04-22T06:23:30.847+0000] [header.info] [::0]     PlanCutoffTime = 300000
[2024-04-22T06:23:30.847+0000] [header.info] [::0]       TracePlanner = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0]               User = 2000
[2024-04-22T06:23:30.847+0000] [header.info] [::0]         Go version = go1.20.10
[2024-04-22T06:23:30.847+0000] [header.info] [::0]         MmsBaseURL = 
[2024-04-22T06:23:30.847+0000] [header.info] [::0]         MmsGroupId = 
[2024-04-22T06:23:30.847+0000] [header.info] [::0]          HttpProxy = 
[2024-04-22T06:23:30.847+0000] [header.info] [::0] DisableHttpKeepAlive = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0]        HttpsCAFile = 
[2024-04-22T06:23:30.847+0000] [header.info] [::0] TlsRequireValidMMSServerCertificates = true
[2024-04-22T06:23:30.847+0000] [header.info] [::0] TlsMMSServerClientCertificate = 
[2024-04-22T06:23:30.847+0000] [header.info] [::0] KMIPProxyCertificateDir = /tmp
[2024-04-22T06:23:30.847+0000] [header.info] [::0] EnableLocalConfigurationServer = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] DialTimeoutSeconds = 40
[2024-04-22T06:23:30.847+0000] [header.info] [::0] KeepUnusedMongodbVersions = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] DisallowDowngrades = false
[2024-04-22T06:23:30.846+0000] [.error] [src/action/start.go:func1:145] [103] <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:23:30.846+0000] [.error] [src/director/director.go:executePlan:988] <staging-mongodb-0> [06:23:30.846] Failed to apply action. Result = <nil> : <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:23:30.846+0000] [.error] [src/director/director.go:planAndExecute:585] <staging-mongodb-0> [06:23:30.846] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [06:23:30.846] Failed to apply action. Result = <nil> : <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:23:30.846+0000] [.error] [src/director/director.go:mainLoop:394] <staging-mongodb-0> [06:23:30.846] Failed to planAndExecute : <staging-mongodb-0> [06:23:30.846] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [06:23:30.846] Failed to apply action. Result = <nil> : <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:50:56.215+0000] [header.info] [::0]        GitCommitId = 956e3386ad456471db1776d79637a38f182a6088
[2024-04-22T06:50:56.215+0000] [header.info] [::0]  AutomationVersion = 107.0.0.8465
[2024-04-22T06:50:56.215+0000] [header.info] [::0]          localhost = staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local
[2024-04-22T06:50:56.215+0000] [header.info] [::0] ErrorStateSleepTime = 10s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] GoalStateSleepTime = 10s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] NotGoalStateSleepTime = 1s
[2024-04-22T06:50:56.215+0000] [header.info] [::0]     PlanCutoffTime = 300000
[2024-04-22T06:50:56.215+0000] [header.info] [::0]       TracePlanner = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0]               User = 2000
[2024-04-22T06:50:56.215+0000] [header.info] [::0]         Go version = go1.20.10
[2024-04-22T06:50:56.215+0000] [header.info] [::0]         MmsBaseURL = 
[2024-04-22T06:50:56.215+0000] [header.info] [::0]         MmsGroupId = 
[2024-04-22T06:50:56.215+0000] [header.info] [::0]          HttpProxy = 
[2024-04-22T06:50:56.215+0000] [header.info] [::0] DisableHttpKeepAlive = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0]        HttpsCAFile = 
[2024-04-22T06:50:56.215+0000] [header.info] [::0] TlsRequireValidMMSServerCertificates = true
[2024-04-22T06:50:56.215+0000] [header.info] [::0] TlsMMSServerClientCertificate = 
[2024-04-22T06:50:56.215+0000] [header.info] [::0] KMIPProxyCertificateDir = /tmp
[2024-04-22T06:50:56.215+0000] [header.info] [::0] EnableLocalConfigurationServer = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] DialTimeoutSeconds = 40
[2024-04-22T06:50:56.215+0000] [header.info] [::0] KeepUnusedMongodbVersions = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] DisallowDowngrades = false
[2024-04-22T06:50:56.253+0000] [.error] [main/components/agent.go:ApplyClusterConfig:358] [06:50:56.253] Log path absent for process=state.ProcessConfigName=staging-mongodb-0ProcessType=mongodVersion=5.0.26FullVersion={"trueName":"5.0.26","gitVersion":"","modules":[],"major":5,"minor":0,"patch":26}Disabled=falseManualMode=falseNumCores=0CpuAffinity=[]LogRotate={"sizeThresholdMB":0,"timeThresholdHrs":0,"numUncompressed":0,"numTotal":0,"percentOfDiskspace":0,"includeAuditLogsWithMongoDBLogs":false}AuditLogRotate=<nil>LastResync="0001-01-01T00:00:00Z"LastThirdPartyRestoreResync="0001-01-01T00:00:00Z"LastCompact="0001-01-01T00:00:00Z"LastKmipMasterKeyRotation="0001-01-01T00:00:00Z"LastRestart="0001-01-01T00:00:00Z"Hostname=staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.localAlias=Cluster=AuthSchemaVersion=5FeatureCompatibilityVersion=5.0Kerberos=<nil>Args={"net":{"bindIp":"0.0.0.0","maxIncomingConnections":900,"port":27017,"tls":{"CAFile":"/var/lib/tls/ca/4971db0032afff31ab1235e283ef9ab7c9a4a483d630427923d253a41152cf13.pem","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/var/lib/tls/server/f131542c6e26217a9f960431d5177cd904c1a5661fd08482f4a194e836baa228.pem","mode":"requireTLS"}},"replication":{"replSetName":"staging-mongodb"},"setParameter":{"authenticationMechanisms":"SCRAM-SHA-256"},"storage":{"dbPath":"/data"}}ProcessAuthInfo={"UsersWanted":[{"user":"root","db":"admin","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"ztQzG8EXxgOT8qSloG6LfA==","storedKey":"+p4exhjiiYZoOIHahzi414ZINBs=","serverKey":"iV4qydBHQksSjyzTXDidhvn/9iY="},"scramSha256Creds":{"iterationCount":15000,"salt":"CcGExKTDjHefywe7CtG1VdfnqA9clT12VRz6MA==","storedKey":"JLVuWDSmdtJNNXRNVKf6Jw7MsofcbJP9G0N03N66Yb0=","serverKey":"d8R15D/XS9YVXwwDb6NjHBMCoYIrIxeUYU7PAK8tw7k="},"roles":[{"role":"clusterAdmin","db":"admin","minFcv":""},{"role":"readWriteAnyDatabase","db":"admin","minFcv":""},{"role":"userAdminAnyDatabase","db":"admin","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null},{"user":"metadata","db":"tnt_mbr_meta","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"iTSe6nHUP2rYsv8XvRgnnA==","storedKey":"sh4Q4pq/+EnduxDhyLEaY6bix3Y=","serverKey":"sL7I88TKpiWOJcD1X2MJHxBIIAg="},"scramSha256Creds":{"iterationCount":15000,"salt":"1zfBNBYr0OXWlPpMdZsoark+HcMfxoX0MltBpQ==","storedKey":"uBZBpVzBawhgY1wp8p52UlTzAtpkOc3UEgKC7JGPwbU=","serverKey":"EacvAm/pNKMyUobWrb0aL8+Og3BJ/W174YVhLMn8SWU="},"roles":[{"role":"dbOwner","db":"tnt_mbr_meta","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null}],"UsersDeleted":null,"Roles":null,"DesiredKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredNewKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredKeyHash":"KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho=","DesiredNewKeyHash":null,"KeyfileHashes":["KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho="],"UsingAuth":true}IsConfigServer=falseIsShardServer=falseIsInReplSet=trueIsStandalone=falseIsArbiter=falseDownloadBase=FullySyncRsTags=falseReplicaSetId=staging-mongodbBackupRestoreUrl=<redacted>, 
BackupRestoreUrlV3=BackupParallelRestoreUrl=BackupParallelRestoreNumChunks=0BackupParallelRestoreNumWorkers=0BackupThirdPartyRestoreBaseUrl=BackupRestoreRsVersion=0BackupRestoreElectionTerm=0BackupRestoreCheckpointTimestamp=<nil>BackupRestoreCertificateValidationHostname=BackupRestoreSystemUsersUUID=BackupRestoreSystemRolesUUID=BackupRestoreBalancerSettings=nullBackupRestoreConfigSettingsUUID=BackupShardIdRestoreMaps=[]DirectAttachVerificationKey=DirectAttachSourceClusterName=DirectAttachShouldFilterByFileList=falseConfigPath=StorageEngine=BackupRestoreOplogBaseUrl=BackupRestoreOplog=<nil>BackupRestoreDesiredTime=<nil>BackupRestoreSourceRsId=BackupRestoreFilterList=<nil>BackupRestoreFilteredFileListUrl=BackupRestoreJobId=BackupRestoreVerificationKey=BackupRestoreSourceGroupId=PitRestoreType=BackupThirdPartyOplogStoreType=EncryptionProviderType=KMIPProxyPort=0KMIPProxyDisabled=falseTemporaryPort=0CredentialsVersion=0Repair=nullRealtimeConfig=<nil>DataExplorerConfig=<nil>DefaultRWConcern=<nil>LdapCaPath=ConfigServers=[]RestartIntervalTimeMs=<nil>ClusterWideConfiguration=ProfilingConfig=<nil>RegionBaseUrl=RegionBaseRealtimeUrl=RegionBaseAgentUrl=StepDownPrimaryForResync=falsekey=<nil>keyLock=null. log destination=
[2024-04-22T06:50:56.254+0000] [.error] [src/main/cm.go:mainLoop:520] [06:50:56.254] Error applying desired cluster configs : [06:50:56.253] Log path absent for process=state.ProcessConfigName=staging-mongodb-0ProcessType=mongodVersion=5.0.26FullVersion={"trueName":"5.0.26","gitVersion":"","modules":[],"major":5,"minor":0,"patch":26}Disabled=falseManualMode=falseNumCores=0CpuAffinity=[]LogRotate={"sizeThresholdMB":0,"timeThresholdHrs":0,"numUncompressed":0,"numTotal":0,"percentOfDiskspace":0,"includeAuditLogsWithMongoDBLogs":false}AuditLogRotate=<nil>LastResync="0001-01-01T00:00:00Z"LastThirdPartyRestoreResync="0001-01-01T00:00:00Z"LastCompact="0001-01-01T00:00:00Z"LastKmipMasterKeyRotation="0001-01-01T00:00:00Z"LastRestart="0001-01-01T00:00:00Z"Hostname=staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.localAlias=Cluster=AuthSchemaVersion=5FeatureCompatibilityVersion=5.0Kerberos=<nil>Args={"net":{"bindIp":"0.0.0.0","maxIncomingConnections":900,"port":27017,"tls":{"CAFile":"/var/lib/tls/ca/4971db0032afff31ab1235e283ef9ab7c9a4a483d630427923d253a41152cf13.pem","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/var/lib/tls/server/f131542c6e26217a9f960431d5177cd904c1a5661fd08482f4a194e836baa228.pem","mode":"requireTLS"}},"replication":{"replSetName":"staging-mongodb"},"setParameter":{"authenticationMechanisms":"SCRAM-SHA-256"},"storage":{"dbPath":"/data"}}ProcessAuthInfo={"UsersWanted":[{"user":"root","db":"admin","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"ztQzG8EXxgOT8qSloG6LfA==","storedKey":"+p4exhjiiYZoOIHahzi414ZINBs=","serverKey":"iV4qydBHQksSjyzTXDidhvn/9iY="},"scramSha256Creds":{"iterationCount":15000,"salt":"CcGExKTDjHefywe7CtG1VdfnqA9clT12VRz6MA==","storedKey":"JLVuWDSmdtJNNXRNVKf6Jw7MsofcbJP9G0N03N66Yb0=","serverKey":"d8R15D/XS9YVXwwDb6NjHBMCoYIrIxeUYU7PAK8tw7k="},"roles":[{"role":"clusterAdmin","db":"admin","minFcv":""},{"role":"readWriteAnyDatabase","db":"admin","minFcv":""},{"role":"userAdminAnyDatabase","db":"admin","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null},{"user":"metadata","db":"tnt_mbr_meta","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"iTSe6nHUP2rYsv8XvRgnnA==","storedKey":"sh4Q4pq/+EnduxDhyLEaY6bix3Y=","serverKey":"sL7I88TKpiWOJcD1X2MJHxBIIAg="},"scramSha256Creds":{"iterationCount":15000,"salt":"1zfBNBYr0OXWlPpMdZsoark+HcMfxoX0MltBpQ==","storedKey":"uBZBpVzBawhgY1wp8p52UlTzAtpkOc3UEgKC7JGPwbU=","serverKey":"EacvAm/pNKMyUobWrb0aL8+Og3BJ/W174YVhLMn8SWU="},"roles":[{"role":"dbOwner","db":"tnt_mbr_meta","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null}],"UsersDeleted":null,"Roles":null,"DesiredKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredNewKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredKeyHash":"KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho=","DesiredNewKeyHash":null,"KeyfileHashes":["KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho="],"UsingAuth":true}IsConfigServer=falseIsShardServer=falseIsInReplSet=trueIsStandalone=falseIsArbiter=falseDownloadBase=FullySyncRsTags=falseReplicaSetId=staging-mongodbBackupRestoreUrl=<redacted>, 
BackupRestoreUrlV3=BackupParallelRestoreUrl=BackupParallelRestoreNumChunks=0BackupParallelRestoreNumWorkers=0BackupThirdPartyRestoreBaseUrl=BackupRestoreRsVersion=0BackupRestoreElectionTerm=0BackupRestoreCheckpointTimestamp=<nil>BackupRestoreCertificateValidationHostname=BackupRestoreSystemUsersUUID=BackupRestoreSystemRolesUUID=BackupRestoreBalancerSettings=nullBackupRestoreConfigSettingsUUID=BackupShardIdRestoreMaps=[]DirectAttachVerificationKey=DirectAttachSourceClusterName=DirectAttachShouldFilterByFileList=falseConfigPath=StorageEngine=BackupRestoreOplogBaseUrl=BackupRestoreOplog=<nil>BackupRestoreDesiredTime=<nil>BackupRestoreSourceRsId=BackupRestoreFilterList=<nil>BackupRestoreFilteredFileListUrl=BackupRestoreJobId=BackupRestoreVerificationKey=BackupRestoreSourceGroupId=PitRestoreType=BackupThirdPartyOplogStoreType=EncryptionProviderType=KMIPProxyPort=0KMIPProxyDisabled=falseTemporaryPort=0CredentialsVersion=0Repair=nullRealtimeConfig=<nil>DataExplorerConfig=<nil>DefaultRWConcern=<nil>LdapCaPath=ConfigServers=[]RestartIntervalTimeMs=<nil>ClusterWideConfiguration=ProfilingConfig=<nil>RegionBaseUrl=RegionBaseRealtimeUrl=RegionBaseAgentUrl=StepDownPrimaryForResync=falsekey=<nil>keyLock=null. log destination=
[2024-04-22T07:22:36.561+0000] [.error] [src/action/start.go:sleepUntilProcessUp:267] <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/action/start.go:func1:145] [103] <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/director/director.go:executePlan:988] <staging-mongodb-0> [07:22:36.561] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/director/director.go:planAndExecute:585] <staging-mongodb-0> [07:22:36.561] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:22:36.561] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/director/director.go:mainLoop:394] <staging-mongodb-0> [07:22:36.561] Failed to planAndExecute : <staging-mongodb-0> [07:22:36.561] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:22:36.561] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/action/start.go:sleepUntilProcessUp:267] <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/action/start.go:func1:145] [103] <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/director/director.go:executePlan:988] <staging-mongodb-0> [07:54:17.873] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/director/director.go:planAndExecute:585] <staging-mongodb-0> [07:54:17.873] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:54:17.873] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/director/director.go:mainLoop:394] <staging-mongodb-0> [07:54:17.873] Failed to planAndExecute : <staging-mongodb-0> [07:54:17.873] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:54:17.873] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s

Health logs:

(venv) prachiwaghulkar@Prachis-MacBook-Pro ~ % kubectl exec -it staging-mongodb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
{"statuses":{"staging-mongodb-0":{"IsInGoalState":false,"LastMongoUpTime":0,"ExpectedToBeUp":true,"ReplicationStatus":-1}},"mmsStatus":{"staging-mongodb-0":{"name":"staging-mongodb-0","lastGoalVersionAchieved":-1,"plans":[{"automationConfigVersion":1,"started":"2024-04-22T06:50:56.381946486Z","completed":null,"moves":[{"move":"Start","moveDoc":"Start the process","steps":[{"step":"StartFresh","stepDoc":"Start a mongo instance  (start fresh)","isWaitStep":false,"started":"2024-04-22T06:50:56.381976336Z","completed":null,"result":"error"}]},{"move":"WaitAllRsMembersUp","moveDoc":"Wait until all members of this process' repl set are up","steps":[{"step":"WaitAllRsMembersUp","stepDoc":"Wait until all members of this process' repl set are up","isWaitStep":true,"started":null,"completed":null,"result":""}]},{"move":"RsInit","moveDoc":"Initialize a replica set including the current MongoDB process","steps":[{"step":"RsInit","stepDoc":"Initialize a replica set","isWaitStep":false,"started":null,"completed":null,"result":""}]},{"move":"WaitFeatureCompatibilityVersionCorrect","moveDoc":"Wait for featureCompatibilityVersion to be right","steps":[{"step":"WaitFeatureCompatibilityVersionCorrect","stepDoc":"Wait for featureCompatibilityVersion to be right","isWaitStep":true,"started":null,"completed":null,"result":""}]}]}],"errorCode":0,"errorString":""}}}%  
prachiwaghulkar commented 6 months ago

@nammn Were you able to check the issue?

FYI, these are the mongodb-agent and readinessprobe images that I am using.

- image: mongodb/mongodb-agent
  mediaType: application/vnd.docker.distribution.manifest.v2
  digest: sha256:a208e80f79bb7fe954d9a9a1444bb482dee2e86e5e5ae89dbf240395c4a158b3
  tag: 107.0.0.8465-1
  platform:
    architecture: amd64
    os: linux
  registries:
    - host: quay.io
- image: mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook
  mediaType: application/vnd.docker.distribution.manifest.v2
  digest: sha256:08495e1331a1691878e449d971129ed17858a20a7b69bb74d2e84f057cfcc098
  tag: 1.0.8
  platform:
    architecture: amd64
    os: linux
  registries:
    - host: quay.io
- image: mongodb/mongodb-kubernetes-operator
  mediaType: application/vnd.docker.distribution.manifest.v2
  digest: sha256:0aa26010be99caaf8a7dfd9cba81e326261ed99a69ac68b54aa8af3a104970bc
  tag: 0.9.0
  platform:
    architecture: amd64
    os: linux
  registries:
    - host: quay.io
- image: mongodb/mongodb-kubernetes-readinessprobe
  mediaType: application/vnd.docker.distribution.manifest.v2
  digest: sha256:e84438c5394be7223de27478eb9066204d62e6ecd233d3d4e4c11d3da486a7b5
  tag: 1.0.17
  platform:
    architecture: amd64
    os: linux
  registries:
    - host: quay.io
prachiwaghulkar commented 6 months ago

@irajdeep @nammn Can you or somebody else take a look and assist here, please? MongoDB worked fine for us up to 5.0.24. I checked with MongoDB 5.0.25 today and it errored with the same logs that I have shared above. In short, we have been encountering this issue since release 5.0.25!

sebt3 commented 6 months ago

Having the exact same issue here. Fresh new instance.

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  creationTimestamp: "2024-05-13T09:11:50Z"
  generation: 1
  name: wildduck-wildduck-mongo
  namespace: solidite-mail
  resourceVersion: "192255239"
  uid: 7cfab3a7-30ac-434a-b65d-31b638229bde
spec:
  additionalMongodConfig:
    storage.wiredTiger.engineConfig.cacheSizeGB: 1
  members: 1
  security:
    authentication:
      ignoreUnknownUsers: true
      modes:
      - SCRAM
  statefulSet:
    spec:
      template:
        metadata:
          annotations:
            k8up.io/backupcommand: sh -c 'mongodump --username=$MONGODB_USER --password=$MONGODB_PASSWORD
              mongodb://localhost/$MONGODB_NAME --archive'
            k8up.io/file-extension: .archive
        spec:
          containers:
          - env:
            - name: MONGODB_NAME
              value: wildduck
            - name: MONGODB_USER
              value: wildduck
            - name: MONGODB_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: password
                  name: wildduck-wildduck-mongo
            imagePullPolicy: IfNotPresent
            name: mongod
            resources:
              limits:
                cpu: "1"
                memory: 1100M
              requests:
                cpu: "0.3"
                memory: 400M
  type: ReplicaSet
  users:
  - db: wildduck
    name: wildduck
    passwordSecretRef:
      name: wildduck-wildduck-mongo
    roles:
    - db: wildduck
      name: readWrite
    scramCredentialsSecretName: wildduck-wildduck-mongo-scram
  version: 6.0.13
status:
  currentMongoDBMembers: 0
  currentStatefulSetReplicas: 0
  message: ReplicaSet is not yet ready, retrying in 10 seconds
  mongoUri: ""
  phase: Pending

Describing the pod shows the following errors:

  Warning  Unhealthy  21m (x3 over 21m)  kubelet            Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory

goroutine 1 [running]:
main.main()
           /workspace/cmd/readiness/main.go:226 +0x191
  Warning  BackOff    11m (x46 over 21m)   kubelet  Back-off restarting failed container mongod in pod wildduck-wildduck-mongo-0_solidite-mail(ba2270f0-ecf0-4468-b01f-b7a5df538b4b)
  Warning  Unhealthy  76s (x223 over 21m)  kubelet  Readiness probe failed:

The pod logs contain nothing relevant.

nammn commented 4 months ago

@prachiwaghulkar can you verify that the mongodb image you are using is indeed compatible and working? Looking at the agent log, it seems the agent waits forever because mongod and the related service never come up.

Can you somehow get a debug container running and try to access that service?

staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017
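One way to run that check is a throwaway debug pod; a sketch assuming the community nicolaka/netshoot image is allowed in the cluster, using the hostname above:

kubectl run mongo-debug --rm -it --restart=Never -n staging --image=nicolaka/netshoot -- nc -zv staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local 27017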
saksham1gupta commented 4 months ago

Facing the same issue: the MongoDB instance readiness probe is failing for the mongodb-agent container. MongoDB Community Operator version: community-operator-0.9.0. OpenShift version: 4.14.25.

apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-devops-test
  namespace: di-devops
spec:
  additionalConnectionStringConfig:
    readPreference: primary
  additionalMongodConfig:
    storage.wiredTiger.engineConfig.journalCompressor: zlib
  members: 3
  security:
    authentication:
      ignoreUnknownUsers: true
      modes:
        - SCRAM
  statefulSet:
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: mongodb
      template:
        metadata:
          labels:
            app.kubernetes.io/name: mongodb
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
                - podAffinityTerm:
                    labelSelector:
                      matchExpressions:
                        - key: app.kubernetes.io/name
                          operator: In
                          values:
                            - mongodb
                    topologyKey: kubernetes.io/hostname
                  weight: 100
          containers:
            - name: mongod
              resources:
                limits:
                  cpu: '0.2'
                  memory: 250M
                requests:
                  cpu: '0.2'
                  memory: 200M
            - name: mongodb-agent
              readinessProbe:
                failureThreshold: 40
                initialDelaySeconds: 5
                timeout: 30
              resources:
                limits:
                  cpu: '0.2'
                  memory: 250M
                requests:
                  cpu: '0.2'
                  memory: 200M
          initContainers:
            - name: mongodb-agent-readinessprobe
              resources:
                limits:
                  cpu: '2'
                  memory: 200M
                requests:
                  cpu: '1'
                  memory: 100M
  type: ReplicaSet
  users:
    - additionalConnectionStringConfig:
        readPreference: secondary
      db: didevops
      name: didevops
      passwordSecretRef:
        name: my-user-password
      roles:
        - db: didevops
          name: clusterAdmin
        - db: didevops
          name: userAdminAnyDatabase
        - db: didveops
          name: readWriteAnyDatabase
      scramCredentialsSecretName: my-scram
  version: 6.0.5
status:
  currentMongoDBMembers: 3
  currentStatefulSetReplicas: 3
  message: 'ReplicaSet is not yet ready, retrying in 10 seconds'
  mongoUri: 'mongodb://mongodb-devops-test-0.mongodb-devops-test-svc.di-devops.svc.cluster.local:27017,mongodb-devops-test-1.mongodb-devops-test-svc.di-devops.svc.cluster.local:27017,mongodb-devops-test-2.mongodb-devops-test-svc.di-devops.svc.cluster.local:27017/?replicaSet=mongodb-devops-test&readPreference=primary'
  phase: Pending
  version: 6.0.5
veebkolm commented 3 months ago

Ensure that your node has the correct CPU model available. MongoDB requires AVX support. I didn't expose the CPU flag or use host CPU model passthrough, which caused MongoDB not to start.

saksham1gupta commented 3 months ago

Ensure that your node has the correct CPU model available. MongoDB requires AVX support. I didn't expose the CPU flag or use host CPU model passthrough, which caused MongoDB not to start.

How can I check that the node has the correct CPU model available from an OpenShift pod? Are there any docs or commands that can help verify what it supports?

shubham-cmyk commented 2 months ago

Any update on this? I am facing the same issue.

veebkolm commented 2 months ago

In my case we had to pass the host CPU model from Proxmox. Cloud providers should already pass the correct model. lscpu | grep avx will show you whether your CPU supports AVX or not.
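To run the same check on a specific Kubernetes node rather than on the hypervisor, a node debug container also works; a sketch assuming kubectl debug is available in your client version, with <node-name> as a placeholder (any container on a node sees the node's CPU flags in /proc/cpuinfo):

kubectl debug node/<node-name> -it --image=busybox -- sh -c "grep -o 'avx[0-9a-z_]*' /proc/cpuinfo | sort -u"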

CloudFocused commented 1 month ago

For those on this thread, @veebkolm was absolutely right that it was a CPU flag for me. I am working on Proxmox. To fix this I shelled into the host and had to pass the CPU features through to the Kubernetes nodes.

To resolve (for me):

cd /etc/pve/qemu-server/
nano <vmid>.conf

Edit the cpu line to read cpu: host, save, and restart the VM.
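After the restart, lscpu | grep avx inside the guest should list the AVX flags again (the same check veebkolm described above).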

Now all is right in the mongo world.