Open prachiwaghulkar opened 6 months ago
@laiminhtrung1997 Unfortunately, the readinessProbe still fails and the pod goes into CrashLoopBackOff. I have set readinessProbe.initialDelaySeconds to 10 for the mongod container, and the users field was already configured in the mongodbcommunity custom resource.
Normal Created 6m40s (x3 over 6m54s) kubelet Created container mongod
Warning Unhealthy 6m40s (x2 over 6m40s) kubelet Readiness probe failed:
Normal Started 6m39s (x3 over 6m54s) kubelet Started container mongod
Warning BackOff 111s (x25 over 6m51s) kubelet Back-off restarting failed container
Dear @prachiwaghulkar, could you please provide the manifest of your mdbc CR?
@laiminhtrung1997 PFB the mdbc CR manifest:
apiVersion: v1
items:
- apiVersion: mongodbcommunity.mongodb.com/v1
  kind: MongoDBCommunity
  metadata:
    name: staging-mongodb
    namespace: staging
  spec:
    additionalMongodConfig:
      net.maxIncomingConnections: 900
    featureCompatibilityVersion: "5.0"
    members: 1
    security:
      authentication:
        ignoreUnknownUsers: true
        modes:
        - SCRAM
      tls:
        caConfigMapRef:
          name: staging-mongodb-cert-ca-cm
        certificateKeySecretRef:
          name: staging-mongodb-cert
        enabled: true
    statefulSet:
      spec:
        template:
          spec:
            containers:
            - image: docker-na-public.artifactory.swg-devops.com/sec-guardium-next-gen-docker-local/mongo:5.0.26
              name: mongod
              readinessProbe:
                initialDelaySeconds: 10
              resources:
                limits:
                  cpu: "4"
                  ephemeral-storage: 5Gi
                  memory: 10Gi
                requests:
                  cpu: "1"
                  ephemeral-storage: 1Gi
                  memory: 2Gi
            imagePullSecrets:
            - name: ibm-entitlement-key
            initContainers:
            - name: mongodb-agent-readinessprobe
              resources:
                limits:
                  cpu: 100m
                  memory: 500Mi
                requests:
                  cpu: 6m
                  memory: 6Mi
            - name: mongod-posthook
              resources:
                limits:
                  cpu: 100m
                  memory: 500Mi
                requests:
                  cpu: 6m
                  memory: 6Mi
        volumeClaimTemplates:
        - apiVersion: v1
          kind: PersistentVolumeClaim
          metadata:
            name: data-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: rook-ceph-block
            volumeMode: Filesystem
        - apiVersion: v1
          kind: PersistentVolumeClaim
          metadata:
            name: logs-volume
          spec:
            accessModes:
            - ReadWriteOnce
            resources:
              requests:
                storage: 100Gi
            storageClassName: rook-ceph-block
            volumeMode: Filesystem
    type: ReplicaSet
    users:
    - db: admin
      name: root
      passwordSecretRef:
        key: mongodbRootPassword
        name: ibm-mongodb-authsecret
      roles:
      - db: admin
        name: clusterAdmin
      - db: admin
        name: userAdminAnyDatabase
      - db: admin
        name: readWriteAnyDatabase
      scramCredentialsSecretName: root-scram2
    - db: tnt_mbr_meta
      name: metadata
      passwordSecretRef:
        key: mongodbMetadataPassword
        name: ibm-mongodb-authsecret
      roles:
      - db: tnt_mbr_meta
        name: dbOwner
      scramCredentialsSecretName: metadata-scram2
    version: 5.0.26
Could you also provide the log of the mongodb-agent container in mongodb-0, please?
The log of mongodb-agent:
prachiwaghulkar@Prachis-MacBook-Pro cert-request % oc logs pod/staging-mongodb-0 -c mongodb-agent
cat: /mongodb-automation/agent-api-key/agentApiKey: No such file or directory
[2024-04-19T05:31:54.604+0000] [.debug] [util/distros/distros.go:LinuxFlavorAndVersionUncached:144] Detected linux flavor ubuntu version 20.4
Hmmmm. My mdbc does not configure TLS, and MongoDB started without any errors. I have no idea. Sorry I cannot help you.
@irajdeep Can anybody from the community take a look and assist here? It is important for us to move to 5.0.26.
@prachiwaghulkar can you please provide the agent logs and health logs as described here? https://github.com/mongodb/mongodb-kubernetes-operator/blob/master/.github/ISSUE_TEMPLATE/bug_report.md
Having said that, exec /bin/sh: exec format error looks like an architecture mismatch. Are you running an arm image on amd64, or an amd64 image on arm? I suggest switching to the matching architecture and testing again.
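If it helps, here is a sketch of how to compare the image and node architectures, and to collect the agent and health-status logs requested above (the image reference, namespace, and paths are taken from this thread; adjust the pod name as needed, and docker manifest inspect assumes you can reach the registry):

# Architecture the image manifest resolves to.
docker manifest inspect docker-na-public.artifactory.swg-devops.com/sec-guardium-next-gen-docker-local/mongo:5.0.26 | grep -i architecture

# Architecture of the cluster nodes.
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.architecture}{"\n"}{end}'

# Agent and health-status logs referenced in the bug report template.
kubectl -n staging exec staging-mongodb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/automation-agent.log
kubectl -n staging exec staging-mongodb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json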
@nammn I have used the following image: sha256:0172fb2a286d3dc9823f0e377587c0a545022bd330c817ed6b8bc231ea0643ad, which is linux/amd64. We are updating from 5.0.24 to 5.0.26; 5.0.24 on amd64 worked fine for us.
PFB the logs:
Agent logs:
(venv) prachiwaghulkar@Prachis-MacBook-Pro ~ % kubectl exec -it staging-mongodb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/automation-agent.log
[2024-04-22T06:23:30.847+0000] [header.info] [::0] GitCommitId = 956e3386ad456471db1776d79637a38f182a6088
[2024-04-22T06:23:30.847+0000] [header.info] [::0] AutomationVersion = 107.0.0.8465
[2024-04-22T06:23:30.847+0000] [header.info] [::0] localhost = staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local
[2024-04-22T06:23:30.847+0000] [header.info] [::0] ErrorStateSleepTime = 10s
[2024-04-22T06:23:30.847+0000] [header.info] [::0] GoalStateSleepTime = 10s
[2024-04-22T06:23:30.847+0000] [header.info] [::0] NotGoalStateSleepTime = 1s
[2024-04-22T06:23:30.847+0000] [header.info] [::0] PlanCutoffTime = 300000
[2024-04-22T06:23:30.847+0000] [header.info] [::0] TracePlanner = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] User = 2000
[2024-04-22T06:23:30.847+0000] [header.info] [::0] Go version = go1.20.10
[2024-04-22T06:23:30.847+0000] [header.info] [::0] MmsBaseURL =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] MmsGroupId =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] HttpProxy =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] DisableHttpKeepAlive = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] HttpsCAFile =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] TlsRequireValidMMSServerCertificates = true
[2024-04-22T06:23:30.847+0000] [header.info] [::0] TlsMMSServerClientCertificate =
[2024-04-22T06:23:30.847+0000] [header.info] [::0] KMIPProxyCertificateDir = /tmp
[2024-04-22T06:23:30.847+0000] [header.info] [::0] EnableLocalConfigurationServer = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] DialTimeoutSeconds = 40
[2024-04-22T06:23:30.847+0000] [header.info] [::0] KeepUnusedMongodbVersions = false
[2024-04-22T06:23:30.847+0000] [header.info] [::0] DisallowDowngrades = false
[2024-04-22T06:23:30.846+0000] [.error] [src/action/start.go:func1:145] [103] <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:23:30.846+0000] [.error] [src/director/director.go:executePlan:988] <staging-mongodb-0> [06:23:30.846] Failed to apply action. Result = <nil> : <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:23:30.846+0000] [.error] [src/director/director.go:planAndExecute:585] <staging-mongodb-0> [06:23:30.846] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [06:23:30.846] Failed to apply action. Result = <nil> : <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:23:30.846+0000] [.error] [src/director/director.go:mainLoop:394] <staging-mongodb-0> [06:23:30.846] Failed to planAndExecute : <staging-mongodb-0> [06:23:30.846] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [06:23:30.846] Failed to apply action. Result = <nil> : <staging-mongodb-0> [06:23:30.846] Error sleeping until process was up : <staging-mongodb-0> [06:23:30.846] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] GitCommitId = 956e3386ad456471db1776d79637a38f182a6088
[2024-04-22T06:50:56.215+0000] [header.info] [::0] AutomationVersion = 107.0.0.8465
[2024-04-22T06:50:56.215+0000] [header.info] [::0] localhost = staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local
[2024-04-22T06:50:56.215+0000] [header.info] [::0] ErrorStateSleepTime = 10s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] GoalStateSleepTime = 10s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] NotGoalStateSleepTime = 1s
[2024-04-22T06:50:56.215+0000] [header.info] [::0] PlanCutoffTime = 300000
[2024-04-22T06:50:56.215+0000] [header.info] [::0] TracePlanner = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] User = 2000
[2024-04-22T06:50:56.215+0000] [header.info] [::0] Go version = go1.20.10
[2024-04-22T06:50:56.215+0000] [header.info] [::0] MmsBaseURL =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] MmsGroupId =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] HttpProxy =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] DisableHttpKeepAlive = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] HttpsCAFile =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] TlsRequireValidMMSServerCertificates = true
[2024-04-22T06:50:56.215+0000] [header.info] [::0] TlsMMSServerClientCertificate =
[2024-04-22T06:50:56.215+0000] [header.info] [::0] KMIPProxyCertificateDir = /tmp
[2024-04-22T06:50:56.215+0000] [header.info] [::0] EnableLocalConfigurationServer = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] DialTimeoutSeconds = 40
[2024-04-22T06:50:56.215+0000] [header.info] [::0] KeepUnusedMongodbVersions = false
[2024-04-22T06:50:56.215+0000] [header.info] [::0] DisallowDowngrades = false
[2024-04-22T06:50:56.253+0000] [.error] [main/components/agent.go:ApplyClusterConfig:358] [06:50:56.253] Log path absent for process=state.ProcessConfigName=staging-mongodb-0ProcessType=mongodVersion=5.0.26FullVersion={"trueName":"5.0.26","gitVersion":"","modules":[],"major":5,"minor":0,"patch":26}Disabled=falseManualMode=falseNumCores=0CpuAffinity=[]LogRotate={"sizeThresholdMB":0,"timeThresholdHrs":0,"numUncompressed":0,"numTotal":0,"percentOfDiskspace":0,"includeAuditLogsWithMongoDBLogs":false}AuditLogRotate=<nil>LastResync="0001-01-01T00:00:00Z"LastThirdPartyRestoreResync="0001-01-01T00:00:00Z"LastCompact="0001-01-01T00:00:00Z"LastKmipMasterKeyRotation="0001-01-01T00:00:00Z"LastRestart="0001-01-01T00:00:00Z"Hostname=staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.localAlias=Cluster=AuthSchemaVersion=5FeatureCompatibilityVersion=5.0Kerberos=<nil>Args={"net":{"bindIp":"0.0.0.0","maxIncomingConnections":900,"port":27017,"tls":{"CAFile":"/var/lib/tls/ca/4971db0032afff31ab1235e283ef9ab7c9a4a483d630427923d253a41152cf13.pem","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/var/lib/tls/server/f131542c6e26217a9f960431d5177cd904c1a5661fd08482f4a194e836baa228.pem","mode":"requireTLS"}},"replication":{"replSetName":"staging-mongodb"},"setParameter":{"authenticationMechanisms":"SCRAM-SHA-256"},"storage":{"dbPath":"/data"}}ProcessAuthInfo={"UsersWanted":[{"user":"root","db":"admin","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"ztQzG8EXxgOT8qSloG6LfA==","storedKey":"+p4exhjiiYZoOIHahzi414ZINBs=","serverKey":"iV4qydBHQksSjyzTXDidhvn/9iY="},"scramSha256Creds":{"iterationCount":15000,"salt":"CcGExKTDjHefywe7CtG1VdfnqA9clT12VRz6MA==","storedKey":"JLVuWDSmdtJNNXRNVKf6Jw7MsofcbJP9G0N03N66Yb0=","serverKey":"d8R15D/XS9YVXwwDb6NjHBMCoYIrIxeUYU7PAK8tw7k="},"roles":[{"role":"clusterAdmin","db":"admin","minFcv":""},{"role":"readWriteAnyDatabase","db":"admin","minFcv":""},{"role":"userAdminAnyDatabase","db":"admin","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null},{"user":"metadata","db":"tnt_mbr_meta","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"iTSe6nHUP2rYsv8XvRgnnA==","storedKey":"sh4Q4pq/+EnduxDhyLEaY6bix3Y=","serverKey":"sL7I88TKpiWOJcD1X2MJHxBIIAg="},"scramSha256Creds":{"iterationCount":15000,"salt":"1zfBNBYr0OXWlPpMdZsoark+HcMfxoX0MltBpQ==","storedKey":"uBZBpVzBawhgY1wp8p52UlTzAtpkOc3UEgKC7JGPwbU=","serverKey":"EacvAm/pNKMyUobWrb0aL8+Og3BJ/W174YVhLMn8SWU="},"roles":[{"role":"dbOwner","db":"tnt_mbr_meta","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null}],"UsersDeleted":null,"Roles":null,"DesiredKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredNewKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredKeyHash":"KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho=","DesiredNewKeyHash":null,"KeyfileHashes":["KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho="],"UsingAuth":true}IsConfigServer=falseIsShardServer=falseIsInReplSet=trueIsStandalone=falseIsArbiter=falseDownloadBase=FullySyncRsTags=falseReplicaSetId=staging-mongodbBackupRestoreUrl=<redacted>, 
BackupRestoreUrlV3=BackupParallelRestoreUrl=BackupParallelRestoreNumChunks=0BackupParallelRestoreNumWorkers=0BackupThirdPartyRestoreBaseUrl=BackupRestoreRsVersion=0BackupRestoreElectionTerm=0BackupRestoreCheckpointTimestamp=<nil>BackupRestoreCertificateValidationHostname=BackupRestoreSystemUsersUUID=BackupRestoreSystemRolesUUID=BackupRestoreBalancerSettings=nullBackupRestoreConfigSettingsUUID=BackupShardIdRestoreMaps=[]DirectAttachVerificationKey=DirectAttachSourceClusterName=DirectAttachShouldFilterByFileList=falseConfigPath=StorageEngine=BackupRestoreOplogBaseUrl=BackupRestoreOplog=<nil>BackupRestoreDesiredTime=<nil>BackupRestoreSourceRsId=BackupRestoreFilterList=<nil>BackupRestoreFilteredFileListUrl=BackupRestoreJobId=BackupRestoreVerificationKey=BackupRestoreSourceGroupId=PitRestoreType=BackupThirdPartyOplogStoreType=EncryptionProviderType=KMIPProxyPort=0KMIPProxyDisabled=falseTemporaryPort=0CredentialsVersion=0Repair=nullRealtimeConfig=<nil>DataExplorerConfig=<nil>DefaultRWConcern=<nil>LdapCaPath=ConfigServers=[]RestartIntervalTimeMs=<nil>ClusterWideConfiguration=ProfilingConfig=<nil>RegionBaseUrl=RegionBaseRealtimeUrl=RegionBaseAgentUrl=StepDownPrimaryForResync=falsekey=<nil>keyLock=null. log destination=
[2024-04-22T06:50:56.254+0000] [.error] [src/main/cm.go:mainLoop:520] [06:50:56.254] Error applying desired cluster configs : [06:50:56.253] Log path absent for process=state.ProcessConfigName=staging-mongodb-0ProcessType=mongodVersion=5.0.26FullVersion={"trueName":"5.0.26","gitVersion":"","modules":[],"major":5,"minor":0,"patch":26}Disabled=falseManualMode=falseNumCores=0CpuAffinity=[]LogRotate={"sizeThresholdMB":0,"timeThresholdHrs":0,"numUncompressed":0,"numTotal":0,"percentOfDiskspace":0,"includeAuditLogsWithMongoDBLogs":false}AuditLogRotate=<nil>LastResync="0001-01-01T00:00:00Z"LastThirdPartyRestoreResync="0001-01-01T00:00:00Z"LastCompact="0001-01-01T00:00:00Z"LastKmipMasterKeyRotation="0001-01-01T00:00:00Z"LastRestart="0001-01-01T00:00:00Z"Hostname=staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.localAlias=Cluster=AuthSchemaVersion=5FeatureCompatibilityVersion=5.0Kerberos=<nil>Args={"net":{"bindIp":"0.0.0.0","maxIncomingConnections":900,"port":27017,"tls":{"CAFile":"/var/lib/tls/ca/4971db0032afff31ab1235e283ef9ab7c9a4a483d630427923d253a41152cf13.pem","allowConnectionsWithoutCertificates":true,"certificateKeyFile":"/var/lib/tls/server/f131542c6e26217a9f960431d5177cd904c1a5661fd08482f4a194e836baa228.pem","mode":"requireTLS"}},"replication":{"replSetName":"staging-mongodb"},"setParameter":{"authenticationMechanisms":"SCRAM-SHA-256"},"storage":{"dbPath":"/data"}}ProcessAuthInfo={"UsersWanted":[{"user":"root","db":"admin","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"ztQzG8EXxgOT8qSloG6LfA==","storedKey":"+p4exhjiiYZoOIHahzi414ZINBs=","serverKey":"iV4qydBHQksSjyzTXDidhvn/9iY="},"scramSha256Creds":{"iterationCount":15000,"salt":"CcGExKTDjHefywe7CtG1VdfnqA9clT12VRz6MA==","storedKey":"JLVuWDSmdtJNNXRNVKf6Jw7MsofcbJP9G0N03N66Yb0=","serverKey":"d8R15D/XS9YVXwwDb6NjHBMCoYIrIxeUYU7PAK8tw7k="},"roles":[{"role":"clusterAdmin","db":"admin","minFcv":""},{"role":"readWriteAnyDatabase","db":"admin","minFcv":""},{"role":"userAdminAnyDatabase","db":"admin","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null},{"user":"metadata","db":"tnt_mbr_meta","authenticationRestrictions":[],"scramSha1Creds":{"iterationCount":10000,"salt":"iTSe6nHUP2rYsv8XvRgnnA==","storedKey":"sh4Q4pq/+EnduxDhyLEaY6bix3Y=","serverKey":"sL7I88TKpiWOJcD1X2MJHxBIIAg="},"scramSha256Creds":{"iterationCount":15000,"salt":"1zfBNBYr0OXWlPpMdZsoark+HcMfxoX0MltBpQ==","storedKey":"uBZBpVzBawhgY1wp8p52UlTzAtpkOc3UEgKC7JGPwbU=","serverKey":"EacvAm/pNKMyUobWrb0aL8+Og3BJ/W174YVhLMn8SWU="},"roles":[{"role":"dbOwner","db":"tnt_mbr_meta","minFcv":""}],"inheritedRoles":null,"mechanisms":[],"scope":null}],"UsersDeleted":null,"Roles":null,"DesiredKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredNewKey":"[ZpnXOsjiRNY-REDACTED-G04AKp5vLX0]","DesiredKeyHash":"KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho=","DesiredNewKeyHash":null,"KeyfileHashes":["KU8dVQoozhHdkGTMBh4UjQbqYTRFiyc9/juP3AbNnho="],"UsingAuth":true}IsConfigServer=falseIsShardServer=falseIsInReplSet=trueIsStandalone=falseIsArbiter=falseDownloadBase=FullySyncRsTags=falseReplicaSetId=staging-mongodbBackupRestoreUrl=<redacted>, 
BackupRestoreUrlV3=BackupParallelRestoreUrl=BackupParallelRestoreNumChunks=0BackupParallelRestoreNumWorkers=0BackupThirdPartyRestoreBaseUrl=BackupRestoreRsVersion=0BackupRestoreElectionTerm=0BackupRestoreCheckpointTimestamp=<nil>BackupRestoreCertificateValidationHostname=BackupRestoreSystemUsersUUID=BackupRestoreSystemRolesUUID=BackupRestoreBalancerSettings=nullBackupRestoreConfigSettingsUUID=BackupShardIdRestoreMaps=[]DirectAttachVerificationKey=DirectAttachSourceClusterName=DirectAttachShouldFilterByFileList=falseConfigPath=StorageEngine=BackupRestoreOplogBaseUrl=BackupRestoreOplog=<nil>BackupRestoreDesiredTime=<nil>BackupRestoreSourceRsId=BackupRestoreFilterList=<nil>BackupRestoreFilteredFileListUrl=BackupRestoreJobId=BackupRestoreVerificationKey=BackupRestoreSourceGroupId=PitRestoreType=BackupThirdPartyOplogStoreType=EncryptionProviderType=KMIPProxyPort=0KMIPProxyDisabled=falseTemporaryPort=0CredentialsVersion=0Repair=nullRealtimeConfig=<nil>DataExplorerConfig=<nil>DefaultRWConcern=<nil>LdapCaPath=ConfigServers=[]RestartIntervalTimeMs=<nil>ClusterWideConfiguration=ProfilingConfig=<nil>RegionBaseUrl=RegionBaseRealtimeUrl=RegionBaseAgentUrl=StepDownPrimaryForResync=falsekey=<nil>keyLock=null. log destination=
[2024-04-22T07:22:36.561+0000] [.error] [src/action/start.go:sleepUntilProcessUp:267] <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/action/start.go:func1:145] [103] <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/director/director.go:executePlan:988] <staging-mongodb-0> [07:22:36.561] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/director/director.go:planAndExecute:585] <staging-mongodb-0> [07:22:36.561] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:22:36.561] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:22:36.561+0000] [.error] [src/director/director.go:mainLoop:394] <staging-mongodb-0> [07:22:36.561] Failed to planAndExecute : <staging-mongodb-0> [07:22:36.561] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:22:36.561] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:22:36.561] Error sleeping until process was up : <staging-mongodb-0> [07:22:36.561] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/action/start.go:sleepUntilProcessUp:267] <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/action/start.go:func1:145] [103] <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/director/director.go:executePlan:988] <staging-mongodb-0> [07:54:17.873] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/director/director.go:planAndExecute:585] <staging-mongodb-0> [07:54:17.873] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:54:17.873] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
[2024-04-22T07:54:17.873+0000] [.error] [src/director/director.go:mainLoop:394] <staging-mongodb-0> [07:54:17.873] Failed to planAndExecute : <staging-mongodb-0> [07:54:17.873] Plan execution failed on step StartFresh as part of move Start : <staging-mongodb-0> [07:54:17.873] Failed to apply action. Result = <nil> : <staging-mongodb-0> [07:54:17.873] Error sleeping until process was up : <staging-mongodb-0> [07:54:17.873] Process staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017 (local=false) has not come up despite waiting for 30m0s
Health logs:
(venv) prachiwaghulkar@Prachis-MacBook-Pro ~ % kubectl exec -it staging-mongodb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
{"statuses":{"staging-mongodb-0":{"IsInGoalState":false,"LastMongoUpTime":0,"ExpectedToBeUp":true,"ReplicationStatus":-1}},"mmsStatus":{"staging-mongodb-0":{"name":"staging-mongodb-0","lastGoalVersionAchieved":-1,"plans":[{"automationConfigVersion":1,"started":"2024-04-22T06:50:56.381946486Z","completed":null,"moves":[{"move":"Start","moveDoc":"Start the process","steps":[{"step":"StartFresh","stepDoc":"Start a mongo instance (start fresh)","isWaitStep":false,"started":"2024-04-22T06:50:56.381976336Z","completed":null,"result":"error"}]},{"move":"WaitAllRsMembersUp","moveDoc":"Wait until all members of this process' repl set are up","steps":[{"step":"WaitAllRsMembersUp","stepDoc":"Wait until all members of this process' repl set are up","isWaitStep":true,"started":null,"completed":null,"result":""}]},{"move":"RsInit","moveDoc":"Initialize a replica set including the current MongoDB process","steps":[{"step":"RsInit","stepDoc":"Initialize a replica set","isWaitStep":false,"started":null,"completed":null,"result":""}]},{"move":"WaitFeatureCompatibilityVersionCorrect","moveDoc":"Wait for featureCompatibilityVersion to be right","steps":[{"step":"WaitFeatureCompatibilityVersionCorrect","stepDoc":"Wait for featureCompatibilityVersion to be right","isWaitStep":true,"started":null,"completed":null,"result":""}]}]}],"errorCode":0,"errorString":""}}}%
@nammn Were you able to check the issue?
FYI, these are the mongodb-agent and readinessprobe images that I am using:
- image: mongodb/mongodb-agent
  mediaType: application/vnd.docker.distribution.manifest.v2
  digest: sha256:a208e80f79bb7fe954d9a9a1444bb482dee2e86e5e5ae89dbf240395c4a158b3
  tag: 107.0.0.8465-1
  platform:
    architecture: amd64
    os: linux
  registries:
  - host: quay.io
- image: mongodb/mongodb-kubernetes-operator-version-upgrade-post-start-hook
  mediaType: application/vnd.docker.distribution.manifest.v2
  digest: sha256:08495e1331a1691878e449d971129ed17858a20a7b69bb74d2e84f057cfcc098
  tag: 1.0.8
  platform:
    architecture: amd64
    os: linux
  registries:
  - host: quay.io
- image: mongodb/mongodb-kubernetes-operator
  mediaType: application/vnd.docker.distribution.manifest.v2
  digest: sha256:0aa26010be99caaf8a7dfd9cba81e326261ed99a69ac68b54aa8af3a104970bc
  tag: 0.9.0
  platform:
    architecture: amd64
    os: linux
  registries:
  - host: quay.io
- image: mongodb/mongodb-kubernetes-readinessprobe
  mediaType: application/vnd.docker.distribution.manifest.v2
  digest: sha256:e84438c5394be7223de27478eb9066204d62e6ecd233d3d4e4c11d3da486a7b5
  tag: 1.0.17
  platform:
    architecture: amd64
    os: linux
  registries:
  - host: quay.io
@irajdeep @nammn Can you or somebody else take a look and assist here, please? MongoDB worked fine for us until 5.0.24. I checked with MongoDB 5.0.25 today and it errored with the same logs that I have shared above. So in short, we have been encountering this issue since release 5.0.25!
Having the exact same issue here. Fresh new instance.
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  creationTimestamp: "2024-05-13T09:11:50Z"
  generation: 1
  name: wildduck-wildduck-mongo
  namespace: solidite-mail
  resourceVersion: "192255239"
  uid: 7cfab3a7-30ac-434a-b65d-31b638229bde
spec:
  additionalMongodConfig:
    storage.wiredTiger.engineConfig.cacheSizeGB: 1
  members: 1
  security:
    authentication:
      ignoreUnknownUsers: true
      modes:
      - SCRAM
  statefulSet:
    spec:
      template:
        metadata:
          annotations:
            k8up.io/backupcommand: sh -c 'mongodump --username=$MONGODB_USER --password=$MONGODB_PASSWORD
              mongodb://localhost/$MONGODB_NAME --archive'
            k8up.io/file-extension: .archive
        spec:
          containers:
          - env:
            - name: MONGODB_NAME
              value: wildduck
            - name: MONGODB_USER
              value: wildduck
            - name: MONGODB_PASSWORD
              valueFrom:
                secretKeyRef:
                  key: password
                  name: wildduck-wildduck-mongo
            imagePullPolicy: IfNotPresent
            name: mongod
            resources:
              limits:
                cpu: "1"
                memory: 1100M
              requests:
                cpu: "0.3"
                memory: 400M
  type: ReplicaSet
  users:
  - db: wildduck
    name: wildduck
    passwordSecretRef:
      name: wildduck-wildduck-mongo
    roles:
    - db: wildduck
      name: readWrite
    scramCredentialsSecretName: wildduck-wildduck-mongo-scram
  version: 6.0.13
status:
  currentMongoDBMembers: 0
  currentStatefulSetReplicas: 0
  message: ReplicaSet is not yet ready, retrying in 10 seconds
  mongoUri: ""
  phase: Pending
Describing the pod shows the following errors:
Warning Unhealthy 21m (x3 over 21m) kubelet Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory
goroutine 1 [running]:
main.main()
/workspace/cmd/readiness/main.go:226 +0x191
Warning BackOff 11m (x46 over 21m) kubelet Back-off restarting failed container mongod in pod wildduck-wildduck-mongo-0_solidite-mail(ba2270f0-ecf0-4468-b01f-b7a5df538b4b)
Warning Unhealthy 76s (x223 over 21m) kubelet Readiness probe failed:
The pod logs contain nothing relevant.
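For completeness, a few commands that could help narrow this down (pod name and namespace are taken from the manifest above):

# Does the health-status file the readiness probe panics on exist at all?
kubectl -n solidite-mail exec wildduck-wildduck-mongo-0 -c mongodb-agent -- ls -l /var/log/mongodb-mms-automation/healthstatus/
# Agent and mongod container logs for the same pod.
kubectl -n solidite-mail logs wildduck-wildduck-mongo-0 -c mongodb-agent
kubectl -n solidite-mail logs wildduck-wildduck-mongo-0 -c mongod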
@prachiwaghulkar can you verify that the mongodb image you are using is indeed compatible and working? Looking at the agent log, the agent appears to wait forever because mongod and the related service never come up.
Can you somehow get a debug container running and try to access that service?
staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local:27017
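A minimal sketch of such a check, using a throwaway pod and bash's /dev/tcp to test TCP reachability (the ubuntu image is an arbitrary choice; the hostname is taken from the agent log above):

# Prints port-open if something is listening on 27017 at that address.
kubectl -n staging run mongo-debug --rm -it --restart=Never --image=ubuntu:22.04 -- \
  bash -c 'timeout 3 bash -c "</dev/tcp/staging-mongodb-0.staging-mongodb-svc.staging.svc.cluster.local/27017" && echo port-open || echo port-closed'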
Facing the same issue: the MongoDB instance readiness probe is failing for the mongodb-agent container. MongoDB Community operator version: community-operator-0.9.0. OpenShift version: 4.14.25.
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-devops-test
  namespace: di-devops
spec:
  additionalConnectionStringConfig:
    readPreference: primary
  additionalMongodConfig:
    storage.wiredTiger.engineConfig.journalCompressor: zlib
  members: 3
  security:
    authentication:
      ignoreUnknownUsers: true
      modes:
      - SCRAM
  statefulSet:
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: mongodb
      template:
        metadata:
          labels:
            app.kubernetes.io/name: mongodb
        spec:
          affinity:
            podAntiAffinity:
              preferredDuringSchedulingIgnoredDuringExecution:
              - podAffinityTerm:
                  labelSelector:
                    matchExpressions:
                    - key: app.kubernetes.io/name
                      operator: In
                      values:
                      - mongodb
                  topologyKey: kubernetes.io/hostname
                weight: 100
          containers:
          - name: mongod
            resources:
              limits:
                cpu: '0.2'
                memory: 250M
              requests:
                cpu: '0.2'
                memory: 200M
          - name: mongodb-agent
            readinessProbe:
              failureThreshold: 40
              initialDelaySeconds: 5
              timeout: 30
            resources:
              limits:
                cpu: '0.2'
                memory: 250M
              requests:
                cpu: '0.2'
                memory: 200M
          initContainers:
          - name: mongodb-agent-readinessprobe
            resources:
              limits:
                cpu: '2'
                memory: 200M
              requests:
                cpu: '1'
                memory: 100M
  type: ReplicaSet
  users:
  - additionalConnectionStringConfig:
      readPreference: secondary
    db: didevops
    name: didevops
    passwordSecretRef:
      name: my-user-password
    roles:
    - db: didevops
      name: clusterAdmin
    - db: didevops
      name: userAdminAnyDatabase
    - db: didveops
      name: readWriteAnyDatabase
    scramCredentialsSecretName: my-scram
  version: 6.0.5
status:
  currentMongoDBMembers: 3
  currentStatefulSetReplicas: 3
  message: 'ReplicaSet is not yet ready, retrying in 10 seconds'
  mongoUri: 'mongodb://mongodb-devops-test-0.mongodb-devops-test-svc.di-devops.svc.cluster.local:27017,mongodb-devops-test-1.mongodb-devops-test-svc.di-devops.svc.cluster.local:27017,mongodb-devops-test-2.mongodb-devops-test-svc.di-devops.svc.cluster.local:27017/?replicaSet=mongodb-devops-test&readPreference=primary'
  phase: Pending
  version: 6.0.5
Ensure that your node has the correct CPU model available. MongoDB requires AVX support. I didn't expose the CPU flag nor use host CPU model passthrough, which caused MongoDB not to start.
How can I ensure that the node has the correct CPU model available for an OpenShift pod? Is there any documentation or command that can help check whether it supports AVX?
Any update on this? I am facing the same issue.
In my case we had to pass the host CPU model from Proxmox. Cloud providers should already pass the correct model.
lscpu | grep avx
will show you whether your CPU supports AVX or not.
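On OpenShift specifically, one way to check this from a node (sketch only; the node name is a placeholder):

# Shell onto the node and look for the avx flag in /proc/cpuinfo.
oc debug node/<node-name> -- chroot /host sh -c 'grep -o -m1 avx /proc/cpuinfo || echo "no AVX"'
# Or run the same grep from inside any container already scheduled on that node:
grep -o -m1 avx /proc/cpuinfo || echo "no AVX"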
For those on this thread, @veebkolm was absolutely right that it was a CPU flag issue for me. I am running Proxmox. To fix this, I shelled into the host and passed the CPU features through to the Kubernetes nodes.
TO RESOLVE (for me):
cd /etc/pve/qemu-server/
nano
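The VM config file name was omitted above; a sketch of what the Proxmox-side change typically amounts to (VM ID 101 is a hypothetical example):

# Pass the host CPU model through so the guest (Kubernetes node) sees AVX.
qm set 101 --cpu host
# Equivalent to editing /etc/pve/qemu-server/101.conf so it contains:  cpu: host
# Stop and start the VM afterwards so the new CPU model takes effect.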
Now all is right in the mongo world.
What did you do to encounter the bug? Applied the MongoDB CR using MongoDB image 5.0.26. The mongodb pod is in CrashLoopBackOff and the mongodbcommunity resource is in a Pending state.
Pod logs give the following error:
Describing the pod gives the below error in events:
What did you expect? /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json should exist and this error should not occur.
What happened instead? The /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json file doesn't exist, the error is thrown, and the mongodb pod is in CrashLoopBackOff.
Operator Information
If possible, please include: