
Agent health Status is not readable when securityContext is in Containers #730

Closed: jatinderjawanda closed this issue 2 years ago

jatinderjawanda commented 3 years ago

What did you do to encounter the bug? This appears to be the same problem as reported in https://github.com/mongodb/mongodb-kubernetes-operator/issues/533; the only difference is that a securityContext is set on the init containers here.

What did you expect? After applying mongodb.MongoDBCommunity.yml, I wait until kubectl get mdbc -n mongodb reports Running. I then shell into the mongod container and attempt to connect locally using the mongo client.
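
The steps, roughly, as commands (namespace, pod, and file names as in this report):

# apply the resource and wait for the CR to report Running
kubectl apply -f mongodb.MongoDBCommunity.yml -n mongodb
kubectl get mdbc -n mongodb

# shell into the mongod container and attempt a local TLS connection
kubectl exec -it mongodb-cluster-0 -c mongod -n mongodb -- bash
mongo --tls --tlsCAFile /var/lib/tls/ca/ca.crt \
      --tlsCertificateKeyFile /var/lib/tls/server/*.pem \
      --host mongodb-cluster-svc.mongodb.svc.kubernetes.local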

What happened instead? The connection is refused, and further investigation reveals that mongod is not running.

I have no name!@mongodb-cluster-0:/$ mongo --tls --tlsCAFile /var/lib/tls/ca/ca.crt --tlsCertificateKeyFile /var/lib/tls/server/*.pem --host mongodb-cluster-svc.mongodb.svc.kubernetes.local
MongoDB shell version v4.4.6
connecting to: mongodb://mongodb-cluster-svc.mongodb.svc.kubernetes.local:27017/?compressors=disabled&gssapiServiceName=mongodb
Error: couldn't connect to server mongodb-cluster-svc.mongodb.svc.kubernetes.local:27017, connection attempt failed: SocketException: Error connecting to mongodb-cluster-svc.mongodb.svc.kubernetes.local:27017 (172.16.13.59:27017) :: caused by :: Connection refused :
connect@src/mongo/shell/mongo.js:374:17
@(connect):2:6
exception: connect failed
exiting with code 1

Operator Information

Kubernetes Cluster Information

Additional context The operator was created using the instructions provided in the repo's documentation.
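
Roughly these steps from the repo's install guide (file paths from memory; verify against the checked-out operator version):

# install the CRD, RBAC, and the operator deployment into the mongodb namespace
kubectl apply -f config/crd/bases/mongodbcommunity.mongodb.com_mongodbcommunity.yaml
kubectl apply -k config/rbac/ --namespace mongodb
kubectl create -f config/manager/manager.yaml --namespace mongodb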

mongodb.MongoDBCommunity.yml:

---
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mongodb-cluster
spec:
  members: 1
  type: ReplicaSet
  version: "4.4.6"
  security:
    tls:
      enabled: true
      certificateKeySecretRef:
        name: mongodb-tls
      caConfigMapRef:
        name: ca-config-map
    authentication:
      modes: ["SCRAM"]
  users:
    - name: my-user
      db: admin
      passwordSecretRef: # a reference to the secret that will be used to generate the user's password
        name: my-user-password
      roles:
        - name: clusterAdmin
          db: admin
        - name: userAdminAnyDatabase
          db: admin
      scramCredentialsSecretName: my-scram
  additionalMongodConfig:
    storage.wiredTiger.engineConfig.journalCompressor: zlib
  statefulSet:
    spec:
      volumeClaimTemplates:
        - metadata:
            name: data-volume
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 1Gi
            storageClassName: mongodb-local-storage
        - metadata:
            name: logs-volume
          spec:
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 1Gi
            storageClassName: mongodb-local-storage
      template:
        spec:
          initContainers:
            - name: mongodb-agent-readinessprobe
              securityContext:
                runAsNonRoot: true
                runAsUser: 2000
            - name: mongod-posthook
              securityContext:
                runAsNonRoot: true
                runAsUser: 2000

---
apiVersion: v1
kind: Secret
metadata:
  name: my-user-password
type: Opaque
stringData:
  password: Test@1234!
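
For reference, the pod-level variant of the same securityContext (not what was applied above, and untested here) would look roughly like this; fsGroup: 2000 is an assumption so the mounted volumes become group-accessible to the non-root user:

  statefulSet:
    spec:
      template:
        spec:
          securityContext:          # pod-level instead of per init container (sketch only)
            runAsNonRoot: true
            runAsUser: 2000
            fsGroup: 2000           # assumption: matches the UID/GID the images run with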

agent-health-status.json

kubectl -n mongodb exec -it mongodb-cluster-0 -c mongod -- ls -lrth /healthstatus/agent-health-status.json
-rw------- 1 2000 2000 30 Sep 22 09:01 /healthstatus/agent-health-status.json

kubectl -n mongodb exec -it mongodb-cluster-0 -c mongod -- cat /healthstatus/agent-health-status.json
{"statuses":{},"mmsStatus":{}}

Operator Logs

2021-09-22T08:41:24.169Z    DEBUG   scram/scram.go:165  Credentials have not changed, using credentials stored in: secret/my-scram-scram-credentials
2021-09-22T08:41:24.169Z    DEBUG   controllers/replica_set_controller.go:428   AutomationConfigMembersThisReconciliation   {"mdb.AutomationConfigMembersThisReconciliation()": 1}
2021-09-22T08:41:24.170Z    DEBUG   controllers/replica_set_controller.go:330   Waiting for agents to reach version 1   {"ReplicaSet": "mongodb/mongodb-cluster"}
2021-09-22T08:41:24.171Z    INFO    agent/agent_readiness.go:52 All 1 Agents have reached Goal state    {"ReplicaSet": "mongodb/mongodb-cluster"}
2021-09-22T08:41:24.171Z    DEBUG   controllers/replica_set_controller.go:202   Resetting StatefulSet UpdateStrategy to RollingUpdate   {"ReplicaSet": "mongodb/mongodb-cluster"}
2021-09-22T08:41:24.278Z    INFO    controllers/replica_set_controller.go:247   Successfully finished reconciliation, MongoDB.Spec: {Members:1 Type:ReplicaSet Version:4.4.6 Arbiters:0 FeatureCompatibilityVersion: ReplicaSetHorizons:[] Security:{Authentication:{Modes:[SCRAM] IgnoreUnknownUsers:0xc00078e5bc} TLS:{Enabled:true Optional:false CertificateKeySecret:{Name:mongodb-tls} CaConfigMap:{Name:ca-config-map}} Roles:[]} Users:[{Name:my-user DB:admin PasswordSecretRef:{Name:my-user-password Key:} Roles:[{DB:admin Name:clusterAdmin} {DB:admin Name:userAdminAnyDatabase}] ScramCredentialsSecretName:my-scram}] StatefulSetConfiguration:{SpecWrapper:{Spec:{Replicas:<nil> Selector:nil Template:{ObjectMeta:{Name: GenerateName: Namespace: SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[]} Spec:{Volumes:[] InitContainers:[{Name:mongodb-agent-readinessprobe Image: Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil StartupProbe:nil Lifecycle:nil TerminationMessagePath: TerminationMessagePolicy: ImagePullPolicy: SecurityContext:&SecurityContext{Capabilities:nil,Privileged:nil,SELinuxOptions:nil,RunAsUser:*2000,RunAsNonRoot:*true,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,} Stdin:false StdinOnce:false TTY:false} {Name:mongod-posthook Image: Command:[] Args:[] WorkingDir: Ports:[] EnvFrom:[] Env:[] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil StartupProbe:nil Lifecycle:nil TerminationMessagePath: TerminationMessagePolicy: ImagePullPolicy: SecurityContext:&SecurityContext{Capabilities:nil,Privileged:nil,SELinuxOptions:nil,RunAsUser:*2000,RunAsNonRoot:*true,ReadOnlyRootFilesystem:nil,AllowPrivilegeEscalation:nil,RunAsGroup:nil,ProcMount:nil,WindowsOptions:nil,SeccompProfile:nil,} Stdin:false StdinOnce:false TTY:false}] Containers:[] EphemeralContainers:[] RestartPolicy: TerminationGracePeriodSeconds:<nil> ActiveDeadlineSeconds:<nil> DNSPolicy: NodeSelector:map[] ServiceAccountName: DeprecatedServiceAccount: AutomountServiceAccountToken:<nil> NodeName: HostNetwork:false HostPID:false HostIPC:false ShareProcessNamespace:<nil> SecurityContext:nil ImagePullSecrets:[] Hostname: Subdomain: Affinity:nil SchedulerName: Tolerations:[] HostAliases:[] PriorityClassName: Priority:<nil> DNSConfig:nil ReadinessGates:[] RuntimeClassName:<nil> EnableServiceLinks:<nil> PreemptionPolicy:<nil> Overhead:map[] TopologySpreadConstraints:[] SetHostnameAsFQDN:<nil>}} VolumeClaimTemplates:[{TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:data-volume GenerateName: Namespace: SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[]} Spec:{AccessModes:[ReadWriteOnce] Selector:nil Resources:{Limits:map[] Requests:map[storage:{i:{value:1073741824 scale:0} d:{Dec:<nil>} s:1Gi Format:BinarySI}]} VolumeName: StorageClassName:0xc000726ac0 VolumeMode:<nil> DataSource:nil} Status:{Phase: AccessModes:[] Capacity:map[] Conditions:[]}} {TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name:logs-volume GenerateName: 
Namespace: SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:<nil> DeletionGracePeriodSeconds:<nil> Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[] ClusterName: ManagedFields:[]} Spec:{AccessModes:[ReadWriteOnce] Selector:nil Resources:{Limits:map[] Requests:map[storage:{i:{value:1073741824 scale:0} d:{Dec:<nil>} s:1Gi Format:BinarySI}]} VolumeName: StorageClassName:0xc000726ae0 VolumeMode:<nil> DataSource:nil} Status:{Phase: AccessModes:[] Capacity:map[] Conditions:[]}}] ServiceName: PodManagementPolicy: UpdateStrategy:{Type: RollingUpdate:nil} RevisionHistoryLimit:<nil>}}} AdditionalMongodConfig:{Object:map[storage.wiredTiger.engineConfig.journalCompressor:zlib]}}, MongoDB.Status: {MongoURI:mongodb://mongodb-cluster-0.mongodb-cluster-svc.mongodb.svc.cluster.local:27017 Phase:Running CurrentStatefulSetReplicas:1 CurrentMongoDBMembers:1 Message:}   {"ReplicaSet": "mongodb/mongodb-cluster"}

mongod pod logs

kubectl -n mongodb logs -f mongodb-cluster-0 -c mongod
2021-09-22T08:41:10.342Z    INFO    versionhook/main.go:34  Running version change post-start hook
2021-09-22T08:41:10.350Z    INFO    versionhook/main.go:41  Waiting for agent health status...
2021-09-22T08:41:11.351Z    ERROR   versionhook/main.go:49  Error getting the agent health file: could not open file: open /healthstatus/agent-health-status.json: no such file or directory
main.main
    /go/cmd/versionhook/main.go:49
runtime.main
    /usr/local/go/src/runtime/proc.go:225
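
One quick sanity check for this error is whether the shared healthstatus volume is mounted into both the mongod and mongodb-agent containers (the mount paths can differ between the two):

kubectl -n mongodb get pod mongodb-cluster-0 -o yaml | grep -B 2 -A 4 healthstatus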

mongodb-agent pod logs

kubectl -n mongodb logs -f mongodb-cluster-0 -c mongodb-agent
[2021-09-22T08:41:12.133+0000] [.info] [src/util/sysdep_unix.go:LockAutomationLockFile:321] [08:41:12.130] Locking automation lock file at /tmp/mongodb-mms-automation.lock
[2021-09-22T08:41:12.134+0000] [.info] [main/components/agent.go:NewAgent:123] [08:41:12.134] Constructing new agent object with desiredClusterConfigPath=/var/lib/automation/config/cluster-config.json
[2021-09-22T08:41:12.135+0000] [.info] [src/mongosqld/custodian.go:NewCustodian:138] <mongosqld custodian> [08:41:12.135] Started
[2021-09-22T08:41:12.136+0000] [.info] [src/kmipproxy/custodian.go:mainLoop:205] <kmipProxyMaintainer> [08:41:12.136] Starting main loop
[2021-09-22T08:41:12.137+0000] [.info] [realtime/rtcollector/rtcollector.go:controlLoop:87] <rtCollector> [08:41:12.136] Starting control loop
[2021-09-22T08:41:12.137+0000] [.info] [src/dataexplorer/dataexplorer.go:controlLoop:108] <dataExplorer> [08:41:12.136] Starting control loop
[2021-09-22T08:41:12.160+0000] [.info] [main/components/agent.go:LoadClusterConfig:259] [08:41:12.160] New cluster config received! 0 (<nil>) -> 1 (2021-09-22 08:40:59.313248393 +0000 UTC)
[2021-09-22T08:41:12.161+0000] [.info] [src/modules/agents_unix.go:KillAllAgents:42] [08:41:12.161] Killing all running mongodb-mms-monitoring-agent agents at /var/lib/mongodb-mms-automation/mongodb-mms-monitoring-agent-.+\..+_.+/mongodb-mms-monitoring-agent *$
[2021-09-22T08:41:12.198+0000] [.info] [src/modules/agents_unix.go:KillAllAgents:42] [08:41:12.197] Killing all running mongodb-mms-backup-agent agents at /var/lib/mongodb-mms-automation/mongodb-mms-backup-agent-.+\..+_.+/mongodb-mms-backup-agent *$
[2021-09-22T08:41:12.656+0000] [.info] [src/dataexplorer/dataexplorer.go:controlLoop:114] <dataExplorer> [08:41:12.656] Received new cluster config!
[2021-09-22T08:41:12.656+0000] [.info] [realtime/rtcollector/rtcollector.go:controlLoop:95] <rtCollector> [08:41:12.656] Received new cluster config!
[2021-09-22T08:41:12.666+0000] [.info] [src/maintainers/externalmaintainer.go:mainLoop:560] [08:41:12.666] Got a new descriptor for atlasProxy
[2021-09-22T08:41:12.666+0000] [.info] [src/maintainers/externalmaintainer.go:mainLoop:560] [08:41:12.666] Got a new descriptor for mongot
[2021-09-22T08:41:13.668+0000] [.info] [main/components/agent.go:Iterate:848] [08:41:13.668] All 0 Mongo processes are in goal state.
[2021-09-22T08:41:13.668+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:41:13.668] clusterConfig unchanged
[2021-09-22T08:41:23.669+0000] [.info] [main/components/agent.go:Iterate:848] [08:41:23.669] All 0 Mongo processes are in goal state.
[2021-09-22T08:41:23.670+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:41:23.670] clusterConfig unchanged
[2021-09-22T08:41:33.671+0000] [.info] [main/components/agent.go:Iterate:848] [08:41:33.671] All 0 Mongo processes are in goal state.
[2021-09-22T08:41:33.672+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:41:33.672] clusterConfig unchanged
[2021-09-22T08:41:43.673+0000] [.info] [main/components/agent.go:Iterate:848] [08:41:43.673] All 0 Mongo processes are in goal state.
[2021-09-22T08:41:43.674+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:41:43.674] clusterConfig unchanged
[2021-09-22T08:41:53.675+0000] [.info] [main/components/agent.go:Iterate:848] [08:41:53.675] All 0 Mongo processes are in goal state.
[2021-09-22T08:41:53.677+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:41:53.677] clusterConfig unchanged
[2021-09-22T08:42:03.678+0000] [.info] [main/components/agent.go:Iterate:848] [08:42:03.678] All 0 Mongo processes are in goal state.
[2021-09-22T08:42:03.679+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:42:03.679] clusterConfig unchanged
[2021-09-22T08:42:12.138+0000] [.info] [src/runtimestats/printer.go:mainLoop:58] <runtimestatsPrinter> [08:42:12.138] memory:rss=22466560
alloc=3334512   totalalloc=6361504  mallocs=41304   frees=20132 heapinuse=5136384   heapobjects=21172   gcpausetotalns=586760   gcpauselastns=94994 gcnum=3 utimens=156335000   stimens=184452000   minflt=9079 majflt=41
[2021-09-22T08:42:13.685+0000] [.info] [main/components/agent.go:Iterate:848] [08:42:13.685] All 0 Mongo processes are in goal state.
[2021-09-22T08:42:13.686+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:42:13.686] clusterConfig unchanged
[2021-09-22T08:42:23.687+0000] [.info] [main/components/agent.go:Iterate:848] [08:42:23.687] All 0 Mongo processes are in goal state.
[2021-09-22T08:42:23.687+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:42:23.687] clusterConfig unchanged
[2021-09-22T08:42:33.689+0000] [.info] [main/components/agent.go:Iterate:848] [08:42:33.689] All 0 Mongo processes are in goal state.
[2021-09-22T08:42:33.689+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:42:33.689] clusterConfig unchanged
[2021-09-22T08:42:43.690+0000] [.info] [main/components/agent.go:Iterate:848] [08:42:43.690] All 0 Mongo processes are in goal state.
[2021-09-22T08:42:43.691+0000] [.info] [main/components/agent.go:LoadClusterConfig:262] [08:42:43.691] clusterConfig unchanged
ghost commented 3 years ago

Can we set the agent log level to verbose to find a possible reason these files are not being created?
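
If the operator version in use exposes it, the agent log level can be raised via the CRD; this is a hypothetical sketch, so verify the field exists first with kubectl explain mdbc.spec.agent:

# hypothetical: only valid if this operator/CRD version has spec.agent.logLevel
spec:
  agent:
    logLevel: DEBUG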

github-actions[bot] commented 2 years ago

This issue is being marked stale because it has been open for 60 days with no activity. Please comment if this issue is still affecting you. If there is no change, this issue will be closed in 30 days.

github-actions[bot] commented 2 years ago

This issue was closed because it became stale and did not receive further updates. If the issue is still affecting you, please re-open it, or file a fresh Issue with updated information.

dongluyang commented 2 years ago

I'm hitting the same issue; please reopen it.

paniKatarinka commented 1 year ago

Hi! I have the same issue: after applying a securityContext to the containers, I get this readiness probe failure:

Readiness probe failed: panic: open /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json: no such file or directory
goroutine 1 [running]:
main.main()
    /workspace/cmd/readiness/main.go:217 +0x19a

Additionally, the agent complains:

cat: /mongodb-automation/agent-api-key/agentApiKey: No such file or directory

Did anybody find a solution?

balait4 commented 1 year ago

Hi, I'm also hitting the same issue. Any workaround or solution so far? Thanks

parisnakitakejser commented 1 year ago

I hit the same error on macOS with Docker Desktop's Kubernetes: it had been working, then randomly stopped working with this issue!

What I discovered is that running MongoDB in the default namespace gives the error, while creating a new namespace works; I really don't know what is happening here.

If I remove the resource along with its PVC and PV storage and try again in the test namespace, it works; doing the same in my default namespace still hits the error, so I think the default namespace can't be used.
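
Roughly what that retry looks like as commands (resource and namespace names are guesses; note that deleting PVCs is destructive):

# remove the resource and its storage from the default namespace, then retry elsewhere
kubectl delete mdbc mongodb-cluster -n default
kubectl delete pvc --all -n default        # removes every claim in the namespace: use with care
kubectl create namespace mongodb-test
kubectl apply -f mongodb.MongoDBCommunity.yml -n mongodb-test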