Closed: marandalucas closed this issue 1 year ago.
I am seeing the same issue. Upgrading to Operator 0.8.0 made no difference.
The second Pod in the StatefulSet is unable to start up because the first Pod is not ready. The first Pod is not ready because the step WaitAllRsMembersUp has a result of wait, as you can see in /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json.
This obviously can never work: the first Pod is not ready until the second Pod is ready, and the second Pod can't become ready because it's waiting on the first Pod...
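To see which wait steps are currently blocking readiness, the plan in that file can be filtered directly. A rough sketch, assuming jq is installed on the machine running kubectl and that the agent container is named mongodb-agent (the community operator's default):
kubectl exec mdb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json | jq '.. | objects | select(.isWaitStep == true and .result == "wait") | {step, started}'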
I do occasionally get the second Pod to start, by pure chance and some patience: the first Pod sometimes becomes ready just long enough for the second Pod to start up, and both Pods can then stay ready for a while.
k get po
NAME READY STATUS RESTARTS AGE
mdb-0 2/2 Running 0 5m5s
mdb-1 2/2 Running 0 4m47s
mongodb-kubernetes-operator-8f9756c67-47q6z 1/1 Running 0 2d18h
This doesn't last though, as WaitAllRsMembersUp goes back to the wait result, causing the readinessprobe to return 1.
I have no name!@mdb-0:/$ grep -o -E 'WaitAllRsMembersUp.+wait"}]' /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json
WaitAllRsMembersUp","moveDoc":"Wait until all members of this process' repl set are up","steps":[{"step":"WaitAllRsMembersUp","stepDoc":"Wait until all members of this process' repl set are up","isWaitStep":true,"started":"2023-05-29T08:41:04.756579981Z","completed":null,"result":"wait"}]
I have no name!@mdb-0:/$ /opt/scripts/readinessprobe; echo $?
1
Then after a few minutes, both Pods are no longer ready; in this case it took around 8 minutes total.
k get po
NAME READY STATUS RESTARTS AGE
mdb-0 1/2 Running 0 9m8s
mdb-1 1/2 Running 0 8m50s
mongodb-kubernetes-operator-8f9756c67-47q6z 1/1 Running 0 2d18h
What's also confusing to me is that the Agent logs seem to indicate that it's happy with the state:
[2023-05-29T08:55:10.192+0000] [.info] [main/components/agent.go:Iterate:892] [08:55:10.192] All 1 Mongo processes are in goal state.
Even though the readinessprobe is not happy. At the same time, the Operator also isn't happy with the goal state:
2023-05-29T08:55:21.328Z DEBUG agent/agent_readiness.go:65 The Agent in the Pod 'mdb-0' hasn't reached the goal state yet (goal: 23771, agent: 23768) {"ReplicaSet": "sorting/mdb"}
Looking at the timestamps, these conflicting messages are almost always logged at the same time.
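One way to see the two versions being compared (goal: 23771 vs agent: 23768 above) is to read the operator's generated automation config and the agent's health file side by side. A rough sketch, assuming the usual community-operator conventions of a <name>-config secret with a cluster-config.json key, and the mmsStatus / lastGoalVersionAchieved field names in the health file (verify these against your own file):
kubectl get secret mdb-config -o jsonpath='{.data.cluster-config\.json}' | base64 -d | jq '.version'
kubectl exec mdb-0 -c mongodb-agent -- cat /var/log/mongodb-mms-automation/healthstatus/agent-health-status.json | jq '.mmsStatus | to_entries[] | {process: .key, achieved: .value.lastGoalVersionAchieved}'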
This issue is being marked stale because it has been open for 60 days with no activity. Please comment if this issue is still affecting you. If there is no change, this issue will be closed in 30 days.
This issue should be fixed by https://github.com/mongodb/mongodb-kubernetes-operator/pull/1332. Please upgrade your mongodb-kubernetes-readinessprobe to 1.0.15.
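If you installed the operator from the plain manifests, the probe version is typically pinned by an environment variable on the operator Deployment. A rough sketch, assuming the READINESS_PROBE_IMAGE variable name and the quay.io image path used in the operator's sample manifests (Helm users would set the equivalent chart value instead):
kubectl set env deployment/mongodb-kubernetes-operator READINESS_PROBE_IMAGE=quay.io/mongodb/mongodb-kubernetes-readinessprobe:1.0.15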
This problem reappears when I have more than 1 replica and the pod labels are changed:
statefulSet:
  spec:
    selector:
      matchLabels:
        app: myapp
    template:
      metadata:
        labels:
          app: myapp
The labels were changed as suggested here: https://github.com/mongodb/mongodb-kubernetes-operator/blob/master/config/samples/mongodb.com_v1_mongodbcommunity_cr_podantiaffinity.yaml
The StatefulSet and Pod both become ready, but the agent gets stuck in the "hasn't reached the goal state yet" state forever.
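For context, that statefulSet block is an override that lives under spec.statefulSet of the MongoDBCommunity resource. A minimal sketch with placeholder name and version values (not the exact resource from this thread):
apiVersion: mongodbcommunity.mongodb.com/v1
kind: MongoDBCommunity
metadata:
  name: mdb
spec:
  members: 3
  type: ReplicaSet
  version: "6.0.5"
  statefulSet:
    spec:
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp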
What did you do to encounter the bug?
We've got a MongoDB ReplicaSet that has issues starting up. (It already contains important data.)
I think the root cause of the issue is that the "mongodb-agent" container is running a readinessProbe which is failing:
What did you expect?
A clean ReplicaSet start-up.
Operator Information
Kubernetes Cluster Information
v1.23.16-gke.1100
WORKAROUND APPLIED
We had to delete the following readinessProbe from the mongo StatefulSet:
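For anyone reproducing that workaround, a rough sketch using a JSON patch; the container index is an assumption (check where mongodb-agent sits in your StatefulSet first), and the operator may re-add the probe on its next reconcile:
kubectl get statefulset mdb -o jsonpath='{.spec.template.spec.containers[*].name}'
kubectl patch statefulset mdb --type=json -p='[{"op": "remove", "path": "/spec/template/spec/containers/1/readinessProbe"}]'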
Could anyone help me? I'm not an expert in MongoDB and I'm still trying to figure out what's going on...