Open DiegoDAF opened 3 years ago
After checking, I have two nodes down with the same problem:
```
$ kubectl get pods -l application=spilo -L spilo-role -n mdbs -o wide
NAME                   READY   STATUS    RESTARTS   AGE   IP               NODE      NOMINATED NODE   READINESS GATES   SPILO-ROLE
poc-db-test-22-1tb-0   1/1     Running   0          30m   10.237.192.216   r13-u15   <none>           <none>
poc-db-test-22-1tb-1   1/1     Running   0          56m   10.237.197.90    r11-u27   <none>           <none>            master
poc-db-test-22-1tb-2   1/1     Running   0          56m   10.237.197.91    r11-u26   <none>           <none>            replica
poc-db-test-22-1tb-3   1/1     Running   0          82s   10.237.197.82    r12-u24   <none>           <none>
```
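The listing above shows two pods with an empty `SPILO-ROLE` label. A hedged way to see what Patroni itself thinks of those members (assuming a standard Spilo image, where `patronictl` is available inside the pod) is:

```shell
# Exec into any healthy Spilo pod (pod names taken from the listing above)
kubectl exec -it poc-db-test-22-1tb-1 -n mdbs -- patronictl list
```

`patronictl list` prints each member's role, state, and replication lag; members shown as "start failed" or missing entirely are the ones that need attention.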
At this moment, I don't know how to recover these dead nodes.
Running an HA system is like flying a modern airplane: it is mostly on autopilot, but you have to know how to fly manually if something goes wrong. Specifically, you can reinitialise a node by running `patronictl reinit`. But first you have to understand how it got into that state, which would require analysing the logs from before the issue started. All logs, including Postgres.
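As a minimal sketch of that advice, assuming the Patroni cluster (scope) name matches the statefulset name `poc-db-test-22-1tb` (an assumption; check with `patronictl list` first):

```shell
# 1. Collect the logs from before the failure, on the broken member
kubectl logs poc-db-test-22-1tb-0 -n mdbs --tail=500

# 2. Reinitialise the broken member from inside a healthy pod.
#    patronictl reinit takes the cluster name and the member name.
kubectl exec -it poc-db-test-22-1tb-1 -n mdbs -- \
  patronictl reinit poc-db-test-22-1tb poc-db-test-22-1tb-0
```

`reinit` wipes the member's data directory and re-clones it from the current primary, so understand the root cause from the logs before running it.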
Many thanks!!! Totally agree!!! I owe you a beer !!
Hi all, I am making a POC. In my test I killed the primary node; another node took over the primary role, and the new replica did a rewind... but it died with these messages:
The deploy YAML:
Can anyone help me trying to understand what happened?