Open chernomor opened 1 week ago
As I see it, bootstrapAsyncReplication expects all cluster peers from getTopology, but getTopology cannot connect to some peers because all nodes are now in the CrashLoopBackOff state. I think it should not require all pods to be available.
Another problem (or is it actually the first one?), which was masked by the sleep-forever workaround: the primary pod now cannot start because it cannot resolve the primary name cluster1-mysql-0.cluster1-mysql.mysql-test
retrieved from the replica status. That name does not resolve at the moment, because while the pods are starting they are only reachable under names like cluster1-mysql-unready.mysql-test (I could be wrong about this). I don't know how this could be fixed.
2024/06/25 16:51:32 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/25 16:51:32 Primary: cluster1-mysql-0.cluster1-mysql.mysql-test Replicas: [cluster1-mysql-1.cluster1-mysql.mysql-test cluster1-mysql-2.cluster1-mysql.mysql-test]
2024/06/25 16:51:32 FQDN: cluster1-mysql-0.cluster1-mysql.mysql-test
2024/06/25 16:51:32 lookup cluster1-mysql-0 [10.42.0.56]
2024/06/25 16:51:32 PodIP: 10.42.0.56
2024/06/25 16:51:32 bootstrap finished in 0.021992 seconds
2024/06/25 16:51:32 bootstrap failed: get primary IP: lookup cluster1-mysql-0.cluster1-mysql.mysql-test: lookup cluster1-mysql-0.cluster1-mysql.mysql-test on 10.43.0.10:53: server misbehaving
2024/06/25 16:51:42 Peers: [10-42-0-49.cluster1-mysql-unready.mysql-test 10-42-0-55.cluster1-mysql-unready.mysql-test 10-42-0-56.cluster1-mysql-unready.mysql-test]
2024/06/25 16:51:42 Primary: cluster1-mysql-0.cluster1-mysql.mysql-test Replicas: [cluster1-mysql-1.cluster1-mysql.mysql-test cluster1-mysql-2.cluster1-mysql.mysql-test]
2024/06/25 16:51:42 FQDN: cluster1-mysql-0.cluster1-mysql.mysql-test
2024/06/25 16:51:42 lookup cluster1-mysql-0 [10.42.0.56]
2024/06/25 16:51:42 PodIP: 10.42.0.56
2024/06/25 16:51:42 bootstrap finished in 0.021340 seconds
2024/06/25 16:51:42 bootstrap failed: get primary IP: lookup cluster1-mysql-0.cluster1-mysql.mysql-test: lookup cluster1-mysql-0.cluster1-mysql.mysql-test on 10.43.0.10:53: server misbehaving
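As far as I understand, the lookups above fail because the cluster1-mysql headless Service only publishes endpoints for pods that pass their readiness probe, while during startup only cluster1-mysql-unready names resolve. One possible direction would be to publish addresses of not-ready pods as well; this is just a sketch of the standard Kubernetes knob, I have not verified how the operator generates this Service, and the selector labels here are illustrative:

```yaml
# Hypothetical headless Service sketch: with publishNotReadyAddresses,
# per-pod DNS records exist even while pods are still starting, so
# cluster1-mysql-0.cluster1-mysql.mysql-test would resolve during bootstrap.
apiVersion: v1
kind: Service
metadata:
  name: cluster1-mysql
  namespace: mysql-test
spec:
  clusterIP: None                 # headless: one DNS record per pod
  publishNotReadyAddresses: true  # include unready pods in DNS
  selector:
    app.kubernetes.io/instance: cluster1  # illustrative label, not verified
  ports:
    - name: mysql
      port: 3306
```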
Some changes I made in deploy/cr.yaml:
--- a/deploy/cr.yaml
+++ b/deploy/cr.yaml
@@ -31,7 +31,7 @@ spec:
# group: cert-manager.io
mysql:
- clusterType: group-replication
+ clusterType: async
autoRecovery: true
image: percona/percona-server:8.0.36-28
imagePullPolicy: Always
@@ -58,9 +58,12 @@ spec:
# periodSeconds: 10
# failureThreshold: 3
# successThreshold: 1
+#
+ startupProbe:
+ failureThreshold: 5
affinity:
- antiAffinityTopologyKey: "kubernetes.io/hostname"
+ antiAffinityTopologyKey: "none"
# advanced:
Report
I set up a cluster according to https://docs.percona.com/percona-operator-for-mysql/ps/kubectl.html on a single-node k3s. All MySQL pods were working fine; then I rebooted the k3s node, and the MySQL cluster could not start.
More about the problem
I touched the file
/var/lib/mysql/sleep-forever
in the primary pod cluster1-mysql-0, and that pod is running now, but the replicas are in CrashLoopBackOff (the bootstrap logs above are from a replica pod).
Steps to reproduce
Versions
Kubernetes k3s version v1.29.5+k3s1 (4e53a323) go version go1.21.9
Operator 83b9f60ec88d0cd2b5b1a2c2721bd6ae18fc7dc8, v0.7.0
Database mysql Ver 8.0.36-28 for Linux on x86_64 (Percona Server (GPL), Release 28, Revision 47601f19)
Anything else?
No response