percona / percona-server-mysql-operator

Percona Operator for MySQL
https://www.percona.com/doc/kubernetes-operator-for-mysql/ps/index.html
Apache License 2.0

K8SPS-359: Fix rebooting cluster if it's partially online #725

Closed egegunes closed 1 month ago

egegunes commented 1 month ago


CHANGE DESCRIPTION

Problem: If 2 out of 3 MySQL pods are terminated abruptly, the operator sporadically fails to reboot the cluster.

Cause: Although the failed pods go into crash recovery mode once they come back up, MySQL reports that the cluster is already ONLINE when the operator tries to run dba.rebootClusterFromCompleteOutage().

Solution: If the operator can't run dba.rebootClusterFromCompleteOutage() because the cluster is already online, it deletes all MySQL pods to force recovery.

CHECKLIST

Jira

Tests

Config/Logging/Testability

JNKPercona commented 1 month ago
| Test name | Status |
| --- | --- |
| version-service | passed |
| async-ignore-annotations | passed |
| auto-config | passed |
| config | passed |
| config-router | passed |
| demand-backup | passed |
| gr-demand-backup | passed |
| gr-demand-backup-haproxy | passed |
| gr-finalizer | passed |
| gr-haproxy | passed |
| gr-ignore-annotations | passed |
| gr-init-deploy | passed |
| gr-one-pod | passed |
| gr-recreate | passed |
| gr-scaling | passed |
| gr-scheduled-backup | passed |
| gr-security-context | passed |
| gr-self-healing | passed |
| gr-tls-cert-manager | passed |
| gr-users | passed |
| haproxy | passed |
| init-deploy | passed |
| limits | passed |
| monitoring | passed |
| one-pod | passed |
| operator-self-healing | passed |
| recreate | passed |
| scaling | passed |
| scheduled-backup | passed |
| service-per-pod | passed |
| sidecars | passed |
| smart-update | passed |
| tls-cert-manager | passed |
| users | passed |
We ran 34 out of 34 tests.

commit: https://github.com/percona/percona-server-mysql-operator/pull/725/commits/75bb02dd587209fdce8554203dd9e23ac91360ea
image: perconalab/percona-server-mysql-operator:PR-725-75bb02dd