Can't recover from disk full error

Report

The disk (PVC) filled up; I enlarged the PVCs and let MySQL restart.

Group replication never came back on its own. I set instance 0 to "bootstrap", which got replication working.

The two other instances never finished "recovering", though, and are now just crash looping. The logs from one of them are attached.

More about the problem

mysql-1.txt

The controller log doesn't contain any information that is useful to me; it seems to think everything is more or less fine. I'm not sure what role the controller plays here, though (I'm migrating from the Bitpoke operator, which worked a little differently, with the orchestrator exposed).

2024-10-23T16:21:26.676Z    INFO    Crash recovery  Pod is waiting for recovery {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "pod": "ntpdb-mysql-0", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-10"}
2024-10-23T16:22:27.762Z    INFO    Crash recovery  Pod is waiting for recovery {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "pod": "ntpdb-mysql-1", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766194,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-1060a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"}
2024-10-23T16:23:40.357Z    INFO    Crash recovery  Cluster was successfully rebooted   {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65"}
2024-10-23T16:23:47.288Z    INFO    groupReplicationStatus.ntpdb-mysql-1.ntpdb-mysql.ntpdb  Member is not ONLINE    {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "d8d400bb-a9ba-4351-a213-4dcd61755a65", "state": "RECOVERING"}
2024-10-23T16:30:19.004Z    INFO    Crash recovery  Pod is waiting for recovery {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "pod": "ntpdb-mysql-0", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766305,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-13"}
2024-10-23T16:31:20.054Z    INFO    Crash recovery  Pod is waiting for recovery {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "pod": "ntpdb-mysql-1", "gtidExecuted": "60a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-17766305,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-1360a1cc96-859c-11ef-99ea-fe24b27f638b:1-4,6c35e34c-859c-11ef-9ccb-fe24b27f638b:1-16262363,6c35e5d8-859c-11ef-9ccb-fe24b27f638b:1-5"}
2024-10-23T16:31:55.660Z    INFO    Crash recovery  Cluster was successfully rebooted   {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940"}
2024-10-23T16:32:02.594Z    INFO    groupReplicationStatus.ntpdb-mysql-1.ntpdb-mysql.ntpdb  Member is not ONLINE    {"controller": "ps-controller", "controllerGroup": "ps.percona.com", "controllerKind": "PerconaServerMySQL", "PerconaServerMySQL": {"name":"ntpdb","namespace":"ntpdb"}, "namespace": "ntpdb", "name": "ntpdb", "reconcileID": "b7ace1d4-d0eb-47cd-83d3-e9e8f7a81940", "state": "OFFLINE"}
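
In case it helps anyone reading the controller output above, here is a minimal Python sketch that pulls the timestamp, message, pod and state out of each line. It assumes the layout seen in this paste (whitespace-separated header fields followed by one JSON object); `parse_controller_line` is a hypothetical helper name, not part of the operator, and the sample line is abbreviated from the 16:23:47 entry above.

```python
import json
import re


def parse_controller_line(line):
    """Split one controller log line into header fields plus selected JSON keys.

    Assumes: timestamp, level, logger and message are separated by tabs or by
    runs of two or more spaces, and the structured payload starts at the first "{".
    """
    brace = line.index("{")
    header = re.split(r"\s{2,}|\t+", line[:brace].strip())
    timestamp, level, logger, message = header[:4]
    payload = json.loads(line[brace:])
    return {
        "timestamp": timestamp,
        "level": level,
        "logger": logger,
        "message": message,
        "pod": payload.get("pod"),      # only present on "Crash recovery" lines
        "state": payload.get("state"),  # only present on groupReplicationStatus lines
    }


# Abbreviated copy of the 16:23:47 line above (some JSON keys dropped for brevity).
sample = (
    "2024-10-23T16:23:47.288Z    INFO    "
    "groupReplicationStatus.ntpdb-mysql-1.ntpdb-mysql.ntpdb  Member is not ONLINE    "
    '{"controller": "ps-controller", "namespace": "ntpdb", "name": "ntpdb", '
    '"state": "RECOVERING"}'
)

parsed = parse_controller_line(sample)
print(parsed["timestamp"], parsed["message"], parsed["state"])
```

Filtering the parsed lines on `state` makes the RECOVERING to OFFLINE transition for ntpdb-mysql-1 easier to spot than in the raw output.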

Steps to reproduce

1. Let the disk run full, for example by using the default configuration, which doesn't limit how many binlog files are kept.
2. Watch the cluster go down.
3. Watch the cluster fail to recover after disk space has been added.
4. Force mysql-0 to start group replication.
5. Watch the replicas never finish recovering.

Versions

Kubernetes - v1.28.12
Operator - 0.8.0
Database - the default 8.x version from 0.8.0

Anything else?

No response

percona / percona-server-mysql-operator