signal18 / replication-manager

Signal 18 repman - Replication Manager for MySQL / MariaDB / Percona Server
https://signal18.io/products/srm
GNU General Public License v3.0

Unrecoverable label status and data anomalies #788

Closed dibrother closed 1 month ago

dibrother commented 1 month ago

Q1:

Suspect.log

(two screenshots attached, not preserved)

Q2:

Environment: proxysql (2.6.3) + mysql (8.0.37) + replication-manager (2.3.40). When executing a switchover, the old master ends up with some additional transactions, because it is not set to read_only as the first step of the switchover. If the old master is manually set to read_only and super_read_only in the pre-failover script before switching, the anomaly no longer occurs.
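For readers who want to reproduce the workaround, here is a minimal sketch of such a pre-failover script. It is not the script used in this report. It assumes the old master's host and port arrive as two command-line arguments (a hypothetical calling convention; check how replication-manager actually invokes its scripts), that the `mysql` command-line client is installed, and that credentials come from an option file such as `~/.my.cnf`.

```python
import subprocess

# Statements that stop client writes on the old master. In MySQL, enabling
# super_read_only also blocks accounts that are exempt from read_only.
FREEZE_SQL = "SET GLOBAL read_only = ON; SET GLOBAL super_read_only = ON;"


def build_freeze_command(host, port):
    """Build the mysql client invocation that freezes writes on host:port."""
    return [
        "mysql",
        f"--host={host}",
        f"--port={int(port)}",
        f"--execute={FREEZE_SQL}",
    ]


def main(argv):
    """Entry point; argv is assumed to be [script, old_master_host, old_master_port]."""
    if len(argv) != 3:
        print("usage: pre_failover.py OLD_MASTER_HOST OLD_MASTER_PORT")
        return 2
    # Run the client and hand its exit code back to the caller.
    completed = subprocess.run(build_freeze_command(argv[1], argv[2]), check=False)
    return completed.returncode
```

To use it as a standalone script, add `import sys` and call `sys.exit(main(sys.argv))` at the bottom. A non-zero exit code then signals that the old master could not be frozen.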

During the switchover, the old master still accepts writes.

time="2024-08-05 13:40:37" level=info msg=-------------------------- cluster=cluster1 module=general type=log
time="2024-08-05 13:40:37" level=info msg="Starting master switchover" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:37" level=info msg=-------------------------- cluster=cluster1 module=general type=log
time="2024-08-05 13:40:37" level=info msg="Checking long running updates on master 10" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Flushing tables on master 10.10.2.11:3306" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Electing a new master" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Election rig: 10.10.2.10:3306 elected as preferred master" cluster=cluster1 module=election type=log
time="2024-08-05 13:40:38" level=info msg="Slave 10.10.2.10:3306 has been elected as a new master" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Server 10.10.2.10:3306 state transition from Slave changed to: Master" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Calling pre-failover script" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Pre-failover script complete:do set read_only_type!" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Freezing writes stopping all slaves on 10.10.2.11:3306" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Freezing writes set read only on 10.10.2.11:3306" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Freezing writes saving max_connections on 10.10.2.11:3306 " cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Freezing writes decreasing max_connections to 1 on 10.10.2.11:3306 " cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Freezing writes killing all other remaining threads on  10.10.2.11:3306" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Freezing writes rejecting writes via FTWRL on 10.10.2.11:3306 " cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Waiting for candidate master 10.10.2.10:3306 to apply relay log" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Reading all relay logs on 10.10.2.10:3306" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Waiting sync IO_Pos:{mysql-bin.000019 %!s(bool=true)}/237, Slave_Pos:{mysql-bin.000019 %!s(bool=true)} 237" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Save replication status and crash infos before opening traffic" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="MySQL GTID saving crash info for replication ExexecutedGtidSet fd2ad9de-4a56-11ef-bd86-000c29532d30:1-21464443,\nfeb5e967-4a56-11ef-bde0-000c293d1396:1-13035" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Stopping slave threads on new master" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Resetting slave on new master and set read/write mode on" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Failover proxies" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:38" level=info msg="Waiting 2s for unmanaged proxy to monitor route change" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:40" level=info msg="Inject fake transaction on new master 10.10.2.10:3306 " cluster=cluster1 module=general type=log
time="2024-08-05 13:40:40" level=info msg="Killing new connections on old master showing before update route" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:40" level=info msg="Switching old leader to slave" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:40" level=info msg="Doing MySQL GTID switch of the old master" cluster=cluster1 module=general type=log
time="2024-08-05 13:40:40" level=info msg="Server 10.10.2.11:3306 state transition from Master changed to: Slave" cluster=cluster1 module=general type=log
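One way to confirm the extra transactions is to compare the executed GTID set of the old master with the set saved in the log above. The sketch below does that comparison offline. It assumes the classic untagged `uuid:start-end` GTID format, and the old master's value in the example is hypothetical, not taken from this report. On a live server, MySQL's `GTID_SUBTRACT()` function gives the same answer in SQL.

```python
def parse_gtid_set(text):
    """Parse a MySQL GTID set string into {uuid: [(start, end), ...]}."""
    result = {}
    # Entries are comma separated; logs may show the line break after each
    # comma either as a real newline or as a literal backslash-n.
    cleaned = text.replace("\\n", "").replace("\n", "")
    for entry in cleaned.split(","):
        entry = entry.strip()
        if not entry:
            continue
        uuid, *intervals = entry.split(":")
        ranges = result.setdefault(uuid.lower(), [])
        for interval in intervals:
            start, _, end = interval.partition("-")
            ranges.append((int(start), int(end or start)))
    return result


def missing_from(a, b):
    """Return the transactions present in set a but absent from set b."""
    extra = {}
    for uuid, ranges in a.items():
        covered = sorted(b.get(uuid, []))
        left = []
        for start, end in sorted(ranges):
            cursor = start  # first sequence number not yet accounted for
            for c_start, c_end in covered:
                if c_end < cursor:
                    continue
                if c_start > end:
                    break
                if c_start > cursor:
                    left.append((cursor, c_start - 1))
                cursor = max(cursor, c_end + 1)
            if cursor <= end:
                left.append((cursor, end))
        if left:
            extra[uuid] = left
    return extra


# GTID set saved by replication-manager in the log above.
saved = parse_gtid_set(
    "fd2ad9de-4a56-11ef-bd86-000c29532d30:1-21464443,\\n"
    "feb5e967-4a56-11ef-bde0-000c293d1396:1-13035"
)
# Hypothetical value read from the old master after the switchover.
old_master = parse_gtid_set(
    "fd2ad9de-4a56-11ef-bd86-000c29532d30:1-21464443,"
    "feb5e967-4a56-11ef-bde0-000c293d1396:1-13040"
)
extra = missing_from(old_master, saved)
print(extra)
```

An empty result means the old master holds nothing beyond the saved set. Any remaining intervals are transactions that exist only on the old master.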

caffeinated92 commented 1 month ago

Hi @dibrother,

Noted. I will get back to you once I have some leads on the extra writes on the old master.

caffeinated92 commented 1 month ago

As for the label status, I think it's fixed in 2.3.41.

dibrother commented 1 month ago

"For label status, I think it's fixed in 2.3.41"

Okay, I'll test the latest version.

caffeinated92 commented 1 month ago

We will close this issue. Please reopen it if you still have the same problem with the latest version.