openark / orchestrator

MySQL replication topology management and HA
Apache License 2.0
5.64k stars 935 forks

Can orchestrator + semi-sync guarantee zero data loss? #1312

Closed Fanduzi closed 2 years ago

Fanduzi commented 3 years ago

Let's say I have one master and two slaves, semi-sync is on, and rpl_semi_sync_master_wait_for_slave_count = 1. M is the master; S1 and S2 are the slaves. At some point:

Here I provide a test method for this scenario:

  1. Start a script that INSERTs into table t1.
  2. On S1, run LOCK TABLE t1 READ. The SQL thread will then block and S1's Executed_Gtid_Set will stop advancing, while S2's SQL thread is still running and applying relay logs, so S2's Executed_Gtid_Set becomes a superset of S1's.
  3. On S2, run these commands (I don't know exactly how they work internally, but the purpose is to simulate a network failure):
    
    set slave_net_timeout = 3600 -- just to make it easier to run the test

    tc qdisc del dev ens33 root                    # clear any existing qdisc
    tc qdisc add dev ens33 root handle 1: prio     # install a priority qdisc as the root
    tc filter add dev ens33 protocol ip parent 1: prio 1 u32 match ip dst 172.16.120.10 flowid 1:1   # traffic to the master goes to band 1:1
    tc filter add dev ens33 protocol all parent 1: prio 2 u32 match ip dst 0.0.0.0/0 flowid 1:2      # everything else goes to band 1:2
    tc filter add dev ens33 protocol all parent 1: prio 2 u32 match ip protocol 1 0xff flowid 1:2    # ICMP also goes to band 1:2
    tc qdisc add dev ens33 parent 1:1 handle 10: netem delay 180000ms   # delay master-bound traffic by 180s
    tc qdisc add dev ens33 parent 1:2 handle 20: sfq                    # leave other traffic alone


After running these commands, S2 no longer receives the master's binlog events, but S2's Slave_IO_Running still shows 'Yes'.
4. Shut down the master, run `tc qdisc del dev ens33 root` on S2, and release the lock on S1.
5. See who becomes the new master. (In our tests, orchestrator chose S2 as the new master, but I think S1 should have been chosen.)
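The steps above can be condensed into a small, language-neutral sketch (Python here; orchestrator itself is Go). The replica names, transaction counts, and helper functions are illustrative assumptions, not orchestrator code. The point: selecting a candidate by what a replica has *executed* picks S2, while selecting by what it has *received* (and therefore acknowledged under semi-sync) picks S1, the replica that actually holds every transaction.

```python
# Illustrative sketch, not orchestrator code. Each replica tracks how many of
# the master's transactions it has RECEIVED (written to its relay log, and thus
# acknowledged under semi-sync) and how many it has EXECUTED (applied).
class Replica:
    def __init__(self, name, received, executed):
        self.name = name
        self.received = received  # e.g. transactions in Retrieved_Gtid_Set
        self.executed = executed  # e.g. transactions in Executed_Gtid_Set

# The scenario from the test: S1's SQL thread is blocked by LOCK TABLE, so it
# has received everything but executed little; S2's network is delayed, so it
# has executed all it received, but received less than S1.
s1 = Replica("S1", received=100, executed=10)
s2 = Replica("S2", received=90, executed=90)

def pick_by_executed(replicas):
    # Selection based on current (applied) data -- can lose transactions.
    return max(replicas, key=lambda r: r.executed).name

def pick_by_received(replicas):
    # Selection based on potential data -- what the replica can still apply.
    return max(replicas, key=lambda r: r.received).name

print(pick_by_executed([s1, s2]))  # S2: transactions 91-100 would be lost
print(pick_by_received([s1, s2]))  # S1: holds every acknowledged transaction
```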
Fanduzi commented 3 years ago

@shlomi-noach I take the liberty of hoping you will find the time to answer my question. I don't know Go, so I don't know much about orchestrator's failover logic; please forgive me if I'm wrong. I look forward to your reply.

shlomi-noach commented 3 years ago

Whoops, sorry, missed this in the backlog.

Right, I think I saw another similar question recently. What your tests show is:

The systems I've worked with are such that replication lag is very low (by actively pushing back on apps). Therefore, at time of failover, it only takes a fraction of a second for any replica to consume whatever relay log events are in the queue.

Back to your question, could the following configuration help? "DelayMasterPromotionIfSQLThreadNotUpToDate": true. Off the top of my head, not sure -- this check is made after we've picked the promoted replica.
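For reference, that setting goes in orchestrator's JSON configuration file (a minimal fragment with all surrounding settings omitted):

```json
{
  "DelayMasterPromotionIfSQLThreadNotUpToDate": true
}
```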

So, we need a mechanism that chooses a replica based on potential data, not on current data. This is only applicable for GTID based failovers, because you can only compare replicas in GTID topologies.

Let me look into this.

Fanduzi commented 3 years ago

Maybe use Master_Log_File and Read_Master_Log_Pos?

Fanduzi commented 3 years ago

Boss, have you made any progress?

binwiederhier commented 2 years ago

You may be interested in this: https://datto.engineering/post/lossless-mysql-semi-sync-replication-and-automated-failover (disclaimer: I wrote it :-))

Fanduzi commented 2 years ago


Thank you @binwiederhier, the article was very helpful. I'm currently using MHA + ProxySQL + semi-sync (AFTER_SYNC) + GTID. Since MHA selects the latest slave based on ReadBinlogCoordinates, our current architecture seems theoretically safe from data loss. Of course, I've also made some modifications to prevent split-brain issues, so that there is only one "master" in ProxySQL when a failover occurs (because ProxySQL cluster is not a real cluster).

I was planning to replace MHA with Orchestrator this year, but I've found that Orchestrator's philosophy is different from MHA's: Orchestrator tends to prioritise availability and retain the maximum number of replicas in the cluster. Orchestrator uses ExecBinlogCoordinates to select the candidate, which does have the potential for data loss in the extreme scenario I described. So I learned a bit of Go and made some "modifications" over the May Day holiday, which are still being tested.

However, while reading the source code I found that there is something wrong with DelayMasterPromotionIfSQLThreadNotUpToDate: it doesn't actually "work". The call path I traced is:

RegroupReplicasGTID -> GetCandidateReplica -> sortedReplicasDataCenterHint -> StopReplicas -> StopReplicationNicely

StopReplicationNicely ultimately executes STOP SLAVE, and I can't find anywhere in the code where START SLAVE SQL_THREAD is executed afterwards. So DelayMasterPromotionIfSQLThreadNotUpToDate ends up waiting on a stopped slave...
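The reported failure mode can be sketched abstractly (Python; every name below is an illustrative stand-in, not an actual orchestrator function). Once replication is stopped and never restarted, a "wait until the SQL thread is up to date" loop can never succeed, because the SQL thread makes no progress while stopped:

```python
# Illustrative sketch of the reported bug, not orchestrator code.
class Replica:
    def __init__(self):
        self.sql_thread_running = True
        self.executed = 50    # transactions applied
        self.retrieved = 100  # transactions sitting in the relay log

    def stop_replication_nicely(self):
        # Stand-in for STOP SLAVE: the SQL thread stops, and per the report
        # nothing ever issues START SLAVE SQL_THREAD afterwards.
        self.sql_thread_running = False

    def sql_thread_up_to_date(self):
        return self.executed >= self.retrieved

    def tick(self):
        # The SQL thread only makes progress while it is running.
        if self.sql_thread_running and self.executed < self.retrieved:
            self.executed += 1

def delay_promotion_until_up_to_date(replica, timeout_ticks):
    """Stand-in for the DelayMasterPromotionIfSQLThreadNotUpToDate wait loop."""
    for _ in range(timeout_ticks):
        if replica.sql_thread_up_to_date():
            return True
        replica.tick()
    return False

r = Replica()
r.stop_replication_nicely()                       # STOP SLAVE, never restarted
print(delay_promotion_until_up_to_date(r, 1000))  # False: the wait can't succeed
```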

I'll have to look into it, your orchestrator.json is very informative for me, anyway, thanks~

ht2324 commented 10 months ago

@Fanduzi Can lossless semi-sync replication also lose data in the situation above?

Fanduzi commented 10 months ago

@Fanduzi Can lossless semi-sync replication also lose data in the situation above?

In my tests, yes. You can test it yourself too.

ht2324 commented 10 months ago

@Fanduzi Can lossless semi-sync replication also lose data in the situation above?

In my tests, yes. You can test it yourself too.

Yeah, that should be right. The slave didn't receive the complete relay log, so the old master ends up with extra transactions.