Drainer should not retry on long time ddls directly

pingcap / tidb-binlog

A tool used to collect and merge tidb's binlog for real-time data backup and synchronization.

Apache License 2.0

Drainer should not retry on long time ddls directly #1169

Open lichunzhu opened 2 years ago

lichunzhu commented 2 years ago

What did you do?

Use Drainer to replicate long-running DDLs, such as ADD INDEX, to a downstream TiDB.

What did you expect to see?

Drainer replicates the DDLs successfully.

What did you see instead?

Drainer fails with an i/o timeout and keeps retrying the same DDLs.

Versions of the cluster

master (https://github.com/pingcap/tidb-binlog/commit/b0214a29e9fa5810df95cccbf61a0090b3ba9775)

lichunzhu commented 2 years ago

Root Cause

When Drainer executes a time-consuming DDL, especially ADD INDEX, the downstream returns no result until the DDL finishes. If the DDL takes longer than the syncer's read-timeout, the read fails with an i/o timeout, and Drainer treats the DDL as failed and executes it again. Retrying makes the situation worse, since the original DDL may still be running downstream.

Workaround

For these long-running DDLs, we can execute them on a dedicated connection and watch the result asynchronously through ADMIN SHOW DDL JOBS.