pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)

dm-master may have two leaders operating etcd keys at the same time #3727

Open lichunzhu opened 3 years ago

lichunzhu commented 3 years ago

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do? If possible, provide a recipe for reproducing the error. When one dm-master resigns as etcd leader, the resignation signal may reach that dm-master later than another dm-master learns that it has become the etcd leader. During that window, two dm-masters both act as leader and operate on etcd keys at the same time. This may cause problems in DM.

  2. What did you expect to see? There should always be only one leader operating on the etcd keys.

  3. Possible solution: For every etcd write in DM, we can add an If condition to the write transaction that checks whether this dm-master's key still equals the etcd leader key, so the write is only applied while this node is the leader.
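
To illustrate the idea in item 3, here is a minimal sketch of a leader-guarded write. It is not DM code and does not use the etcd client: `memKV`, `guardedPut`, `ErrNotLeader` and the key names are all hypothetical, and a mutex-protected map stands in for etcd so that the check and the write are atomic. With the real etcd client, the same guard would presumably be expressed as the If clause of a transaction that compares the leader key.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// ErrNotLeader is returned when the guard on the leader key fails.
var ErrNotLeader = errors.New("not leader: write rejected")

// memKV is a hypothetical in-memory stand-in for etcd (NOT the etcd client API).
type memKV struct {
	mu   sync.Mutex
	data map[string]string
}

func newMemKV() *memKV { return &memKV{data: make(map[string]string)} }

// put writes unconditionally. In this sketch it stands in for the election
// updating the leader key.
func (kv *memKV) put(key, val string) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	kv.data[key] = val
}

// get returns the value stored under key and whether the key exists.
func (kv *memKV) get(key string) (string, bool) {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	v, ok := kv.data[key]
	return v, ok
}

// guardedPut applies the write only if leaderKey currently holds selfID.
// The check and the write happen under one lock, mimicking the atomicity
// of a transaction with an If condition.
func (kv *memKV) guardedPut(leaderKey, selfID, key, val string) error {
	kv.mu.Lock()
	defer kv.mu.Unlock()
	if kv.data[leaderKey] != selfID {
		return ErrNotLeader
	}
	kv.data[key] = val
	return nil
}

func main() {
	const leaderKey = "/hypothetical/leader" // assumed key name, for illustration only
	kv := newMemKV()

	kv.put(leaderKey, "master-1")
	fmt.Println(kv.guardedPut(leaderKey, "master-1", "/task/a", "running"))

	// Leadership moves to master-2, but master-1 has not noticed yet.
	kv.put(leaderKey, "master-2")
	fmt.Println(kv.guardedPut(leaderKey, "master-1", "/task/a", "paused"))

	v, _ := kv.get("/task/a")
	fmt.Println(v)
}
```

In this sketch the stale leader's second write is rejected instead of overwriting the key, because the guard is evaluated together with the write rather than relying on the resign signal arriving in time.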

lance6716 commented 2 years ago

A user case:

The master panicked for some reason and was restarted by systemd. After the restart, due to https://github.com/pingcap/ticdc/issues/3828, it was retired during Optimist.Start. So there were two nodes writing to etcd: the restarted master itself and the new leader.

fubinzh commented 1 year ago

/severity minor