Open lichunzhu opened 2 years ago
It seems there are multiple relay writers: the same event (LogPos 173889) is written twice.
[2022-04-21T06:45:59.162Z] [2022/04/21 14:45:09.643 +08:00] [DEBUG] [relay.go:662] ["writing binlog event"] [component="relay log"] [header="{\"Timestamp\":1650523509,\"EventType\":19,\"ServerID\":1,\"EventSize\":58,\"LogPos\":173889,\"Flags\":0}"]
[2022-04-21T06:45:59.162Z] [2022/04/21 14:45:09.643 +08:00] [DEBUG] [relay.go:662] ["writing binlog event"] [component="relay log"] [header="{\"Timestamp\":1650523509,\"EventType\":19,\"ServerID\":1,\"EventSize\":58,\"LogPos\":173889,\"Flags\":0}"]
As a result, the relay log is corrupted.
/assign lichunzhu
After checking, this turns out to be caused by our current test mechanism. We use check_point_offline to make sure dm-worker exits. However, check_process_exit is more accurate, because dm-worker.test needs some time to exit after its port goes offline.
Some tests have been revised, and a check_process_exit shell script has been added to avoid this kind of problem.
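
To illustrate why waiting on the process is stricter than waiting on the port, here is a minimal sketch of a process-exit wait. It is not the actual check_process_exit script from the repository: the function name wait_pid_exit, its PID-based interface, and the one-second polling interval are all assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper (not the repository's check_process_exit):
# poll until the process with the given PID is gone.
# Usage: wait_pid_exit <pid> [retries]
wait_pid_exit() {
    pid="$1"
    retries="${2:-20}"
    i=0
    while [ "$i" -lt "$retries" ]; do
        # kill -0 sends no signal; it only reports whether the PID still exists.
        if ! kill -0 "$pid" 2>/dev/null; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    echo "process $pid still running after $retries checks" >&2
    return 1
}
```

A test would call something like this after stopping dm-worker and before starting a new one, so the old process cannot still be writing relay logs when the new one begins.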
What did you do?
Run a dm-worker and randomly kill it and restart dm-worker.
What did you expect to see?
dm-worker can handle binlog events correctly.
What did you see instead?
https://ci2.pingcap.net/job/dm_ghpr_integration_test/5086/display/redirect dm-worker local-reader reports an error and can't continue working.
Versions of the cluster
master
Current status of DM cluster (execute query-status <task-name> in dmctl)