pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)

jobMaster: should reset error_message when we restart job_master #9652

Open lichunzhu opened 10 months ago

lichunzhu commented 10 months ago

What did you do?

Used the engine to replicate some data. The engine hit '[DFLOW:ErrWorkerSuicide]worker has committed suicide due to master(dataflow-engine-job-manager) having timed out' and keeps reporting the failure message.

What did you expect to see?

DFLOW should clear this message once the jobMaster is working normally again.

What did you see instead?

The message is not cleared unless we update it manually in the metadata.

Versions of the cluster

Dataflow Engine version (run tiflow version): master

lichunzhu commented 10 months ago

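My guess is that this is caused by persistMetaError, which only persists the meta info when the error is not nil. If so, a nil error after the restart never overwrites the old error_message.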
