pingcap / tiflow

This repo maintains DM (a data migration platform) and TiCDC (change data capture for TiDB)
Apache License 2.0

bug(tiflow): occasional timeout for GetJobDetail when dm source is unreachable #7673

Open maxshuang opened 1 year ago

maxshuang commented 1 year ago

What did you do?

Run a DM job in tiflow. While the job is running, remove the source's whitelist to simulate a source that has gone away.

What did you expect to see?

The DM job reports an error, but the GetJob request still returns normally.

What did you see instead?

From the frontend (screenshot: img_v2_065b0e4f-84d3-427d-878c-cb68c35f648g)

The GetJob request occasionally times out when the source is unreachable.

[DFLOW:ErrJobManagerGetJobDetailFail]failed to get job detail from job master: Get \"http://dm-job-30268-tiflow-executor-0.dm-job-30268-tiflow-executor-peer.tiflow-nightly-ms-73813-eks-us-west-2-eda0d83e.svc:10241/api/v1/jobs/30268/status\": context deadline exceeded (Client.Timeout exceeded while awaiting headers

Versions of the cluster

Dataflow Engine version (run tiflow version):

3d22bf46712e2df25963a95842f645ef6745f45e

sleepymole commented 1 year ago

cc @lance6716 @GMHDBJD DM's status API should respond ASAP when the source is unreachable.