Open apollodafoni opened 2 days ago
/severity critical /component ddl /assign @tangenta /label affects-8.4
The error is raised here, found by an org-level search on GitHub. I guess the reason is that the PD ScanRegion API does not have an internal retry and the caller forgets to handle it, like
Same cause as https://github.com/tikv/pd/issues/8442
If the region information is loaded from the local disk and the current leader has not yet reported a heartbeat to PD, the region information scanned at this time will not include the leader.
Lightning has encountered similar issues before: https://github.com/pingcap/tidb/pull/52822.
We need to add retry logic for the case where the returned region information has no leader.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
tiup.yaml is as follows:
After restoring some data from S3, run a large reorg:
During SQL execution, run tiup upgrade:
2. What did you expect to see? (Required)
The large reorg SQL executes successfully after the upgrade.
3. What did you see instead (Required)
It seems the DDL job was not paused during the upgrade!
SQL execution failed:
Message: "receive Regions with no peer"
4. What is your TiDB version? (Required)
Upgraded TiDB with tiup from v8.3.0 to v8.4.0-pre.