Open lilinghai opened 1 year ago
This is by design as per @niubell: the error happens in the checksum phase, and there is currently no retry mechanism during checksum. Changing it to an enhancement.
This is Rishabh; I work at Airbnb. We have seen Lightning fail during the checksum phase multiple times. I understand that checksum is an expensive operation and that retrying can be costly, but we should retry on errors like "region unavailable" or "PD can't fetch timeout". We have already filed support tickets for these issues.
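To make the request concrete, the selective retry could look roughly like the sketch below. It is illustrative only and is not Lightning's code (Lightning is written in Go; Python is used here for brevity). The names `is_retryable`, `checksum_with_retry`, and `run_checksum` are hypothetical, and matching on the two error strings quoted above is an assumption about how such errors might be classified.

```python
import time

# Hypothetical list of transient error substrings. The two entries mirror the
# errors quoted in the comment above; they are not taken from Lightning's source.
RETRYABLE_SUBSTRINGS = ("region unavailable", "PD can't fetch timeout")


def is_retryable(err: Exception) -> bool:
    """Return True if the error message looks like a transient cluster error."""
    msg = str(err).lower()
    return any(s.lower() in msg for s in RETRYABLE_SUBSTRINGS)


def checksum_with_retry(run_checksum, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call run_checksum(), retrying only on errors classified as transient.

    run_checksum is a caller-supplied callable (an assumption of this sketch).
    Non-retryable errors, and the last failed attempt, are re-raised unchanged.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_checksum()
        except Exception as err:
            if attempt == max_attempts or not is_retryable(err):
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Bounding the number of attempts keeps the cost of the expensive checksum under control, while any error that is not on the transient list still fails immediately.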
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
/tidb-lightning -pd-urls "tc-pd.e2e-htap-encryption-tps-1302571-1-900:2379" -tidb-host "tc-tidb.e2e-htap-encryption-tps-1302571-1-900" -tidb-port "4000" -tidb-user "root" -tidb-password "" -backend "local" -sorted-kv-dir "/tmp/sorted-kv-dir" -d "s3://nfs/tiflash/csv-tpcc-100?access-key=minioadmin&secret-access-key=minioadmin&endpoint=http%3a%2f%2fminio.pingcap.net%3a9000&force-path-style=true" -c "/lightning.yaml"
2. What did you expect to see? (Required)
success
3. What did you see instead (Required)
4. What is your TiDB version? (Required)
master