Open kennytm opened 1 month ago
The reason GCS did not retry is that the upload is not considered an "idempotent operation". The default retry policy is RetryIdempotent, which is documented as follows:
RetryIdempotent causes only idempotent operations to be retried when the service returns a transient error. Using this policy, fully idempotent operations (such as ObjectHandle.Attrs()) will always be retried. Conditionally idempotent operations (for example ObjectHandle.Update()) will be retried only if the necessary conditions have been supplied (in the case of ObjectHandle.Update() this would mean supplying a Conditions.MetagenerationMatch condition is required).
Patching the GCS client to use RetryAlways makes Dumpling behave as expected:
diff --git a/br/pkg/storage/gcs.go b/br/pkg/storage/gcs.go
index 0f1d8a2418..b4893657ba 100644
--- a/br/pkg/storage/gcs.go
+++ b/br/pkg/storage/gcs.go
@@ -432,7 +432,7 @@ func (s *GCSStorage) Reset(ctx context.Context) error {
 			if err != nil {
 				return errors.Trace(err)
 			}
-			client.SetRetry(storage.WithErrorFunc(shouldRetry))
+			client.SetRetry(storage.WithErrorFunc(shouldRetry), storage.WithPolicy(storage.RetryAlways))
 			s.clients[i] = client
 			return nil
 		})
However, I'm not sure whether we should use this easy solution, which "can lead to race conditions and other conflicts", or properly make the upload idempotent by supplying the ifMetagenerationMatch/ifGenerationMatch preconditions.
@Benjamin2037 will your team fix the problem?
@BornChanger OK
I can take a look at this, but please let me know if it is urgent.
@OliverS929 We have asked the customer to try the XML API (s3://) to see if it can work around the issue. It is currently 23:50 in SF, so I don't have an urgency assessment right now.
The customer used local disks as a workaround. But as it's such a common scenario on GCP, it would also be helpful if this could be fixed.
Bug Report
Please answer these questions before submitting your issue. Thanks!
1. Minimal reproduce step (Required)
(I used mitmproxy to inject the 503 error. For testing, there should be some zero-external-dependency means to do so :thinking: I also used a local fake-gcs-server serving HTTP to avoid the distraction of installing a self-signed TLS CA.)
Get mitmproxy
Prepare the following script, which will inject 503 for the first two requests to */o (the URL for uploading objects to GCS)
Run mitmproxy loaded with this script
Patch dumpling to use this proxy:
Run dumpling.
2. What did you expect to see? (Required)
Given that we only inject 503 twice, Dumpling should be able to upload the file successfully on its 3rd try, and the whole process should succeed.
Inside the mitmproxy console, we should be able to see two 503 responses like
3. What did you see instead (Required)
Dumpling failed without any retry, with logs like
4. What is your TiDB version? (Required)
Dumpling v8.3.0