kothar / go-backblaze

A golang client for Backblaze's B2 storage
MIT License

retry failed calls more than twice. #27

Open antihax opened 6 years ago

antihax commented 6 years ago

Was receiving a lot of failures every now and then from get_upload_url. Should probably retry more than once on failure of these calls.

kothar commented 5 years ago

Looking at the documentation, I think we need to add a backoff when retrying:

503 errors from any API except b2_upload_file or b2_upload_part. (Handling a 5xx error on an upload API is described in the Uploading Files section above.) When this error occurs, it may include a "Retry-After" header where the value is the number of seconds a developer should wait before re-issuing the command. If the header is not present, the developer should retry using an exponential backoff starting with 1 second. These status codes may be returned from any B2 API.
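As a rough illustration of the rule quoted above, here is a minimal sketch of how a retry delay could be chosen: honor "Retry-After" when it is present, otherwise back off exponentially starting at 1 second. The function name retryDelay and the maxBackoff cap are assumptions made for this example; neither is part of go-backblaze or specified by the B2 documentation.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// maxBackoff is a hypothetical upper bound on the delay. The B2
// documentation quoted above does not specify a cap.
const maxBackoff = 64 * time.Second

// retryDelay returns how long to wait before retry number attempt (0-based).
// retryAfter is the raw value of the Retry-After response header, or ""
// when the header is absent. Only the integer-seconds form is handled.
func retryDelay(retryAfter string, attempt int) time.Duration {
	// Prefer the server's hint when it parses as a non-negative number of seconds.
	if secs, err := strconv.Atoi(retryAfter); err == nil && secs >= 0 {
		return time.Duration(secs) * time.Second
	}
	if attempt < 0 {
		attempt = 0
	}
	// Clamp early so the shift below cannot overflow.
	if attempt >= 30 {
		return maxBackoff
	}
	d := time.Second << uint(attempt) // 1s, 2s, 4s, ...
	if d > maxBackoff {
		return maxBackoff
	}
	return d
}

func main() {
	// No header: exponential backoff starting at 1 second.
	for attempt := 0; attempt < 4; attempt++ {
		fmt.Println(attempt, retryDelay("", attempt))
	}
	// Header present: the server's value wins over the backoff schedule.
	fmt.Println(retryDelay("5", 2))
}
```

A caller would sleep for the returned duration between attempts, passing in whatever resp.Header.Get("Retry-After") returned for the failed response.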

DarkArc commented 4 months ago

So Kopia uses this package for B2 integration: https://github.com/kopia/kopia/blob/master/repo/blob/b2/b2_storage.go#L13

There's an issue affecting Kopia: https://github.com/kopia/kopia/issues/3472

This issue seems to occur when 503 errors are encountered. This is despite Kopia having its own retry system (see the internalRetry function, which is used indirectly through the retryStorage wrapper, which is always used by the B2 implementation).

See also: https://www.backblaze.com/blog/b2-503-500-server-error/

The bottom line is an error in the 500 block should be interpreted by the client as the signal to GO BACK to the dispatching server and ask for a new vault for uploads.

I haven't investigated enough, but I suspect simply repeating the same API call is insufficient. I suspect that when this error is encountered, the uploadAuthPool (https://github.com/kothar/go-backblaze/blob/master/buckets.go#L190-L221) needs to be reset so that the retry actually goes against a workable path.
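To make the suspicion concrete, here is a minimal sketch of what "reset the pool and ask for a new upload URL on a 5xx" could look like. Everything in it is hypothetical: uploadAuth, statusError, and uploader are stand-ins written for this example, not the types in go-backblaze, and the getUploadURL and send fields merely stand in for b2_get_upload_url and b2_upload_file. It is not concurrency-safe and omits backoff between attempts for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// uploadAuth is a hypothetical stand-in for a cached upload URL and token.
type uploadAuth struct {
	URL   string
	Token string
}

// statusError is a hypothetical error type carrying the HTTP status code.
type statusError struct{ Code int }

func (e *statusError) Error() string { return fmt.Sprintf("http status %d", e.Code) }

// uploader keeps a pool of cached upload auths, loosely analogous to the
// uploadAuthPool discussed above (assumed shape, not the real one).
type uploader struct {
	pool         []*uploadAuth
	getUploadURL func() (*uploadAuth, error)            // stands in for b2_get_upload_url
	send         func(a *uploadAuth, data []byte) error // stands in for b2_upload_file
}

// acquire takes a cached auth if one exists, otherwise requests a new one.
func (u *uploader) acquire() (*uploadAuth, error) {
	if n := len(u.pool); n > 0 {
		a := u.pool[n-1]
		u.pool = u.pool[:n-1]
		return a, nil
	}
	return u.getUploadURL()
}

// upload retries on 5xx errors, discarding all cached auths first so the
// next attempt is forced to obtain a fresh upload URL.
func (u *uploader) upload(data []byte, maxAttempts int) error {
	lastErr := errors.New("upload: no attempts made")
	for attempt := 0; attempt < maxAttempts; attempt++ {
		auth, err := u.acquire()
		if err != nil {
			lastErr = err
			continue
		}
		err = u.send(auth, data)
		if err == nil {
			u.pool = append(u.pool, auth) // the auth worked, keep it for reuse
			return nil
		}
		lastErr = err
		var se *statusError
		if errors.As(err, &se) && se.Code >= 500 {
			u.pool = nil // reset the pool instead of reusing a possibly dead path
			continue
		}
		return err // other errors are not retried in this sketch
	}
	return lastErr
}

func main() {
	u := &uploader{
		pool: []*uploadAuth{{URL: "stale"}},
		getUploadURL: func() (*uploadAuth, error) {
			return &uploadAuth{URL: "fresh"}, nil
		},
		send: func(a *uploadAuth, data []byte) error {
			if a.URL == "stale" {
				return &statusError{Code: 503}
			}
			return nil
		},
	}
	fmt.Println(u.upload([]byte("payload"), 3))
}
```

Dropping the whole pool rather than only the failed entry is a deliberate simplification here; whether the other cached URLs are still usable after a 5xx is exactly the open question in this thread.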

This might explain why Kopia's retry system fails to recover from this, even with its exponential backoff pattern (in seconds):