Open oceanusxiv opened 1 week ago
Correct, there is no exponential backoff to the download retries (though it depends on your definition of exponential). However, I'm not sure that `object_store` is the place to configure retry backoffs due to network outages. What sort of max duration are you looking for?
If you are looking for something over 5 minutes then you will encounter this warning from `object_store`:

> As requests are retried without renewing credentials or regenerating request payloads, this number should be kept below 5 minutes to avoid errors due to expired credentials and/or request payloads
If you are looking for something less than 5 minutes then you can probably get there by exposing `with_retry` in some way. It should be a fairly straightforward change. Probably the simplest thing to expose would be `init_backoff`. I'd advise anyone working on this to read up on the actual algorithm used, which is "decorrelated jitter" and not "classic exponential growth". It is designed to avoid waves of concurrent requests, not to solve network outages. Its growth is sub-linear.
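For anyone picking this up, here is a minimal sketch of decorrelated jitter as it is commonly described: each delay is drawn uniformly between a floor and a multiple of the previous delay, then capped. It is not `object_store`'s actual code. The function name, the `multiplier` parameter, and the values used (0.1 s floor, 15 s cap) are assumptions chosen for illustration only.

```python
import itertools
import random


def decorrelated_jitter(init_backoff, max_backoff, multiplier=2.0, rng=None):
    """Yield an endless stream of sleep durations, in seconds.

    Hypothetical helper for illustration; not part of object_store or Lance.
    """
    if rng is None:
        rng = random.Random()
    prev = init_backoff
    while True:
        # Draw uniformly between the floor and a multiple of the previous
        # delay, then clamp to the cap. The randomness is what keeps
        # concurrent clients from retrying in lockstep.
        prev = min(max_backoff, rng.uniform(init_backoff, prev * multiplier))
        yield prev


# Illustrative values only: 100 ms floor, 15 s cap, fixed seed.
delays = list(itertools.islice(decorrelated_jitter(0.1, 15.0, rng=random.Random(0)), 50))
```

Because each delay depends on a random draw rather than a fixed doubling, consecutive delays can shrink as well as grow, which is why it behaves differently from a classic exponential schedule.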
We do have an outer retry loop that we use in most places, which can be configured with the (sadly undocumented) `download_retry_count`, but this only applies to the download of the data and not the initial transmission of headers.
So, if we want a retry loop for intermittent network timeouts it probably needs to be a new retry loop. I'd be open to the idea but also slightly cautious as this feels like something not all users will need and the users that do can build their own retry loop outside of Lance.
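As a rough sketch of what a user-side retry loop outside of Lance could look like: the wrapper below retries a callable on timeout or connection errors until a total time budget is spent. Everything here is hypothetical (the name `retry_with_budget`, its parameters, and the choice of which exceptions count as retryable); `op` stands in for whatever call the user wants to protect.

```python
import time


def retry_with_budget(op, delays, max_elapsed,
                      retryable=(TimeoutError, ConnectionError),
                      sleep=time.sleep, clock=time.monotonic):
    """Call op() until it succeeds, the delays run out, or the budget is spent.

    Hypothetical helper for illustration; not part of Lance or object_store.
    """
    start = clock()
    for delay in delays:
        try:
            return op()
        except retryable:
            # Give up if sleeping again would exceed the overall budget.
            if clock() - start + delay > max_elapsed:
                raise
            sleep(delay)
    # Delays exhausted: make one final attempt and let any error propagate.
    return op()


# Usage sketch: an operation that times out twice, then succeeds.
attempts = {"n": 0}


def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated network timeout")
    return "ok"


result = retry_with_budget(flaky, delays=[0.0, 0.0, 0.0], max_elapsed=60.0,
                           sleep=lambda seconds: None)
```

The `sleep` and `clock` parameters are injectable only so the loop can be exercised without real waiting; the delay sequence can come from any backoff schedule.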
The underlying `object_store` crate being used supports setting a `with_retry` configuration, which is useful for exponential backoff and jitter when you have temporary network outages. It should be exposed to the user via the `storage_options` API (or some other API) so it can be set. As it stands, I don't think there's any exponential backoff to the download retries?