We ran into a surprising behaviour where `backoff` didn't retry a failed `Future` even though the error was clearly mapped to `backoff::Error::Transient`.
After some digging, we discovered that `max_elapsed_time` defaults to 15 minutes and, in our case, the `Future` only failed several hours in. For context, our `Future` first establishes a websocket connection and then reads values from it in a loop. This connection can fail at any time, in which case we would like to re-establish it (but only if it failed for certain reasons), hence our use of `backoff`.
I would like to suggest either removing `max_elapsed_time` entirely or at least setting it to `None` by default. It is very surprising that the total runtime of a `Future` influences whether it will be retried once it completes with an error.
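To make the problem concrete, here is a minimal sketch of the behaviour as we understand it. It is a simplified model and not the crate's actual code: `RetryPolicy` and `should_retry` are hypothetical names used only for illustration, and the assumption is that the elapsed time is measured from when the operation first started.

```rust
use std::time::Duration;

/// Hypothetical, simplified model of a retry policy with a total time budget.
/// This is NOT the `backoff` crate's implementation; it only mirrors the
/// behaviour described in this issue.
struct RetryPolicy {
    /// `None` means there is no limit on the total elapsed time.
    max_elapsed_time: Option<Duration>,
}

impl RetryPolicy {
    /// Decide whether a transient failure is retried, given the time elapsed
    /// since the operation first started (this includes any time the
    /// operation spent running successfully before it failed).
    fn should_retry(&self, elapsed_since_start: Duration) -> bool {
        match self.max_elapsed_time {
            // The budget is already spent, so the error is not retried.
            Some(limit) => elapsed_since_start <= limit,
            // No budget configured: transient errors are always retried.
            None => true,
        }
    }
}
```

With a 15 minute limit, a connection that drops after three hours is never retried in this model, while a limit of `None` always retries. As far as I can tell, the limit can be disabled per call site today with `ExponentialBackoff { max_elapsed_time: None, ..Default::default() }`, but that is easy to miss.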
Thoughts?