populationgenomics / metamist

Sample level metadata system
MIT License
1 stars 1 forks source link

Longer backoffs between retries #739

Closed MattWellie closed 4 months ago

MattWellie commented 4 months ago

Issue

The backoff-retry decorator is working, but my original suggestion probably wasn't enough. See Batch driver image. Relevant logging:

2024-04-23 02:58:51 INFO (backoff 105): Backing off query(...) for 0.6s (gql.transport.exceptions.TransportServerError: 503 Server Error: Service Unavailable for url: https://sample-metadata-api-mnrpw3mdza-ts.a.run.app/graphql)
2024-04-23 02:59:01 INFO (backoff 105): Backing off query(...) for 1.1s (gql.transport.exceptions.TransportServerError: 503 Server Error: Service Unavailable for url: https://sample-metadata-api-mnrpw3mdza-ts.a.run.app/graphql)
2024-04-23 02:59:08 ERROR (backoff 120): Giving up query(...) after 3 tries (gql.transport.exceptions.TransportServerError: 503 Server Error: Service Unavailable for url: https://sample-metadata-api-mnrpw3mdza-ts.a.run.app/graphql)

This failing metamist query ran 3 times, stopping for barely a second in between attempts. On the plus side, the exception catch-backoff ran as intended, but from the backoff docstring:

max_tries: The maximum number of attempts to make before giving
            up. Once exhausted, the exception will be allowed to escape.
            The default value of None means there is no limit to the
            number of tries. If a callable is passed, it will be
            evaluated at runtime and its return value used.

I can't really square the behaviour I've seen (3 attempts, combining for < 2 seconds total wait) with the code as implemented which should be doing unlimited retries up to a 10 second gap. Unless the next interval after 1.1s was > 10s, which is not impossible... and would explain the gap.

This Change

This is just a suggestion PR, might have been better phrased as an Issue: the current backoff implementation is working, just not working well enough.

MattWellie commented 4 months ago

Updated to include that version change, now shoving a cheeky patch bump on top for this change

codecov-commenter commented 4 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 77.15%. Comparing base (5451b07) to head (9b1633d).

Additional details and impacted files ```diff @@ Coverage Diff @@ ## dev #739 +/- ## ======================================= Coverage 77.15% 77.15% ======================================= Files 157 157 Lines 12945 12945 ======================================= Hits 9988 9988 Misses 2957 2957 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.