superfly / flyctl

Command line tools for fly.io services
https://fly.io
Apache License 2.0
1.37k stars 226 forks source link

`flaps`: retry on certain failure modes, such as 504s #3601

Open alichay opened 1 month ago

alichay commented 1 month ago

Change Summary

What and Why: All flaps operations will retry if the error is known to be transient. This should improve reliability - especially in situations like fly deploy where one hiccup can (currently) stop an entire deploy.

How: Adds a wrapper in flapsutil over *flaps.Client that implements retrying on certain failure modes, like 504. NewClientWithOptions now returns the generic FlapsClient so that we can return the wrapper type instead of a raw *flaps.Client.


Documentation

rugwirobaker commented 1 month ago

Retrying POST/PATCH my yield unexpected results cause subsequent requests might to go to a different flaps and flyd among other issues. It's basically not Idempotent. https://flyio.discourse.team/t/flaps-what-status-codes-can-we-retry/5060/2

alichay commented 1 month ago

Retrying POST/PATCH my yield unexpected results cause subsequent requests might to go to a different flaps and flyd among other issues. It's basically not Idempotent. https://flyio.discourse.team/t/flaps-what-status-codes-can-we-retry/5060/2

ah. that's a huge bummer. I'm going to try to see if I can get this working for GET requests then, and maybe afterwards come up with a different strategy for fly deploy machine creation.

I wonder if some operations, like SetMetadata, could be retried anyway? it's setting a named value, so even if it were called multiple times there shouldn't be any side effects...