Open mrzor opened 4 years ago
resource timeouts are definitely a good idea and not too difficult to implement. retry also sounds like a good idea, except i would probably add a max_retries attribute or something
Thanks for the speedy answer :)
max_retries
seem sensible indeed as a failsafe. I'm not sure how the retry logic intersects with timeouts. IF timeouts are counted against the whole operation - retries included - then max_retries
doesn't add much to safety ; just to expressiveness. I'm not sure the corner cases are worth it. On the other hand, IF timeouts are per-attempt, then max_retries
seems pretty must-have for safety.
In the context of retrying, passing some additional context through the environment could help the script make a better decision on "should we keep trying or not":
TF_SHELL_RETRY_ATTEMPT
: 1 means first attempt, then 2, 3, 4. Conveniently this would allow to implement some flavor of exponential backoff script-side if desired. Also enables script to enforce max_retries
if so desired (but provides less safety than handling that aspect in the provider).TF_SHELL_RETRY_FIRST_ATTEMPT_TS
: unix timestamp of first attempt. Enables script to give up before timeout if so desired. Convenient to reconcile multiple attempts in logs (barring some kind of TF_SHELL_INVOCATION_ID
that isn't there either)What's the actual behavior with timeout, is the go code hardcoded to timeout after 1 minute?
Greetings!
I'm facing an annoying race condition upon creating my Google Compute Engine project. Even though I activate the compute APIs and the dependency from the shell resource is clearly stated, it requires some extra time to propagate. I'm going to add a sleep as a workaround, but that's unsatisfactory for two reasons : first, it introduces extra delay (could be ready before
sleep
ends), second, it might just not be enough (could be still pending by the timesleep
ends).Resource > Operation timeouts indicates that some resources support custom timeouts. I'm not sure what the defaults are - but it's certainly something I'm interested in tweaking when adding
sleep
to my script.SDK Docs > Resources > Retry indicates that some better support is available to providers for eventually consistent resources. I would love to be able to indicate that exit code N of my script means retry after M seconds, like so:
Such retry logic would allow me to speed things up at the expense of extra API calls. It would also make my config tolerant of edge cases where the propagation delay might just be greater than my hardcoded sleep workaround. I suppose that other eventually-consistent resources would benefit in just the same way.
Thanks for the great plugin, it's a life saver and a better alternative to local-exec when I need state, and a much better alternative to external data source because it .. keeps state. Good work!