Open tlake opened 6 years ago
From Slack
Because of the nature of the retry, an error such as "unable to assume role" never gets returned to the outer retry.Retry.
Another issue may be in RetryWithTimeout: it creates its own error to stop the loop, so it wouldn't return the AWS error.
One idea is to make it the caller’s responsibility to track errors:
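    var err error
    fn := func() bool {
        err = foo()
        // awserr.Error is an interface, so assert on it directly and call Code().
        // Returning true asks for another attempt.
        if aerr, ok := err.(awserr.Error); ok && aerr.Code() == "..." {
            return true
        }
        return false
    }
    retry(fn)
    return err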
That could work: every attempt runs `err = foo()` again, so `err` ends up `nil` if `foo()` eventually succeeds. If the retries run out or time out, `err` keeps the last error.
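To make the idea concrete, here is a minimal, self-contained sketch of the caller tracking its own error. It is not layer0's actual code: `retryWithTimeout`, `callWithRetry`, `fakeAWSError`, and the `codedError` interface are hypothetical names invented for this example, and `codedError` only mirrors the `Code()` method of `awserr.Error` so the sketch builds without the AWS SDK.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// codedError is a local stand-in for the Code() method of awserr.Error
// (assumption: only the error code is needed to decide whether to retry).
type codedError interface {
	error
	Code() string
}

// fakeAWSError is a hypothetical error type used only in this example.
type fakeAWSError struct{ code string }

func (e fakeAWSError) Error() string { return "aws: " + e.code }
func (e fakeAWSError) Code() string  { return e.code }

// retryWithTimeout is a hypothetical helper: it calls fn until fn returns
// false or the timeout elapses. It never creates or replaces an error.
func retryWithTimeout(timeout, interval time.Duration, fn func() bool) {
	deadline := time.Now().Add(timeout)
	for fn() && time.Now().Before(deadline) {
		time.Sleep(interval)
	}
}

// callWithRetry keeps the last error from op in a captured variable and
// retries only while that error carries retryCode.
func callWithRetry(op func() error, retryCode string) error {
	var err error
	fn := func() bool {
		err = op()
		var cerr codedError
		// errors.As returns false for a nil error, so success stops the loop.
		return errors.As(err, &cerr) && cerr.Code() == retryCode
	}
	retryWithTimeout(50*time.Millisecond, time.Millisecond, fn)
	return err
}

func main() {
	calls := 0
	err := callWithRetry(func() error {
		calls++
		if calls < 3 {
			return fakeAWSError{code: "Throttling"}
		}
		return nil
	}, "Throttling")
	fmt.Println("calls:", calls, "err:", err)
}
```

Because the retry helper only sees a `bool`, it cannot swallow or overwrite the AWS error: whatever `op` returned last is what the caller gets back, whether the loop ended by success, a non-retryable error, or the timeout.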
The logs below illustrate a failing smoketest due to being run in the wrong region, but they also highlight a more important problem: the retry logic (for services, at least) incorrectly swallows an AWS `UnsupportedFeatureException` error and proceeds with retry logic when it should instead be failing immediately and returning that error. In other words, we only avoid retry logic for a whitelist of errors, when we should only be entering retry logic for a whitelist of errors. Failing immediately should be the default, not retrying.

Logs from service smoketests:
Logs from layer0 API: