atkinsonm closed this issue 6 years ago
AWS Support confirmed our evidence that DescribeStackEvents calls are being throttled. This project does not specify retry options, so it is using the defaults: maxRetries takes the value set for CloudFormation, and retryDelayOptions uses the default. Some options:
With the current settings, the retries will be delayed by 100ms, 200ms, 400ms, and 800ms, after which the command will fail. For CloudFormation stack events, I believe waiting up to 5-10 seconds may be acceptable, since many resources take longer than that to complete and the stack events log stays the same in the meantime. Therefore I believe the simplest change would be to configure AWS with a new base exponential backoff value.
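To make the arithmetic above concrete, here is a rough sketch of the nominal backoff schedule for a given base value. The helper names (backoffSchedule, totalWait) and the 500ms base are illustrative assumptions, not values taken from the project or the PR, and any jitter the SDK may apply is ignored. The options object at the end follows the retryDelayOptions shape of the AWS SDK for JavaScript (v2) as I understand it.

```typescript
// Nominal exponential backoff: the delay before retry i is base * 2^i.
// Jitter that an SDK may add on top is deliberately ignored here.
function backoffSchedule(baseMs: number, maxRetries: number): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    delays.push(baseMs * Math.pow(2, attempt));
  }
  return delays;
}

// Sum of all delays, i.e. the longest the command waits before giving up.
function totalWait(delays: number[]): number {
  return delays.reduce((sum, d) => sum + d, 0);
}

// Delays described in the comment above: 100, 200, 400, 800 (1.5s in total).
const currentDelays = backoffSchedule(100, 4);

// Hypothetical larger base: 500, 1000, 2000, 4000 (7.5s in total),
// which falls inside the 5-10 second window suggested above.
const proposedDelays = backoffSchedule(500, 4);

// Assumed option shape for the SDK client; 500 is an example value only.
const cloudFormationOptions = {
  retryDelayOptions: { base: 500 },
};

console.log(currentDelays, totalWait(currentDelays));
console.log(proposedDelays, totalWait(proposedDelays));
console.log(cloudFormationOptions);
```

With four retries, the total wait is always 15 times the base, so any base between roughly 350ms and 650ms lands in the 5-10 second range.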
This sounds great. Would you be able to submit a pull request?
@hoegertn PR submitted 👍
released as 1.4.0
@hoegertn, could you please reopen this issue? As my CI pipelines do more and more simultaneously, I am finding that this has crept up again. Some ideas on resolution:
- Increase the retryDelayOptions base backoff milliseconds, on the logic that CloudFormation stack events will generally be updated infrequently.
- Add --retry-ms as an optional command line argument to allow a developer to supply their own backoff delay in case they have an application known to trip the API throttling triggers. I like this option better since it is unopinionated and would still allow us to set a default value.
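For discussion, a minimal sketch of how the --retry-ms argument could be read while keeping a default. The parseRetryMs helper and the 500ms default are hypothetical; the project's real argument handling may look quite different.

```typescript
// Hypothetical default, used when --retry-ms is not supplied.
const DEFAULT_RETRY_MS = 500;

// Read "--retry-ms <number>" from an argument list, falling back to a default.
function parseRetryMs(argv: string[], fallback: number = DEFAULT_RETRY_MS): number {
  const index = argv.indexOf("--retry-ms");
  if (index === -1) {
    return fallback;
  }
  const raw = argv[index + 1];
  const value = Number(raw);
  // Reject a missing value, non-numeric input, and zero or negative delays.
  if (raw === undefined || !isFinite(value) || value <= 0) {
    throw new Error("--retry-ms expects a positive number of milliseconds, got: " + raw);
  }
  return value;
}

// In the CLI this would be called as parseRetryMs(process.argv.slice(2)).
const retryMs = parseRetryMs(["my-stack", "--retry-ms", "2000"]);

// The parsed value would then feed the SDK's base backoff setting.
const clientOptions = { retryDelayOptions: { base: retryMs } };
console.log(clientOptions);
```

Keeping the default in one constant means the opinionated part (the value) stays easy to change, while users who hit throttling can override it per invocation.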
Occasionally when performing an update, cfn-tail will run into API throttling with its CloudFormation DescribeStackEvents calls. This happens most often when deploying multiple projects at the same time. The current behavior is that cfn-tail terminates with a failure message. We want to be able to recover from this issue so we can continue to receive output of the stack events until the stack create/update is complete and also to reduce noise from failed CI jobs.
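The recovery behavior described above could also be sketched at the application level, independent of SDK settings: retry the call only when the failure looks like throttling, and rethrow anything else. Everything here is an assumption for illustration: the helper names, the defaults, and in particular the error codes checked ("Throttling" and "ThrottlingException"), which should be verified against the errors cfn-tail actually receives.

```typescript
// Assumption: throttling failures carry one of these codes on the error object.
function isThrottlingError(err: unknown): boolean {
  if (typeof err !== "object" || err === null) {
    return false;
  }
  const code = (err as { code?: unknown }).code;
  return code === "Throttling" || code === "ThrottlingException";
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Run `call`, retrying with exponential backoff while it fails with throttling.
// Non-throttling errors, or throttling after maxRetries retries, are rethrown.
async function withThrottleRetry<T>(
  call: () => Promise<T>,
  baseMs: number = 500,
  maxRetries: number = 5
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await call();
    } catch (err) {
      if (!isThrottlingError(err) || attempt >= maxRetries) {
        throw err;
      }
      await sleep(baseMs * Math.pow(2, attempt));
    }
  }
}
```

In the polling loop, the DescribeStackEvents request would be wrapped as `withThrottleRetry(() => fetchEvents())`, where fetchEvents stands in for whatever function issues the request. Tailing then continues after a throttled call instead of failing the CI job.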