ryanrupp commented 3 days ago

Description

This issue is pretty much the same as https://github.com/hashicorp/terraform/issues/6870, just for auto-scaling groups. I have a use case where I run Terraform in parallel to create ~50+ auto-scaling groups. Each group then has to poll for a successful launch, and together they get rate limited. The calls are primarily to DescribeAutoScalingGroups, and I think the provider makes one per auto-scaling group every 10 seconds. I think the relevant code is the polling loop for launch and the one for draining.

I tried to find a global default poll interval in Terraform, which would be sufficient for me, but I couldn't find one.

Affected Resource(s) and/or Data Source(s)

No response

Potential Terraform Configuration

No response

References

No response

Would you like to implement a fix?

Possibly, assuming the approach from the linked Beanstalk polling issue is essentially the same idea and an approved approach.

github-actions[bot] commented 3 days ago

Community Note

Voting for Prioritization

Please vote on this issue by adding a 👍 reaction to the original post to help the community and maintainers prioritize this request.
Please see our prioritization guide for information on how we prioritize.
Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request.

Volunteering to Work on This Issue

If you are interested in working on this issue, please leave a comment.
If this would be your first contribution, please review the contribution guide.

justinretzolk commented 1 day ago

Hey @ryanrupp 👋 Thank you for taking the time to raise this! I'd like to leave this open for the team/community to review further and determine if there's a path for this enhancement. In the meantime, it may be worth looking at the -parallelism flag. While the flag isn't generally meant to be used to control API rate limiting, it would allow Terraform to process fewer of the autoscaling groups at the same time, and may save you from hitting the rate limit. Since supplying that flag could be cumbersome in the long run, I'd be remiss if I didn't mention that breaking up the configuration into smaller parts may also help, but I assume you have reasons why that isn't desirable.

ryanrupp commented 10 hours ago

@justinretzolk thanks for the response. As a workaround we already have a change that runs these in batches, but it prolongs the deployment process.

I can see how I could implement this in Terraform. The main part that's unclear to me is the configuration. I'm not sure whether adding a poll_interval that applies everywhere StateChangeConf is used in group.go is appropriate. It should probably be more specific, i.e. only apply in waitGroupCapacitySatisfied and waitGroupDrained (and maybe a few others I'm less familiar with, like the warm pool concept): basically anywhere we're waiting on instances to start or stop. So maybe a more specific property like capacity_check_poll_interval that applies to both launch and drain.

Draining is also slow for us because the load balancer is set up with a deregistration_delay of 300s to allow in-flight requests to complete, so polling during the first 5 minutes will never see the instances scaled down, short of an error or unexpected termination. As far as I know, AWS also behaves somewhat oddly here: even if in-flight requests complete, it unfortunately still waits the whole 300s. From the AWS documentation:

If a deregistering target has no in-flight requests and no active connections, Elastic Load Balancing immediately completes the deregistration process, without waiting for the deregistration delay to elapse. However, even though target deregistration is complete, the status of the target is displayed as draining until the deregistration delay timeout expires. After the timeout expires, the target transitions to an unused state.

Ideally it would just terminate the instance once there are no in-flight requests, since these nodes are sometimes idle anyway. I believe the Classic Load Balancer behaved this way, because I remember our deploys a long time ago not having to wait out the full drain time like this.

hashicorp / terraform-provider-aws

[Enhancement]: Allow auto-scaling group check frequency to be decreased to avoid API rate limits when deploying many auto-scaling groups #40100
