mitodl / ol-infrastructure

Infrastructure automation code for use by MIT Open Learning
BSD 3-Clause "New" or "Revised" License

edx autoscaling groups #670

Open Ardiea opened 2 years ago

Ardiea commented 2 years ago

Revisit the autoscaling for edx. Previously, when autoscaling was configured, edx would go into a death spiral of scaling up/down/up/down over and over. We still need autoscaling, but it needs to be smarter than the previous implementation.

Sar has load-testing code somewhere that we can use to give this a nudge and hopefully trigger the autoscaling.

blarghmatey commented 2 years ago

Our first attempt at autoscaling for edx-platform instances was based on metrics collected from the load balancer. This worked in the sense that the rules did trigger launches of new instances, but the ASG then got into a state where uptime was flapping because the number of instances and their overall readiness kept fluctuating.
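As a rough illustration of what "smarter" could mean here, below is a minimal, hypothetical sketch of a scaling decision that resists this kind of flapping. It assumes a load-balancer request count as the input metric and combines three ideas: separate scale-out and scale-in thresholds (a hysteresis band), a cooldown after every change, and no scale-in until every desired instance is ready. The names (`ScalingPolicy`, `ScalingState`, `decide_capacity`) and all threshold values are illustrative placeholders, not code or settings from ol-infrastructure or from the AWS API. In practice the equivalent knobs (cooldowns, instance warmup) would be configured on the ASG scaling policy rather than reimplemented by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingPolicy:
    """Hypothetical tuning knobs; the values are placeholders, not tested settings."""

    scale_out_above: float = 1000.0  # requests per ready instance
    scale_in_below: float = 400.0  # well below scale_out_above, leaving a dead band
    cooldown_seconds: float = 600.0  # hold time after any change, to cover boot time
    min_size: int = 2
    max_size: int = 10


@dataclass(frozen=True)
class ScalingState:
    desired: int  # desired instance count
    last_change: float  # time of the last scaling action, in seconds


def decide_capacity(
    state: ScalingState,
    total_requests: float,
    ready_instances: int,
    now: float,
    policy: ScalingPolicy = ScalingPolicy(),
) -> ScalingState:
    """Return the new scaling state for one evaluation tick."""
    # Inside the cooldown window: hold steady so recently launched
    # instances can finish booting before the next decision.
    if now - state.last_change < policy.cooldown_seconds:
        return state

    # Normalize load by ready instances only, so instances that are still
    # booting are not counted as capacity that can absorb traffic.
    load = total_requests / max(ready_instances, 1)

    if load > policy.scale_out_above and state.desired < policy.max_size:
        return ScalingState(state.desired + 1, now)

    # Scale in only when load is well under the scale-out threshold and
    # every desired instance is ready, so the group never shrinks while a
    # previous scale-out is still coming online.
    if (
        load < policy.scale_in_below
        and ready_instances >= state.desired
        and state.desired > policy.min_size
    ):
        return ScalingState(state.desired - 1, now)

    # Otherwise (load inside the dead band, or already at a size limit): no change.
    return state


if __name__ == "__main__":
    s = ScalingState(desired=2, last_change=0.0)
    # 1500 requests per ready instance, above the scale-out threshold.
    s = decide_capacity(s, total_requests=3000, ready_instances=2, now=1000.0)
    print(s)  # ScalingState(desired=3, last_change=1000.0)
    # Cooldown has passed and load is low, but the new instance is not ready yet.
    print(decide_capacity(s, total_requests=600, ready_instances=2, now=1700.0))
    # ScalingState(desired=3, last_change=1000.0)
```

The gap between `scale_out_above` and `scale_in_below` keeps small load fluctuations from alternating scale-out and scale-in decisions, while the cooldown and the readiness check target the instance-count and readiness fluctuations described above.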

blarghmatey commented 2 years ago

@shaidar when you get a minute, can you add any context about the failure mode that we ran into with MITx Online when the ASG scale-up policy triggered?