omerxx / ecscale

A serverless application (Lambda function) that scales down (in) ECS clusters in a cost-effective and graceful way.
MIT License

Ignore draining instances when checking ASG state #10

Closed: cbankston closed this 5 years ago

cbankston commented 6 years ago

The ASG's DesiredCapacity does not correctly represent the current state of an ECS cluster, so I've replaced it with the cluster's activeInstanceCount.

DesiredCapacity has two problems: 1) it includes instances that are in the Draining state, which makes the ASG appear bigger than it actually is; 2) when DesiredCapacity is updated, the ASG takes time to fulfil the request, so the script can misjudge how many instances are currently powering the ECS cluster.
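A minimal sketch of the idea, assuming the active count is obtained by listing the cluster's container instances with `status="ACTIVE"` (the PR does not show exactly how activeInstanceCount is derived). `count_active_instances` and `asg_on_min_state` are hypothetical names, and `asg` is assumed to be one entry of `describe_auto_scaling_groups()["AutoScalingGroups"]`.

```python
from typing import Any, Dict


def count_active_instances(ecs_client: Any, cluster: str) -> int:
    """Count ACTIVE container instances; DRAINING ones are never listed."""
    count = 0
    kwargs: Dict[str, Any] = {"cluster": cluster, "status": "ACTIVE"}
    while True:
        page = ecs_client.list_container_instances(**kwargs)
        count += len(page.get("containerInstanceArns", []))
        token = page.get("nextToken")
        if not token:
            return count
        # Follow pagination until every ACTIVE instance has been counted.
        kwargs["nextToken"] = token


def asg_on_min_state(ecs_client: Any, cluster: str, asg: Dict[str, Any]) -> bool:
    """Hypothetical check: is the cluster already at the ASG's MinSize?

    DesiredCapacity is deliberately ignored: it still counts DRAINING
    instances and lags behind recent capacity changes.
    """
    return count_active_instances(ecs_client, cluster) <= asg["MinSize"]
```

In a Lambda handler, `ecs_client` would typically be `boto3.client("ecs")`; the functions only rely on `list_container_instances`, which keeps them easy to exercise with a stub client.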

cbankston commented 6 years ago

This should resolve some of the edge cases that could have caused #6; however, other edge cases may still cause problems.

For example, I am not sure how a cluster's reservation metrics are affected when an instance is in the draining state, is still running tasks, and those tasks have not yet been started on other hosts.
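One conservative way to sidestep that uncertainty is to skip a scale-in decision while any DRAINING instance still has running tasks. The sketch below assumes that approach. It is not part of this PR, and the name `draining_instances_busy` is hypothetical. It uses `list_container_instances` with `status="DRAINING"` and reads `runningTasksCount` from `describe_container_instances`, which accepts at most 100 ARNs per call.

```python
from typing import Any, Dict, List


def draining_instances_busy(ecs_client: Any, cluster: str) -> bool:
    """Hypothetical guard: True while any DRAINING instance still runs tasks."""
    arns: List[str] = []
    kwargs: Dict[str, Any] = {"cluster": cluster, "status": "DRAINING"}
    while True:
        page = ecs_client.list_container_instances(**kwargs)
        arns.extend(page.get("containerInstanceArns", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
    # describe_container_instances accepts at most 100 ARNs per request.
    for start in range(0, len(arns), 100):
        resp = ecs_client.describe_container_instances(
            cluster=cluster, containerInstances=arns[start:start + 100]
        )
        for inst in resp.get("containerInstances", []):
            if inst.get("runningTasksCount", 0) > 0:
                # Tasks have not been moved off this host yet, so reservation
                # metrics may not reflect the post-drain state.
                return True
    return False
```

A caller would return early (no scale-in) when this is True and re-evaluate on the next scheduled Lambda run. Whether this wait is actually needed depends on how ECS reports reservation while tasks are being rescheduled, which remains an open question here.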

cbankston commented 6 years ago

I'll have to improve this after #7 is merged, because of the new code it adds.

oytuntez commented 5 years ago

Hey! Do you need any help merging PRs #6 and #7? I think we will simply take the code changes and apply them to our Lambda function. We are also using a smaller cluster.

cbankston commented 5 years ago

@oytuntez I've merged everything into the fork that my job is using if you want to see the full changes: https://github.com/id90t/ecscale

oytuntez commented 5 years ago

Thanks @cbankston! I also did the same over the weekend... :) https://github.com/oytuntez/ecscale

I have a couple more updates to the script, but I haven't been able to commit them yet.