sideshowbandana / k8s-sqs-autoscaler

Kubernetes pod autoscaler based on queue size in AWS SQS

do not scale all the way down to zero unless there are no invisible m… #6

Open danmaas opened 6 years ago

danmaas commented 6 years ago

I'm working on a system where the minimum number of pods is zero, i.e., I want to auto-scale all the way down to nothing when the queue is empty.

The detailed mechanics of this are going to depend on the precise pod settings, but in general I think it might be a good idea to prevent scaling down the final pod when SQS reports that at least one message is still in an invisible/in-flight state.

This patch adds a special case for the 1->0 scale-down, where it checks the number of in-flight messages and aborts the scale-down if any exist.
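A minimal sketch of the 1->0 guard described above. It assumes a boto3 SQS client and takes the in-flight count from the SQS queue attribute ApproximateNumberOfMessagesNotVisible. The function names in_flight_count and allow_scale_down are hypothetical and are not part of k8s-sqs-autoscaler.

```python
def in_flight_count(sqs_client, queue_url):
    """Return SQS's approximate count of in-flight (received, not yet deleted) messages.

    Assumes a boto3 SQS client; SQS returns attribute values as strings.
    """
    response = sqs_client.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessagesNotVisible"],
    )
    return int(response["Attributes"]["ApproximateNumberOfMessagesNotVisible"])


def allow_scale_down(current_replicas, in_flight):
    """Hypothetical guard: only the final 1 -> 0 step is special-cased.

    Scaling from 1 to 0 is blocked while any message is still in flight.
    Any other scale-down from a positive replica count is allowed as before;
    there is nothing to scale down from 0.
    """
    if current_replicas == 1 and in_flight > 0:
        return False
    return current_replicas > 0
```

In a poller this might be called as allow_scale_down(replicas, in_flight_count(boto3.client("sqs"), queue_url)) just before issuing the scale-down.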

(I'm less sure of this one than of the other pull requests I sent today. Feel free to reject it if this behavior doesn't make sense.)

On second thought, a better criterion might be: "don't scale down if the number of replicas after the scale-down would be less than the number of in-flight messages." Thoughts?
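The alternative criterion can be sketched as a single comparison. This assumes, as the criterion implicitly does, that each replica processes roughly one message at a time; if pods consume several messages concurrently, this check will be more conservative than necessary. The function name and the step parameter are hypothetical.

```python
def allow_scale_down_by_in_flight(current_replicas, in_flight, step=1):
    """Hypothetical check: refuse a scale-down that would leave fewer replicas
    than there are in-flight messages.

    current_replicas: replicas running now.
    in_flight: SQS in-flight (not visible) message count.
    step: how many replicas the scale-down would remove.
    """
    target = current_replicas - step
    if target < 0:
        # Cannot scale below zero replicas.
        return False
    return target >= in_flight
```

With the default one-pod step, the 1->0 case (target 0) is blocked whenever any message is in flight, so this check covers the earlier special case as well.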

stevenpall commented 5 years ago

@danmaas I'm working on a similar system and was wondering how you planned to handle scaling up from zero when the number of messages in the queue is below the configured SCALE_UP_MESSAGES value. In the current implementation, SCALE_UP_MESSAGES effectively determines how many messages a pod can handle at any given time, but there is no way to have a single pod spin up at a lower threshold. So you either set the value lower than necessary and end up with extra pods, or set it higher and risk no pod coming up when one is needed. I wonder if it would make sense to add a special case for when there are zero pods: create a pod whenever the queue depth is greater than 0.

danmaas commented 5 years ago

Good point, @stevenpall. I think I missed the 0->1 scale-up issue because my setup always has SCALE_UP_MESSAGES=1.

A simple fix might be to change the scale-up check from if message_count >= self.options.scale_up_messages: to something like this (untested): if message_count >= self.options.scale_up_messages or (message_count > 0 and deployment.spec.replicas < 1):
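The proposed condition can be pulled out into a pure function so it is easy to check in isolation. This is only a sketch: the parameters mirror message_count, deployment.spec.replicas, and self.options.scale_up_messages from the snippet, but the standalone should_scale_up function is hypothetical, and cool-down handling is left out.

```python
def should_scale_up(message_count, current_replicas, scale_up_messages):
    """Hypothetical standalone version of the proposed scale-up test.

    The first clause is the existing threshold check. The second clause is the
    proposed special case: if no replicas are running but the queue is
    non-empty, start a pod even though the threshold has not been reached.
    """
    return message_count >= scale_up_messages or (
        message_count > 0 and current_replicas < 1
    )
```

For example, with SCALE_UP_MESSAGES=10, one queued message and zero replicas triggers a scale-up, while the same single message with one replica already running does not.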

Let me know if this works for you, and I'll update the PR. That said, the maintainer doesn't seem to be responding here these days.