skuda opened this issue 7 years ago.
Hi @skuda,
This is the best place to discuss this :) Keeping deallocated nodes around is not a bad idea, but I think it will be quite complex to implement correctly. What would be an acceptable scaling time in your case?
We would like to see something like that as well, to reduce the starvation time. I was thinking of a different approach: just keep extra nodes alive. For example, if I configure `extra-nodes=2`, the autoscaler will always keep spare cores and memory matching the resources of two nodes.
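A minimal sketch of the headroom idea, in Python. `extra-nodes` is the commenter's proposed setting, not an existing autoscaler flag, and `Resources` and `nodes_needed_for_headroom` are hypothetical names for illustration. The sketch assumes every node has the same size and treats free capacity as a cluster-wide sum, so it ignores fragmentation (free CPU split across several nodes may not fit a single large pod).

```python
import math
from dataclasses import dataclass


@dataclass
class Resources:
    cpu_millicores: int
    memory_mib: int


def nodes_needed_for_headroom(node_capacity: Resources, free: Resources, extra_nodes: int) -> int:
    """Hypothetical helper: how many new nodes to add so that the cluster's
    free capacity covers at least `extra_nodes` nodes' worth of CPU and memory."""
    target_cpu = node_capacity.cpu_millicores * extra_nodes
    target_mem = node_capacity.memory_mib * extra_nodes
    # Only count what is missing; existing spare capacity already counts as headroom.
    missing_cpu = max(0, target_cpu - free.cpu_millicores)
    missing_mem = max(0, target_mem - free.memory_mib)
    # Round up per resource; whichever resource is scarcer decides.
    by_cpu = math.ceil(missing_cpu / node_capacity.cpu_millicores)
    by_mem = math.ceil(missing_mem / node_capacity.memory_mib)
    return max(by_cpu, by_mem)


# Usage: nodes with 4 cores / 16 GiB, currently 1.5 cores and ~19.5 GiB free.
print(nodes_needed_for_headroom(Resources(4000, 16384), Resources(1500, 20000), extra_nodes=2))
```

With these numbers CPU is the bottleneck (6500m missing, so 2 nodes), while memory alone would need only 1, so the sketch asks for 2 new nodes. The trade-off is that this headroom is paid for around the clock, even outside the spikes.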
@wbuchwalter For me, 2 or even 3 minutes would be fast enough.
I'm not sure this is the fault of the autoscaler itself; it is rather a function of how long Azure takes to spin up a VM for your cluster. In my general tests it takes anywhere from 7 to 13 minutes to get a new VM in an Availability Set. This is in westus, by the way. I wonder if VM scale-up time differs by region?
I agree with @skuda that a VM provisioning time of a few minutes would be ideal. I personally don't think this is possible without using VM Scale Sets. Maybe when those are supported by acs-engine we will get shorter spin-up times.
Or... when we are able to use a stable ACI connector, or some other method of having a virtual kubelet with infinite capacity (serverless containers), then there shouldn't be any VM spin up time anymore.
Hi,
This is not a bug; sorry, I didn't find a better way to communicate this!
I have been using the autoscaler and it's working great, but it's somewhat slow: for us, adding new nodes takes approximately 10 minutes.
Maybe our use case is a bit special, but we usually have a very small load that sometimes goes up very fast. The specific service I am talking about acts as a precomputed cache. If the cache is full, hits are very cheap; if the cache is purged, something that happens 2 or 3 times per week, the load skyrockets for about 2 to 3 hours.
I understand that creating the nodes, installing everything and adding them to the cluster takes time, but I have been thinking: why not keep a specific number of pre-configured nodes that are only deallocated? Wouldn't it be much faster to just bring existing servers back online than to destroy them and recreate them from scratch every time?
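A minimal sketch of how a "warm pool" of deallocated nodes could change the scale-up and scale-down decisions, in Python. Nothing here reflects how the autoscaler is actually implemented: `Vm`, `VmState`, `plan_scale_up`, `plan_scale_down` and `warm_pool_size` are hypothetical names, and the sketch only plans which VMs to start, create, deallocate or delete; it makes no Azure API calls. It also ignores the hard parts, such as making sure a resumed node is still healthy and up to date.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class VmState(Enum):
    RUNNING = "running"
    DEALLOCATED = "deallocated"  # hypothetical: VM kept but powered off


@dataclass
class Vm:
    name: str
    state: VmState


@dataclass
class ScaleUpPlan:
    start: List[str] = field(default_factory=list)  # existing deallocated VMs to power on
    create: int = 0                                 # brand-new VMs still to provision


@dataclass
class ScaleDownPlan:
    deallocate: List[str] = field(default_factory=list)  # park these in the warm pool
    delete: List[str] = field(default_factory=list)      # destroy these entirely


def plan_scale_up(pool: List[Vm], nodes_wanted: int) -> ScaleUpPlan:
    """Prefer restarting deallocated VMs; only create new ones for the remainder."""
    deallocated = [vm.name for vm in pool if vm.state is VmState.DEALLOCATED]
    start = deallocated[:nodes_wanted]
    return ScaleUpPlan(start=start, create=nodes_wanted - len(start))


def plan_scale_down(pool: List[Vm], victims: List[str], warm_pool_size: int) -> ScaleDownPlan:
    """Deallocate victims until the warm pool is full, delete the rest."""
    already_warm = sum(1 for vm in pool if vm.state is VmState.DEALLOCATED)
    room = max(0, warm_pool_size - already_warm)
    return ScaleDownPlan(deallocate=victims[:room], delete=victims[room:])


pool = [
    Vm("node-0", VmState.RUNNING),
    Vm("node-1", VmState.DEALLOCATED),
    Vm("node-2", VmState.DEALLOCATED),
]
print(plan_scale_up(pool, 3))
print(plan_scale_down(pool, ["node-0"], warm_pool_size=2))
```

Capping the pool with `warm_pool_size` keeps the number of parked VMs, and whatever they still cost while deallocated, bounded; anything beyond the cap falls back to the normal delete-and-recreate path.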
Best, Miguel.