Open dpogorzelski opened 3 years ago
Hi @dpogorzelski, that's a cool use case. The current `target-value` plugin doesn't support that, but supporting an additional capacity as a parameter seems like a simple concept. We'll keep this open for the roadmap, but interested parties should feel free to submit a PR.
Hello there, I was thinking about how to implement this concept, and a couple of things came to mind that might make it less simple than it looks. For example, after we add the spare nodes to the capacity delta, should we exclude those spare nodes from the APM calculation query? How could we do that, and would it work with other APMs, such as Prometheus? If I am overcomplicating things, could you please point me in the right direction, @cgbaker, so I can try to implement this and contribute the changes from our side?
Hey, I'm quite keen on reviving this issue. Have there been any developments in this area at all? If not, could anyone a bit more experienced with Nomad and the autoscaler give some pointers on where this should be implemented? I will have a crack at it as we need it, but I'd love to get some ideas of where to start. :)
Like Artem said above, there are a few places where we could put it, but it will probably cause issues I can't currently foresee.
Hi @leosunmo and @artemantipov 👋
Thank you for the interest in taking on this, and apologies for the delay in getting back to you.
So I think this could be implemented in the `runTargetScale` function. There's quite a bit going on, but the `BaseWorker` is the component responsible for evaluating an expression, which means reading the target status, querying the APM for metrics, running the policy strategy, and, finally, applying the scaling action to the target.
This final step is done by the `runTargetScale` function, and I think we could just add the `Y` that @dpogorzelski mentioned there. This value could be set in and read from the policy as a new configuration value. There's quite a bit of plumbing that needs to happen to get a new config from file all the way to the worker, but maybe https://github.com/hashicorp/nomad-autoscaler/pull/567 can serve as a guide.
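To make the flow above a bit more concrete, here is a rough Go sketch of the four steps with the standby value added at the very end. Every name in it (`policy`, `Standby`, `readTargetStatus`, `queryAPM`, `runStrategy`, `evaluate`) is made up for illustration and is not the autoscaler's actual API, and the proportional rule in `runStrategy` is only a simplified stand-in for a real strategy plugin.

```go
package main

import (
	"fmt"
	"math"
)

// policy is a hypothetical, stripped-down policy. Standby is the assumed
// new configuration value (the "Y" from the original request).
type policy struct {
	Target  float64 // desired metric value per the strategy
	Standby int64   // extra units to keep on top of the computed count
}

// readTargetStatus stands in for reading the current count from the target.
func readTargetStatus() int64 { return 3 }

// queryAPM stands in for querying the APM for the current metric value.
func queryAPM() float64 { return 80 }

// runStrategy is a simplified proportional rule: scale the current count
// by metric/target and round up.
func runStrategy(current int64, metric, target float64) int64 {
	return int64(math.Ceil(float64(current) * metric / target))
}

// evaluate walks the steps in order and adds the standby units only in
// the final step, just before the scaling action would be applied.
func evaluate(p policy) int64 {
	current := readTargetStatus()
	metric := queryAPM()
	desired := runStrategy(current, metric, p.Target) // this is N
	return desired + p.Standby                        // this is N+Y
}

func main() {
	fmt.Println(evaluate(policy{Target: 50, Standby: 1}))
}
```

Because the standby units are added after the strategy runs, the strategy itself does not need to know about them, though the APM query would still see the extra capacity on the next evaluation.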
Here are some diagrams that I sketched for an internal document; they may be handy for understanding how everything fits together:
Feel free to reach out if you have any more questions 🙂
After answering this I came across https://github.com/hashicorp/nomad-autoscaler/issues/577 and made a change that would impact this work slightly.
Instead of changing `runTargetScale` as I mentioned before, you would apply the standby units in the new `scaleTarget` function.
One important consideration is how to apply the `min` and `max` interval. Would `N+Y` always be enforced to be within these limits, or would just `N`?
I think we would always want to be within `[min:max]`, so this check needs to be included in this work, but I would be curious to hear your thoughts as well.
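To illustrate the difference between the two options, here is a small Go sketch. The function names (`clampTotal`, `clampBaseOnly`, `clamp`) are hypothetical and not part of the autoscaler; `lo` and `hi` stand for the policy's `min` and `max`.

```go
package main

import "fmt"

// clamp restricts v to the closed interval [lo, hi].
func clamp(v, lo, hi int64) int64 {
	if v < lo {
		return lo
	}
	if v > hi {
		return hi
	}
	return v
}

// clampTotal enforces the limits on N+Y, so the final count never
// leaves [lo, hi] even when standby units are requested.
func clampTotal(n, standby, lo, hi int64) int64 {
	return clamp(n+standby, lo, hi)
}

// clampBaseOnly enforces the limits on N alone and adds Y afterwards,
// so the final count can exceed hi by up to standby units.
func clampBaseOnly(n, standby, lo, hi int64) int64 {
	return clamp(n, lo, hi) + standby
}

func main() {
	// N=10, Y=1, min=1, max=10
	fmt.Println(clampTotal(10, 1, 1, 10))    // prints 10
	fmt.Println(clampBaseOnly(10, 1, 1, 10)) // prints 11
}
```

With the first variant, the buffer silently disappears once `N` reaches `max`; with the second, `max` is no longer a hard ceiling. That trade-off is the decision this check needs to settle.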
Hey everyone :) Is there an official way to define a scaling policy (horizontal scaling) in such a way that the number of nodes, at any given time, is N+Y? Here N is the actual number of nodes needed based on APM data, and Y is the safety buffer to cover unexpected spikes. Y in this case is a fixed number like 1, so the autoscaler would always make sure the number of nodes is N+1, for example. Other buffering strategies might also be possible/desirable. Thanks :)