Job scaling should take additional params

mr-karan commented 1 year ago

Proposal

nomad job scale should be a bit more configurable, for example by taking a param to stop the oldest allocs first. Or maybe that could be the default behaviour. Currently, scaling down destroys the latest allocs.
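To make the requested behaviour concrete, here is a minimal sketch of the two selection orders when scaling down. It is not Nomad's scheduler code: the Alloc type, its create_time field, and the oldest_first parameter are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alloc:
    """Hypothetical stand-in for an allocation; not a Nomad API type."""
    alloc_id: str
    create_time: int  # creation time as a Unix timestamp (assumed field)


def select_for_scale_down(allocs, remove_count, oldest_first=True):
    """Return the IDs of the allocs that would be stopped on scale-down.

    oldest_first=True models the behaviour proposed here;
    oldest_first=False models the current behaviour described above.
    """
    if remove_count < 0:
        raise ValueError("remove_count must be non-negative")
    # Sort ascending by creation time for oldest-first, descending otherwise.
    ordered = sorted(allocs, key=lambda a: a.create_time, reverse=not oldest_first)
    return [a.alloc_id for a in ordered[:remove_count]]


allocs = [
    Alloc("old-1", 100),
    Alloc("old-2", 200),
    Alloc("new-1", 300),
    Alloc("new-2", 400),
]
print(select_for_scale_down(allocs, 2, oldest_first=True))   # proposed order
print(select_for_scale_down(allocs, 2, oldest_first=False))  # current order
```

With oldest-first selection, scaling up and then back down would replace the old allocs with fresh ones while the old ones keep serving traffic in the meantime.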

Use-cases

I'd like to use it for applications which need to pick up updated files (fetched externally from S3 etc.) and then restart. However, to avoid interrupting incoming traffic on the existing allocs, I can't use nomad alloc restart, as it immediately kills the alloc. There's a shutdown_delay as well, but that isn't useful in this case since:

The new alloc stays in the pending state until the old alloc stops.
Nomad doesn't deregister the service from the catalog until the service fully stops, so shutdown_delay effectively has no effect when using service.provider="nomad".

Attempted Solutions

Couldn't find a workaround for this.

jrasell commented 1 year ago

Hi @mr-karan and thanks for this proposal. I can see the value in this, although we would certainly need to spend time discussing and designing any implementation as I believe this would require changes to Nomad's core scheduling algorithms and reconciliation logic.

wjnicholson commented 1 year ago

Hi - we have a related issue with some distributed computations we are running.

We want to scale job counts based on how many calculations can be executed in parallel, and there is some benefit to choosing which instances of our applications to shut down based on current cached data / running calculations, as each instance can run multiple long-lived calculations in parallel. Currently we spawn multiple job copies and are able to control which jobs (and hence allocations) are requested to stop, either by not assigning them more work and letting our application decide to shut down, or by calling Nomad API endpoints to terminate the job, and then requesting new jobs later if necessary. However, by having to specify jobs individually we lose the benefits of Nomad's workload migration & node draining, and we produce more load on the Nomad servers (as per details given in https://github.com/hashicorp/nomad/issues/13933). We would like to migrate from requesting X copies of these homogeneous jobs to having a single job with a count of X.

If we had something similar to AWS ASG's "detach" functionality, where we could say "stop these allocs & decrement count when they exit", then we would be able to gracefully shrink the size of our task group with the least impact to our workload. Right now stopping allocations and scaling a job are different endpoints, so while we can request to stop an allocation & scale down, we don't know what order the two will be processed in & could trigger spurious evaluations. Ideally we'd like each instance to export some metric / fit parameter that could then be ranked to find which ones would be best to stop, similar to what the Nomad Autoscaler does for Nomad agent instances. Then we could treat them more like cattle than pets, and stop managing their lifecycle directly.
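As a rough illustration of the ranking idea, the sketch below picks which allocations to stop when shrinking to a target count. Everything in it is an assumption for the sake of the example: AllocLoad, running_calculations, cached_bytes and pick_allocs_to_stop are hypothetical names, not part of Nomad or the Nomad Autoscaler, and the ordering rule (fewest running calculations first, then least cached data) is just one possible fit metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocLoad:
    """Hypothetical per-allocation metrics exported by the application."""
    alloc_id: str
    running_calculations: int  # long-lived calculations currently in flight
    cached_bytes: int          # size of cached data that would be lost


def pick_allocs_to_stop(allocs, target_count):
    """Return the IDs of the cheapest allocations to stop to reach target_count.

    "Cheapest" means fewest running calculations, then least cached data;
    the alloc ID is a final tie-breaker so the result is deterministic.
    """
    if target_count < 0:
        raise ValueError("target_count must be non-negative")
    excess = len(allocs) - target_count
    if excess <= 0:
        return []  # already at or below the target, nothing to stop
    ranked = sorted(
        allocs,
        key=lambda a: (a.running_calculations, a.cached_bytes, a.alloc_id),
    )
    return [a.alloc_id for a in ranked[:excess]]


allocs = [
    AllocLoad("a", running_calculations=3, cached_bytes=100),
    AllocLoad("b", running_calculations=0, cached_bytes=50),
    AllocLoad("c", running_calculations=0, cached_bytes=10),
    AllocLoad("d", running_calculations=1, cached_bytes=0),
]
print(pick_allocs_to_stop(allocs, target_count=2))
```

A "detach"-style operation would then take the returned IDs, stop those allocations, and decrement the group count by the same number in a single step, which avoids the ordering problem between the two separate endpoints.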
