Open eigengrau opened 5 years ago
The same use case could be implemented as a rolling update that reverses the usual order: instead of removing an allocation before starting its replacement, it would start a new allocation, wait for its health check to pass, and only then stop an old allocation, repeating until all old allocations are stopped.
This feature would be useful to us since we run some workloads for staging or development purposes with count = 1. While these jobs do not require high-availability, it would nonetheless be preferable if services were not disrupted by a node drain operation.
Same here. I.e., we can deal with downtime in case of a crash, but would like to be able to do maintenance operations without downtime. Note that the services in question could all be run HA, but there is generally no need to waste CPU/RAM by setting count = 2.
I was ready to file the same issue, but fortunately I found this one. This is an absolute must-have feature for draining nodes that run single-instance jobs. I don't see any reason why it is not already implemented: we already have the ability to update jobs seamlessly, and a migration looks no different.
Hello people! We just got hit by this same behavior. We have jobs that generally should only have 1 allocation running at any given time, although we support having 2 of them running for short periods of time while doing canary updates.
As mentioned above, it would be very nice if migration worked like a canary update with auto promotion to avoid downtime.
Is there anything we can do to help move this forward?
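For readers unfamiliar with the deployment behaviour referred to above, here is a minimal sketch of a group-level `update` block that uses one canary with automatic promotion. It is a fragment, not a complete job, and the group name and durations are illustrative assumptions. As far as this thread indicates, these settings apply to job updates only, not to migrations triggered by a drain.

```hcl
# Fragment only: the group name and all durations are illustrative.
group "app" {
  count = 1

  update {
    max_parallel     = 1
    canary           = 1        # start one new allocation alongside the old one
    auto_promote     = true     # promote without manual intervention once healthy
    health_check     = "checks" # health is determined by the service checks
    min_healthy_time = "10s"
    healthy_deadline = "5m"
  }
}
```

With a configuration like this, a job update briefly runs two allocations and only stops the old one after the canary is healthy, which is the behaviour being requested for migrations.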
It would be convenient if migrations (e.g., triggered by `nomad node drain` and governed by the `migrate` stanza) were able to create canary allocations. These allocations would be created before old allocations are destroyed, and old allocations would not be stopped until replacement allocations are deemed healthy. Unlike canaries, these replacements would probably not require manual promotion, but would instead promote automatically when healthy.
This feature would be useful to us since we run some workloads for staging or development purposes with `count = 1`. While these jobs do not require high-availability, it would nonetheless be preferable if services were not disrupted by a node drain operation. Since some of the applications we run require a 30-40 minute initialization time (I know, right) before service becomes available, the impact of a node drain in these cases can become a nuisance.
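For context, a sketch of the `migrate` stanza as it can be written today, assuming the documented parameters and their default values. To my knowledge it has no canary-style option, which is the gap this request describes. The fragment is illustrative and not a complete job.

```hcl
# Fragment only: values shown are the documented defaults, as far as I know.
migrate {
  max_parallel     = 1        # how many allocations may be migrated at once
  health_check     = "checks" # how replacement health is determined
  min_healthy_time = "10s"
  healthy_deadline = "5m"     # would need raising for slow-starting services
}
```

For a service with a 30-40 minute initialization time, `healthy_deadline` would presumably have to be set well above that (for example `"45m"`, a hypothetical value), but that only affects how long the replacement is given to become healthy; it does not change the order in which the old allocation is stopped.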