Open fredwangwang opened 2 years ago
@fredwangwang when you say "stop all allocations" you mean you're doing `nomad alloc stop` on all the allocations individually? (And not `nomad job stop`?) I suspect the wording of the documentation is misleading; we don't let you configure rescheduling because the evaluation that's triggered by an allocation stop sees that an eligible node is missing a system allocation and replaces it.
However, I would also expect that all the allocations that were stopped got replaced. Is there any chance that the node that's missing its allocation has changed such that it's no longer eligible for scheduling? It'd be worth checking `nomad eval status -json` to look at the list of evaluations that were triggered to see if you can find out what happened with that node.
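One way to narrow down which node lost its allocation is to compare the node list against the job's allocations. The following is a minimal sketch, not an official Nomad tool: it assumes the inputs are already-parsed JSON lists shaped like the responses of the `/v1/nodes` and `/v1/job/<job-name>/allocations` HTTP endpoints, and that the field names `ID`, `Status`, `SchedulingEligibility`, `NodeID`, `ClientStatus`, and `DesiredStatus` match your Nomad version. It also assumes the system job has no constraints that exclude nodes. The function name `nodes_missing_alloc` and the sample data are hypothetical.

```python
def nodes_missing_alloc(nodes, allocs):
    """Return IDs of ready, eligible nodes with no running allocation.

    `nodes` and `allocs` are lists of dicts (assumed to mirror Nomad's
    node and allocation list stubs; verify the field names locally).
    """
    # Nodes that currently hold an allocation that is running and meant to run.
    covered = {
        a["NodeID"]
        for a in allocs
        if a.get("ClientStatus") == "running" and a.get("DesiredStatus") == "run"
    }
    # A system job is expected on every ready, eligible node.
    return [
        n["ID"]
        for n in nodes
        if n.get("Status") == "ready"
        and n.get("SchedulingEligibility") == "eligible"
        and n["ID"] not in covered
    ]


# Hypothetical sample data: n3's allocation was stopped and not replaced,
# n4 is ineligible and therefore not expected to run the job.
sample_nodes = [
    {"ID": "n1", "Status": "ready", "SchedulingEligibility": "eligible"},
    {"ID": "n2", "Status": "ready", "SchedulingEligibility": "eligible"},
    {"ID": "n3", "Status": "ready", "SchedulingEligibility": "eligible"},
    {"ID": "n4", "Status": "ready", "SchedulingEligibility": "ineligible"},
]
sample_allocs = [
    {"NodeID": "n1", "ClientStatus": "running", "DesiredStatus": "run"},
    {"NodeID": "n2", "ClientStatus": "running", "DesiredStatus": "run"},
    {"NodeID": "n3", "ClientStatus": "complete", "DesiredStatus": "stop"},
]

print(nodes_missing_alloc(sample_nodes, sample_allocs))
```

If the function reports a node that is still ready and eligible, the missing allocation is more likely a scheduling issue than a change on the node itself.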
> doing `nomad alloc stop`
This is what I did ^. I suppose it is the same as running `alloc stop`.
> Is there any chance that the node that's missing its allocation has changed such that it's no longer eligible for scheduling
Nothing changed besides stopping the allocations. In fact, after observing the `n-1` allocations, I manually triggered `nomad job eval <job-name>`, and it was then able to place the missing allocation.
Ok, that's clear (if weird). We'll look into it.
I do have to ask, why stop all the allocations like this?
ctx: initially a single allocation was malfunctioning, so I had to stop it (expecting it to be rescheduled, the same as for service-type jobs). But that never happened. And by chance I found out that, interestingly, Nomad always keeps `n-1` system allocations no matter how many allocations I stop.
Stopping all allocations is just to make the point clear for this story, not a real use case :D
Nomad version
v1.1.5
Operating system and Environment details
Linux & Windows
Issue
When stopping system-type allocations from the UI, Nomad reschedules the allocations up to `n-1` of the allocatable count. This is unexpected.
Reproduction steps
Expected Result
Either no allocations are running (according to the docs), or all allocations are rescheduled.
Actual Result
`n-1` allocations are rescheduled.
Job file (if appropriate)
https://gist.github.com/fredwangwang/9483fb3c2495b2b38c4b9b038e2132db