hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

unexpected rescheduling behavior for system type job #11825

Open fredwangwang opened 2 years ago

fredwangwang commented 2 years ago

Nomad version

v1.1.5

Operating system and Environment details

Linux & Windows

Issue

When system-type allocations are stopped from the UI, Nomad reschedules them, but only up to n-1 of the n possible placements. This is unexpected for two reasons:

  1. as mentioned in the documentation, system type does not have rescheduling: https://www.nomadproject.io/docs/schedulers#system
  2. if rescheduling is happening, it should reschedule until all n allocations are back, not n-1

Reproduction steps

  1. deploy a system type job
  2. stop all allocations
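
For anyone who wants to script step 2 instead of clicking through the UI, here is a minimal sketch against the Nomad HTTP API. It assumes the agent is reachable at NOMAD_ADDR (defaulting to http://127.0.0.1:4646) without TLS or ACL tokens; the helper names are made up for illustration, and the job ID is whatever the deployed system job is called.

```python
import json
import os
import urllib.request

# Assumption: plain HTTP agent, no ACL token required.
NOMAD_ADDR = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")


def api_url(path, addr=NOMAD_ADDR):
    """Join the agent address and a /v1 API path."""
    return addr.rstrip("/") + "/v1/" + path.lstrip("/")


def list_allocations(job_id):
    """GET /v1/job/:job_id/allocations returns a list of allocation stubs."""
    with urllib.request.urlopen(api_url(f"job/{job_id}/allocations")) as resp:
        return json.load(resp)


def running_alloc_ids(allocs):
    """Keep only the allocations the client currently reports as running."""
    return [a["ID"] for a in allocs if a.get("ClientStatus") == "running"]


def stop_request(alloc_id):
    """Build (but do not send) the POST /v1/allocation/:alloc_id/stop request."""
    return urllib.request.Request(
        api_url(f"allocation/{alloc_id}/stop"), data=b"", method="POST"
    )


def stop_all(job_id):
    """Stop every running allocation of the job individually (step 2)."""
    for alloc_id in running_alloc_ids(list_allocations(job_id)):
        urllib.request.urlopen(stop_request(alloc_id)).close()
```

Calling stop_all("<job-name>") against a test cluster should be roughly equivalent to stopping each allocation by hand; it is deliberately different from stopping the job itself.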

Expected Result

Either no allocations are running (according to the docs), or all allocations are rescheduled.

Actual Result

n-1 allocations are rescheduled.

[Two screenshots attached: "Screen Shot 2022-01-11 at 4 25 55 PM" and "Screen Shot 2022-01-11 at 4 26 49 PM"]

Job file (if appropriate)

https://gist.github.com/fredwangwang/9483fb3c2495b2b38c4b9b038e2132db

tgross commented 2 years ago

@fredwangwang when you say "stop all allocations" you mean you're doing nomad alloc stop on all the allocations individually? (And not nomad job stop?) I suspect the wording of the documentation is misleading; we don't let you configure rescheduling because the evaluation that's triggered by an allocation stop sees that an eligible node is missing a system allocation and replaces it.

However, I would also expect that all the allocations that were stopped got replaced. Is there any chance that the node that's missing its allocation has changed such that it's no longer eligible for scheduling? It'd be worth checking nomad eval status -json to look at the list of evaluations that were triggered to see if you can find out what happened with that node.
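
A rough sketch of that check through the HTTP API rather than the CLI, for anyone who prefers to script it. It assumes an agent at NOMAD_ADDR without ACLs or TLS; summarize is a hypothetical helper, and the fields it reads (ID, TriggeredBy, Status, FailedTGAllocs) are the ones that looked most relevant here, not an exhaustive list.

```python
import json
import os
import urllib.request

# Assumption: plain HTTP agent, no ACL token required.
NOMAD_ADDR = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")


def job_evaluations(job_id, addr=NOMAD_ADDR):
    """GET /v1/job/:job_id/evaluations returns the evaluations for the job."""
    url = f"{addr.rstrip('/')}/v1/job/{job_id}/evaluations"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize(evals):
    """Reduce each evaluation to the fields worth looking at first."""
    rows = []
    for ev in evals:
        rows.append({
            "id": ev.get("ID", "")[:8],  # short ID, as the CLI shows it
            "triggered_by": ev.get("TriggeredBy"),
            "status": ev.get("Status"),
            # FailedTGAllocs maps task group names to placement failure metrics
            "failed_groups": sorted((ev.get("FailedTGAllocs") or {}).keys()),
        })
    return rows
```

Running summarize(job_evaluations("<job-name>")) right after the stops should show one row per triggered evaluation; a non-empty failed_groups entry would point at a placement that could not be made.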

fredwangwang commented 2 years ago

"doing nomad alloc stop"

[screenshot] This is what I did ^; I suppose it is the same as running nomad alloc stop.

"Is there any chance that the node that's missing its allocation has changed such that it's no longer eligible for scheduling"

Nothing changed besides stopping the allocations. In fact, after observing the n-1 allocations, I manually triggered nomad job eval <job-name>, and Nomad was then able to place the missing allocation.
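
To make the "n-1" observation checkable, here is a small sketch that compares node stubs (from GET /v1/nodes) with the job's allocation stubs (from GET /v1/job/:job_id/allocations) and builds the API equivalent of the manual eval. It is a simplification: job constraints and datacenters are ignored, the helper names are hypothetical, and the agent address with no ACLs is an assumption.

```python
import os
import urllib.request

# Assumption: plain HTTP agent, no ACL token required.
NOMAD_ADDR = os.environ.get("NOMAD_ADDR", "http://127.0.0.1:4646")


def nodes_missing_alloc(nodes, allocs):
    """Return IDs of ready, eligible nodes that have no live allocation.

    Simplification: job constraints and datacenters are not considered.
    """
    covered = {
        a["NodeID"]
        for a in allocs
        if a.get("DesiredStatus") == "run"
        and a.get("ClientStatus") in ("pending", "running")
    }
    return [
        n["ID"]
        for n in nodes
        if n.get("Status") == "ready"
        and n.get("SchedulingEligibility") == "eligible"
        and n["ID"] not in covered
    ]


def evaluate_request(job_id, addr=NOMAD_ADDR):
    """Build the POST /v1/job/:job_id/evaluate request (API twin of nomad job eval)."""
    url = f"{addr.rstrip('/')}/v1/job/{job_id}/evaluate"
    return urllib.request.Request(url, data=b"", method="POST")
```

If nodes_missing_alloc returns exactly one node after the stops, and an empty list once the request from evaluate_request has been sent with urllib.request.urlopen, that matches the behavior described above.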

tgross commented 2 years ago

Ok, that's clear (if weird). We'll look into it.

I do have to ask, why stop all the allocations like this?

fredwangwang commented 2 years ago

Context: initially a single allocation malfunctioned, so I had to stop it (expecting it to be rescheduled, the same as for service-type jobs). But that never happened. By chance I then found that Nomad always keeps n-1 system allocations, no matter how many allocations I stop. Stopping all the allocations is just a way to make the point clear for this report, not a real use case :D