stswidwinski opened 1 year ago
Hi @stswidwinski!
First, some background: note that there's a difference between the `DesiredStatus` and the `ClientStatus` in the API (these appear as `Desired` and `Status` in the CLI headers). When the leader accepts a plan from a scheduler worker, it sets the `DesiredStatus`. The client then pulls allocation updates from the servers and sets the `ClientStatus` once it has finished applying that status to the allocation.
For the case of `shutdown_delay`, that should be after the shutdown delay occurs. And this does seem to match up with what you're seeing:
Allocations
ID Node ID Task Group Version Desired Status Created Modified
fa866f32 bad543f9 one_and_only 0 run running 33s ago 23s ago
a6885833 23a04c0f one_and_only 0 stop running 7m20s ago 4m53s ago
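For reference, both fields are also visible directly in the allocation API. This is a sketch rather than output from this thread; the allocation ID is a placeholder:

```sh
# Read the two status fields straight from the allocation API.
# Replace <alloc-id> with a real allocation ID, e.g. from `nomad job status`.
curl -s "${NOMAD_ADDR:-http://127.0.0.1:4646}/v1/allocation/<alloc-id>" \
  | jq '{DesiredStatus, ClientStatus}'
```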
But you'd expect the drainer to respect the `ClientStatus`. Unfortunately, it does not! This is reflected in known bugs #14293, #12915, and #9902. I have a partial PR already up to fix this in https://github.com/hashicorp/nomad/pull/14348 but never got around to finishing it. Fortunately I've got some time carved out in the next few weeks (probably March) to focus on documenting, diagnosing, and fixing some drain behaviors. I'll pick up this issue as part of that work, and #14348 should do the job for you here.
This makes sense. Thank you! Looking forward to the patch :)
This issue is fixed by https://github.com/hashicorp/nomad/pull/14348, which will ship in the next regular patch release of Nomad.
I have just tested this against 1.5.5 and the bug as described still occurs in the same way. The repro remains the same, except now it's against 1.5.5 rather than 1.4.3, which makes the logging output a little bit different.
@tgross, I think that your patch changes the handling of stopping allocations correctly in the case of non-blocked evaluations, but leaves the blocked-evaluation case in the old state. Do you mind taking another look?
Re-opening
Nomad version
Nomad v1.4.2 (039d70eeef5888164cad05e7ddfd8b6f8220923b)
However, this repros on v1.4.3 as well.
Operating system and Environment details
These do not matter. Unix/Linux.
Issue
When a cluster is running at capacity, a drain of a node that has `service` allocations running on it will create an evaluation which is Pending. This Pending evaluation will immediately be solved for if more capacity is added, resulting in multiple allocations running for a single job, especially with large kill timeouts. Under normal circumstances we expect that the allocation which has been drained blocks the creation of any new allocation.
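For anyone following along, the Pending evaluation can be observed with the eval subcommands. These are illustrative commands, not output from this report:

```sh
# List evaluations; the drain produces an evaluation that stays pending
# while the old allocation waits out its shutdown.
nomad eval list
# Inspect a specific evaluation (replace <eval-id> with an ID from the list).
nomad eval status <eval-id>
```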
Reproduction steps
Let us begin with the local setup. We will want two clients and one server. The first server and client are created using the usual, boring setup. Please note, however, that we set the max kill timeout to something considerable, such as an hour:
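A minimal sketch of such a first (server + client) agent; the file name and values are illustrative, and it assumes `-dev` is combined with an extra `-config` file to override the kill timeout:

```sh
# Sketch: dev-mode agent (server + client in one) with a large max kill
# timeout so jobs may request long kill timeouts, and raw_exec enabled.
cat > agent1.hcl <<'EOF'
client {
  # Allow tasks to request kill timeouts of up to one hour.
  max_kill_timeout = "1h"
}

plugin "raw_exec" {
  config {
    enabled = true
  }
}
EOF

nomad agent -dev -config=agent1.hcl
```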
The second client is set up analogously, but we cannot use `nomad agent -dev` as easily. To avoid port conflicts we do:
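A sketch of what such a second client could look like; the ports, data directory, and node name below are illustrative, not the values from the original setup:

```sh
# Sketch: a second, client-only agent with its own ports and data directory,
# joined to the dev server on the default RPC port.
cat > agent2.hcl <<'EOF'
name     = "client2"
data_dir = "/tmp/nomad-client2"

ports {
  http = 5656
  rpc  = 5657
  serf = 5658
}

client {
  enabled          = true
  servers          = ["127.0.0.1:4647"]
  max_kill_timeout = "1h"
}

plugin "raw_exec" {
  config {
    enabled = true
  }
}
EOF

nomad agent -config=agent2.hcl
```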
After this setup we have two nodes with raw_exec enabled. Just as a sanity check:
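For example (node IDs and names will of course differ):

```sh
# Both nodes should be listed as ready and eligible.
nomad node status
# Drill into one node to confirm the raw_exec driver is detected
# (replace <node-id> with an ID from the listing above).
nomad node status -verbose <node-id>
```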
Then, start a job:
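The jobspec itself is not important beyond being a `service` job that takes a long time to stop. A minimal sketch follows; the group name matches the allocation listing earlier in the thread, and everything else is illustrative:

```sh
# Sketch: a single-instance raw_exec service job that lingers on shutdown,
# via a shutdown delay and a long kill timeout (which the clients allow
# thanks to max_kill_timeout above).
cat > example.nomad.hcl <<'EOF'
job "example" {
  type = "service"

  group "one_and_only" {
    count = 1

    task "sleep" {
      driver = "raw_exec"

      # Keep the allocation in a "running" client status for a long time
      # after it is told to stop.
      shutdown_delay = "10m"
      kill_timeout   = "1h"

      config {
        command = "/bin/sleep"
        args    = ["86400"]
      }
    }
  }
}
EOF

nomad job run example.nomad.hcl
```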
Now, let us flip the node with no allocations to be unavailable. We want to simulate the situation in which we are running at full capacity:
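For example, with a placeholder ID for the node that currently has no allocations:

```sh
# Mark the empty node as ineligible so no new allocations can be placed on
# it, simulating a cluster running at full capacity.
nomad node eligibility -disable <empty-node-id>
nomad node status
```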
The job continued to run just fine. Now, let us drain the node on which the job is currently running and inspect the state of allocations:
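Again with placeholder node IDs:

```sh
# Drain the node that currently runs the allocation...
nomad node drain -enable <busy-node-id>

# ...then look at the allocations: the old allocation's desired status
# flips to "stop" while its client status stays "running" during the
# shutdown delay / kill timeout.
nomad job status example
```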
Now, let us make the node that had nothing running on it eligible again.
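For example:

```sh
# Restore eligibility on the node that had nothing running on it, adding
# spare capacity back to the cluster.
nomad node eligibility -enable <empty-node-id>
nomad job status example
```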
And to our surprise the job which should have just 1 allocation has... Two! Both running.
Expected Result
The behavior should be consistent with a regular drain, in which we do not schedule additional allocations until the last allocation is in a terminal state.
Actual Result
We schedule extra allocations and ignore the state of the old ones.
The logs don't contain much insight into what happened.