hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

allocations stuck during node drains from the GUI #17696

Closed matthiasschoger closed 1 year ago

matthiasschoger commented 1 year ago

Nomad version

Output from nomad version: Nomad v1.5.6

Operating system and Environment details

Ubuntu Server 22.10 Linux compute2 5.19.0-45-generic #46-Ubuntu SMP PREEMPT_DYNAMIC Wed Jun 7 09:08:58 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

All the latest patches are installed.

Issue

In #12324, the following was implemented:

When a node is drained, system jobs are left until last so that operators can rely on things like log shippers running even as their applications are getting drained off. Include CSI plugins in this set so that Controller plugins deployed as services can be handled as gracefully as Node plugins that are running as system jobs.

I'm currently running a 3-node cluster, with the tasks using the NFS CSI plugin to mount storage from my local NAS.

When draining a node from the UI with both "Force Drain" and "Drain System Jobs" enabled, some jobs consistently get stuck on the draining node.

Issue #12324 seems to address this for the command line, but the problem still seems to exist when the drain is started from the UI.

One more observation: job migration works flawlessly when I migrate only the service jobs and leave the system jobs running.
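
For comparison, here is roughly what a drain that leaves system jobs running looks like from the CLI; this is only a sketch, assuming the -ignore-system flag corresponds to unchecking "Drain System Jobs" in the UI, with <node-id> as a placeholder.

    # Drain only service/batch allocations; system job allocations (for example
    # a CSI node plugin running as a system job) stay on the node.
    nomad node drain -enable -ignore-system -yes <node-id>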

Reproduction steps

  1. Set up a 3-node Nomad cluster
  2. Deploy the NFS CSI controller and plugin as system jobs.
  3. Deploy a job which uses the NFS CSI plugin to mount a storage volume (Gitea and Bookstack consistently fail for me).
  4. Drain the node (both services and system jobs) which contains the service job mounting the NFS share via CSI (a rough CLI equivalent is sketched below).
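
For reference, a rough CLI equivalent of step 4 (the report itself uses the web UI; <node-id> is a placeholder). With -force set, remaining allocations are stopped immediately, and system jobs are included because -ignore-system is not passed.

    # Drain the node, including system jobs, stopping remaining allocations
    # immediately rather than waiting for a migration deadline.
    nomad node drain -enable -force -yes <node-id>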

Expected Result

The service jobs migrate away from the drained node.

Actual Result

Service jobs get stuck on the node. Docker fails to stop the containers.

Job file (if appropriate)

Happy to provide logs and job files, but I'm sure you have the relevant info from issue #12324.

Nomad Server logs (if appropriate)

Nomad Client logs (if appropriate)

tgross commented 1 year ago

Hi @matthiasschoger! The work done in #12324 should be agnostic to the method used to set the drain mode; as far as I know, it doesn't matter whether the API request comes from the CLI or the web UI, so long as the same options are used.
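
For illustration, both paths ultimately issue the same kind of request against the Node Drain HTTP API. A rough sketch of such a request follows; the node ID and Deadline value are illustrative only (the Deadline is expressed in nanoseconds) and are not taken from this report.

    # Enable draining on a node via the HTTP API, with a one-hour deadline and
    # system jobs included; the CLI and the web UI should both send a request
    # of this shape.
    curl -X PUT \
      -d '{"DrainSpec": {"Deadline": 3600000000000, "IgnoreSystemJobs": false}}' \
      http://localhost:4646/v1/node/<node-id>/drain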

Can you provide more information about what's happening with the service jobs that aren't being drained?

matthiasschoger commented 1 year ago

Hi @tgross, thanks for the prompt reply. Actually, maybe I misused the web UI and this is more a documentation topic.

As you can see from my post, I checked the "Force Drain" toggle in both cases. Could that be the reason why my jobs (which use a CSI driver) get stuck during the drain? In that case, it would be nice if the tooltip warned that checking the "Force Drain" option can result in stuck jobs when CSI drivers are in use.

Otherwise, I'd be happy to provide logs for the issue; it's quite easy to reproduce. The jobs (Docker driver) get stuck, and in my experience a reboot of the machine is the only way to get rid of them.

tgross commented 1 year ago

Hi @matthiasschoger, unfortunately I'm having a little trouble following what the issue is here. Is the problem that you're seeing a difference between the UI and the CLI (as initially reported), or is the problem that -force is forcing all your allocations to stop immediately regardless of type? If it's the latter, that's working as intended. I can certainly make that a little clearer in the drain docs though (done in https://github.com/hashicorp/nomad/pull/17703).
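
In contrast to the forced drain sketched under the reproduction steps above, a deadline-based drain gives allocations time to migrate before they are force-stopped. A quick sketch, with <node-id> as a placeholder:

    # Allocations get up to 10 minutes to migrate off the node before being
    # force-stopped; this is the gentler path for jobs backed by CSI volumes.
    nomad node drain -enable -deadline 10m -yes <node-id>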

Meanwhile, I checked the behavior of the UI, and it does look like there's a subtle difference between the API request bodies sent by the UI and the CLI, specifically in the Deadline field of the Node Drain API.

matthiasschoger commented 1 year ago

Hi @tgross, it seems to be a documentation issue around -force causing CSI plugins to shut down before the jobs that are using the CSI volumes.

Thank you for looking into it and resolving it quickly.