hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad jobs remain running after the node is stopped #17134

Closed akamensky closed 1 year ago

akamensky commented 1 year ago

Nomad version

Nomad v1.5.5
BuildDate 2023-05-05T12:50:14Z
Revision 3d63bc62b35cbe3f79cdd245d50b61f130ee1a79

Operating system and Environment details

Fedora 36 Server. Nomad deployed using the YUM repository.

Issue

Nomad job/task keeps running after the Nomad service has been stopped

Reproduction steps

Run the job below on a node using the exec driver, then stop the Nomad agent with systemctl stop nomad.

Expected Result

The task and its helper processes are stopped together with the Nomad service.

Actual Result

The task keeps running (along with its executor and logmon processes) after the Nomad service has been stopped.

Job file (if appropriate)

job "hello-world" {
  datacenters = ["*"]
  group "servers" {
    count = 1
    task "web" {
      driver = "exec"
      config {
        command = "sleep"
        args    = ["infinity"]
      }
    }
  }
}
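
A minimal way to reproduce (assuming the job file above is saved as hello-world.nomad.hcl; the file name is just for illustration):

# Register the example job against the cluster.
nomad job run hello-world.nomad.hcl

# Stop the Nomad agent on the node that received the allocation.
sudo systemctl stop nomad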

Screenshots:

Before systemctl stop nomad: Screenshot from 2023-05-10 11-12-56

After systemctl stop nomad: Screenshot from 2023-05-10 11-13-28

Nomad logs

May 10 09:13:02 hostname-redacted systemd[1]: Stopping nomad.service - Nomad...
May 10 09:13:02 hostname-redacted nomad[20251]: ==> Caught signal: interrupt
May 10 09:13:02 hostname-redacted nomad[20251]:     2023-05-10T09:13:02.180+0800 [INFO]  agent: requesting shutdown
May 10 09:13:02 hostname-redacted nomad[20251]:     2023-05-10T09:13:02.181+0800 [INFO]  client: shutting down
May 10 09:13:02 hostname-redacted nomad[20251]:     2023-05-10T09:13:02.182+0800 [INFO]  client.plugin: shutting down plugin manager: plugin-type=device
May 10 09:13:02 hostname-redacted nomad[20251]:     2023-05-10T09:13:02.182+0800 [INFO]  client.plugin: plugin manager finished: plugin-type=device
May 10 09:13:02 hostname-redacted nomad[20251]:     2023-05-10T09:13:02.182+0800 [INFO]  client.plugin: shutting down plugin manager: plugin-type=driver
May 10 09:13:02 hostname-redacted nomad[20251]:     2023-05-10T09:13:02.189+0800 [INFO]  client.plugin: plugin manager finished: plugin-type=driver
May 10 09:13:02 hostname-redacted nomad[20251]:     2023-05-10T09:13:02.189+0800 [INFO]  client.plugin: shutting down plugin manager: plugin-type=csi
May 10 09:13:02 hostname-redacted nomad[20251]:     2023-05-10T09:13:02.189+0800 [INFO]  client.plugin: plugin manager finished: plugin-type=csi
May 10 09:13:02 hostname-redacted nomad[20251]:     2023-05-10T09:13:02.201+0800 [INFO]  agent: shutdown complete
May 10 09:13:02 hostname-redacted systemd[1]: nomad.service: Main process exited, code=exited, status=1/FAILURE
May 10 09:13:02 hostname-redacted systemd[1]: nomad.service: Failed with result 'exit-code'.
May 10 09:13:02 hostname-redacted systemd[1]: nomad.service: Unit process 20440 (nomad) remains running after unit stopped.
May 10 09:13:02 hostname-redacted systemd[1]: nomad.service: Unit process 20451 (nomad) remains running after unit stopped.
May 10 09:13:02 hostname-redacted systemd[1]: Stopped nomad.service - Nomad.
May 10 09:13:02 hostname-redacted systemd[1]: nomad.service: Consumed 4min 6.108s CPU time.
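
The two "remains running" lines are the task's executor and logmon helper processes; after the stop they can be seen reparented to PID 1 with something like the following (the grep pattern is an assumption about how those helpers appear in ps output):

# Nomad helper processes left behind after systemctl stop nomad, now owned by PID 1.
ps -ef | grep -E 'nomad (executor|logmon)'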

PS: After starting Nomad again it does clean up the orphaned processes, but it should do that on shutdown instead.

tgross commented 1 year ago

Hi @akamensky! That's intentional! The Nomad client is designed so that you can upgrade it in place without having to restart all the workloads. The executors get reparented to PID 1, but when the client starts back up it can reattach to the "task handle" and continue to manage the workload without interruption. The exception to this behavior is when you're running the agent with the -dev flag, which cleans up after itself because it's intended for development use cases.

If you want to stop the workloads when you shut down a Nomad client, you can use the leave_on_interrupt/leave_on_terminate options along with drain_on_shutdown, or you can drain the workloads manually.
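
As a sketch of the manual route (run on the node itself; the deadline value here is just illustrative):

# Drain the local node so its allocations are stopped/migrated first.
nomad node drain -self -enable -deadline 1m -yes

# Then stop the agent.
sudo systemctl stop nomad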

akamensky commented 1 year ago

Thanks for the clarification @tgross. If the intention is for the process to keep running, that would make sense.

The executors get reparented to PID1 but when the client starts back up it can reattach to the "task handle" and continue to manage the workload without interruption

If it actually reattached (even without re-parenting the process, but for example by talking to it via a unix socket or similar) that would be fine. That's not what I observe, however. Instead what I see is:

  1. The executor and logmon processes get orphaned (so they are automatically reparented to PID 1)
  2. The executor, logmon and task keep running
  3. The Nomad agent process comes back, and instead of re-attaching to the executor it kills them all and starts a new process (the task gets a new PID etc.)


akamensky commented 1 year ago

Trying this with the configuration options on the agent set to:

leave_on_interrupt = true
leave_on_terminate = true

client {
  drain_on_shutdown {
    deadline           = "60s"
    force              = false
    ignore_system_jobs = false
  }
}
  1. drain_on_shutdown only works if either leave_on_interrupt or leave_on_terminate is set to true. It does nothing when those are set to false.
  2. With the leave_on_... options set to true on the agent, once the agent comes back online no tasks are ever assigned to it again until it is manually marked as "eligible" on the servers (see the command below).
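
For reference, the manual step looks something like this:

# Re-enable scheduling eligibility for the local node after it rejoins.
nomad node eligibility -self -enable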

While I understand the reason behind the leave_on_... options, which makes sense if the node is intentionally taken down for maintenance, I think that:

  1. drain_on_shutdown should work independently of the leave_on_... settings. Defaulting to no drain is also fine.
  2. If the intention of not draining services on stop is to keep them running, then when the agent comes back it should not kill them and start them anew; instead it should handle re-attaching gracefully (i.e. just resume communication over some method, like a unix socket or such).
tgross commented 1 year ago

Nomad agent process comes back, and instead of re-attaching to executor it kills all and starts a new process

Was the agent offline long enough that the server rescheduled the workloads? If not, that's likely a bug. The client and server logs will have more details; it would be helpful if you could share them.

drain_on_shutdown should work independent of the leave_on_... settings. Defaulting to no drain is also fine.

The intent is that it's used for turning down the node, and that you'd use a different signal for ordinary in-place upgrades of the binary. There's another open feature request around allowing the node to mark itself eligible again though.