hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

nomad job restart command should stop leader task first #19403

Open louievandyke opened 7 months ago

louievandyke commented 7 months ago

Nomad version

Output from nomad version

vagrant@linux:/opt/gopath/src/github.com/hashicorp/nomad$ nomad version
Nomad v1.7.0-dev+ent
BuildDate 2023-11-02T04:09:19Z
Revision 02cf9f45545e3f57022d6a8661ae32eb991863dc

Operating system and Environment details

vagrant@linux:/opt/gopath/src/github.com/hashicorp/nomad$ uname -a
Linux linux 4.15.0-163-generic #171-Ubuntu SMP Fri Nov 5 11:55:11 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

Issue

When a job restart command is issued, the tasks are restarted without applying lifecycle rules (prestart tasks first, etc.) or honoring the leader flag.

Reproduction steps

Create a job with one task group that contains prestart tasks, a leader task, and a task with shutdown_delay set.

Restart the job, e.g.:

nomad job restart sleep-while-lifecycle

Expected Result

The lifecycle of the task group should be honored

Actual Result

Leader and sidecar tasks are sent a signal at the same time

Job file (if appropriate)

job sleep-while-lifecycle {
  datacenters = ["dc1"]
  group "group" {
    count = 1

## You might want to constrain this, so here's one to help
#    constraint {
#      attribute = "${attr.unique.hostname}"
#      operator  = "="
#      value     = "nomad-client-1.node.consul"
#    }

    task "leader" {
      template {
        data = <<EOH
#!/bin/bash

SLEEP_SECS=$${SLEEP_SECS:-2} # provide default of 2 seconds
echo "$(date) - Starting. SLEEP_SECS=${SLEEP_SECS}"
while true; do echo "$(date) - Sleeping for ${SLEEP_SECS} seconds."; sleep ${SLEEP_SECS}; done

EOH
        destination = "local/sleepy.sh"
      }

      driver = "exec"
      leader = true
      shutdown_delay = "20s"

      config {
        command = "${NOMAD_TASK_DIR}/sleepy.sh"
      }

      resources {
        memory = 100
        cpu = 100
      }
    }
    task "regular" {
      template {
        data = <<EOH
#!/bin/bash

SLEEP_SECS=$${SLEEP_SECS:-2} # provide default of 2 seconds
echo "$(date) - Starting. SLEEP_SECS=${SLEEP_SECS}"
while true; do echo "$(date) - Sleeping for ${SLEEP_SECS} seconds."; sleep ${SLEEP_SECS}; done

EOH
        destination = "local/sleepy.sh"
      }

      driver = "exec"

      config {
        command = "${NOMAD_TASK_DIR}/sleepy.sh"
      }

      resources {
        memory = 100
        cpu = 100
      }
    }
    task "sidecar" {
      template {
        data = <<EOH
#!/bin/bash

SLEEP_SECS=$${SLEEP_SECS:-2} # provide default of 2 seconds
echo "$(date) - Starting. SLEEP_SECS=${SLEEP_SECS}"
while true; do echo "$(date) - Sleeping for ${SLEEP_SECS} seconds."; sleep ${SLEEP_SECS}; done

EOH
        destination = "local/sleepy.sh"
      }

      driver = "exec"
      lifecycle {
        hook = "prestart"
        sidecar = true
      }

      config {
        command = "${NOMAD_TASK_DIR}/sleepy.sh"
      }

      resources {
        memory = 100
        cpu = 100
      }
    }
    task "non-sidecar" {
      template {
        data = <<EOH
#!/bin/bash

SLEEP_SECS=$${SLEEP_SECS:-2} # provide default of 2 seconds
runtime="1 minute"
endtime=$(date -ud "$runtime" +%s)
echo "$(date) - Starting. SLEEP_SECS=${SLEEP_SECS}"
while [[ $(date -u +%s) -le $endtime ]]; do echo "Time Now: `date +%H:%M:%S`"; echo "Sleeping for 2 seconds"; sleep ${SLEEP_SECS}; done

EOH
        destination = "local/sleepy.sh"
      }

      driver = "exec"
      lifecycle {
        hook = "prestart"
        sidecar = false
      }

      config {
        command = "${NOMAD_TASK_DIR}/sleepy.sh"
      }

      resources {
        memory = 100
        cpu = 100
      }
    }
  }
}

Nomad Server logs (if appropriate)

I’ve added the log output from the tasks. You can see that they all receive the signal at the same time when I initiate a job restart: every task logs "Fri Dec 1 18:24:54 UTC 2023 - Starting. SLEEP_SECS=2".

vagrant@linux:/opt/gopath/src/github.com/hashicorp/nomad$ nomad job restart sleep-while-lifecycle
==> 2023-12-01T18:24:49Z: Restarting 1 allocation
    2023-12-01T18:24:49Z: Restarting running tasks in allocation "868d67ca" for group "group"
==> 2023-12-01T18:24:54Z: Job restart finished

Job restarted successfully!

leader task stdout

Fri Dec  1 18:23:56 UTC 2023 - Starting. SLEEP_SECS=2
Fri Dec  1 18:23:56 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:23:58 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:00 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:02 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:04 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:06 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:08 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:10 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:12 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:14 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:16 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:18 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:20 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:22 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:24 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:26 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:28 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:30 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:32 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:34 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:36 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:38 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:40 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:42 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:44 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:46 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:48 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:50 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:52 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:54 UTC 2023 - Starting. SLEEP_SECS=2
Fri Dec  1 18:24:54 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:56 UTC 2023 - Sleeping for 2 seconds.

sidecar task stdout

Fri Dec  1 18:23:56 UTC 2023 - Starting. SLEEP_SECS=2
Fri Dec  1 18:23:56 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:23:58 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:00 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:02 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:04 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:06 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:08 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:10 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:12 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:14 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:16 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:18 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:20 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:22 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:24 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:26 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:28 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:30 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:32 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:34 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:36 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:38 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:40 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:42 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:44 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:46 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:48 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:50 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:52 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:54 UTC 2023 - Starting. SLEEP_SECS=2
Fri Dec  1 18:24:54 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:56 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:58 UTC 2023 - Sleeping for 2 seconds.

regular task stdout

Fri Dec  1 18:23:56 UTC 2023 - Starting. SLEEP_SECS=2
Fri Dec  1 18:23:56 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:23:58 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:00 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:02 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:04 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:06 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:08 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:10 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:12 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:14 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:16 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:18 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:20 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:22 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:24 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:26 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:28 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:30 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:32 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:34 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:36 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:38 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:40 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:42 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:44 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:46 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:48 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:50 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:52 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:54 UTC 2023 - Starting. SLEEP_SECS=2
Fri Dec  1 18:24:54 UTC 2023 - Sleeping for 2 seconds.
Fri Dec  1 18:24:56 UTC 2023 - Sleeping for 2 seconds.

Nomad Client logs (if appropriate)

tgross commented 6 months ago

The repro here isn't a minimal repro and has a lot of moving parts, so let's boil it down to the essentials:

Something important to note here is that most of these fields control the order we start tasks. Only "leader" has any controls on when tasks are shut down. The leader=true field docs say:

Specifies whether the task is the leader task of the task group. If set to true, when the leader task completes, all other tasks within the task group will be gracefully shut down. The shutdown process starts by applying the shutdown_delay if configured. It then stops the leader task first, followed by non-sidecar and non-poststop tasks, and finally sidecar tasks. Once this process completes, post-stop tasks are triggered. See the lifecycle documentation for a complete description of task lifecycle management.
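The ordering in the quoted docs can be modeled in a few lines. The following is an illustrative Python sketch only, not Nomad's implementation: the Task class, its fields, and the shutdown_phases function are hypothetical names, and the model ignores shutdown_delay timing and whether a task is still running.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a task in a group (not a Nomad type)."""
    name: str
    leader: bool = False
    hook: Optional[str] = None  # e.g. "prestart" or "poststop"; None for main tasks
    sidecar: bool = False


def shutdown_phases(tasks: List[Task]) -> Dict[str, List[str]]:
    """Group task names into the phases the leader docs describe.

    Phases run in dict order: leader first, then non-sidecar and
    non-poststop tasks, then sidecars; poststop tasks are started last.
    """
    return {
        "stop_leader": [t.name for t in tasks if t.leader],
        "stop_main": [
            t.name for t in tasks
            if not t.leader and not t.sidecar and t.hook != "poststop"
        ],
        "stop_sidecars": [
            t.name for t in tasks
            if not t.leader and t.sidecar and t.hook != "poststop"
        ],
        "start_poststop": [t.name for t in tasks if t.hook == "poststop"],
    }


# The four tasks from the jobspec in this issue.
TASKS = [
    Task("leader", leader=True),
    Task("regular"),
    Task("sidecar", hook="prestart", sidecar=True),
    Task("non-sidecar", hook="prestart", sidecar=False),
]
```

Applied to the jobspec above, this model places "leader" in the first phase, "regular" and "non-sidecar" in the second, and "sidecar" in the third, which is the order the reporter expected to see in the task logs.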

This is all strictly true and works as described. If the leader task were to be shut down, we'd see the other tasks shut down in that order. But nomad job restart restarts all the tasks unless the -task option is used:

-task=: Specify the task to restart. Can be specified multiple times. If groups are also specified the task must exist in at least one of them. If no task is set only tasks that are currently running are restarted. For example, non-sidecar tasks that already ran are not restarted unless -all-tasks is used instead. This option cannot be used with -all-tasks or -reschedule.

The leader flag never ends up being consulted because those tasks are already stopped. (I would not be shocked if there was a race condition here though where it's possible for one of the sidecar tasks to start back up quickly enough to get shut down when the leader completes shutdown.)

So for cases where we're shutting down all the tasks, what we probably want to do is see if there's a leader=true flag set on any of the running tasks and stop only that task via the RPC, so that the other tasks can stop in the expected order. I'll mark this for roadmapping.
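The selection rule proposed above could look roughly like this. It is a hedged Python sketch of the idea, not the planned implementation: Task and restart_targets are hypothetical names, and the RPC call itself is left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for a running task (not a Nomad type)."""
    name: str
    leader: bool = False


def restart_targets(running: List[Task]) -> List[str]:
    """Pick which running tasks to stop directly.

    If any running task is marked leader, stop only that task and let the
    leader shutdown ordering handle the rest. Otherwise fall back to
    stopping every running task, as happens today.
    """
    leaders = [t.name for t in running if t.leader]
    if leaders:
        return leaders
    return [t.name for t in running]


# Running tasks from the issue's jobspec at restart time (assumes the
# non-sidecar prestart task has already exited).
RUNNING = [Task("leader", leader=True), Task("regular"), Task("sidecar")]
```

With the running tasks from this issue, only "leader" would be targeted; a group with no leader task would keep the current behavior of targeting every running task.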

louievandyke commented 5 months ago

I apologize for sharing the spec where I hadn't specified the leader (I had been tweaking it during testing), but this is great info to be aware of as I hadn't targeted -task.

"But nomad job restart restarts all the tasks unless the -task option is used:"

"So for cases where we're shutting down all the tasks, what we probably want to do is see if there's a leader=true flag set on any of the running tasks and stop only that task via the RPC, so that the other tasks can stop in the expected order. I'll mark this for roadmapping."