hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/
Other
14.6k stars 1.92k forks source link

Infinite template re-render loop when using {{ timestamp }} in two different templates of the same task #20618

Open faryon93 opened 2 months ago

faryon93 commented 2 months ago

Nomad version

Output from `nomad version`
Nomad v1.7.7
BuildDate 2024-04-16T19:26:43Z
Revision 0f34c85ee63f6472bd2db1e2487611f4b176c70c

Operating system and Environment details

$: cat /etc/debian_version 
12.5

$: uname -a
Linux nomad2 6.1.0-20-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.85-1 (2024-04-11) x86_64 GNU/Linux

$: podman version
Client:       Podman Engine
Version:      4.3.1
API Version:  4.3.1
Go Version:   go1.19.8
Built:        Thu Jan  1 01:00:00 1970
OS/Arch:      linux/amd64

Issue

When I use the {{ timestamp "unix"}} function in two differente template blocks of the same task and one of the templates uses other functions which can cause re-rendering (like {{ key }} or {{ service }}, etc. ) and a re-rendering is performed, the template gets re-rendered in an infinite loop.

The problem does not exist when the task has only one template.

Sidenote: When the task is stuck in the renderloop proper shutdown is not possible.

Reproduction steps

  1. create key foo/bar in consul with some random content
  2. run the jobfile from below
  3. wait until task is running
  4. change the key foo/bar to another value
  5. observe the Recent Events table of the task for re-occurring Template re-rendered messages

Expected Result

The template should be re-rendered exactly once.

Actual Result

The template gets re-rendered in an infinite loop every few seconds.

Screenshot_20240516_192701

Job file (if appropriate)

job "fileserver" {
  group "http" {
    count = 1

    network {
      mode = "bridge"
    }

    task "nginx" {
      driver = "podman"
      config {
        image = "docker.io/library/nginx:latest"
      }

      template {
        data = <<EOF
# Timestamp: {{ timestamp "unix" }}
# K/V: {{ key "foo/bar"}}
        EOF
        destination = "/local/test.txt"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }

      template {
        data = <<EOF
# Timestamp: {{ timestamp "unix" }}
        EOF
        destination = "/local/test1.txt"
        change_mode   = "signal"
        change_signal = "SIGHUP"
      }
    }
  }
}

Nomad Server / Client logs

2024-05-16T19:24:46.088+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Received msg="Task received by client" failed=false
2024-05-16T19:24:51.192+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type="Task Setup" msg="Building Task Directory" failed=false
2024-05-16T19:24:51.298+0200 [INFO]  agent: (runner) creating new runner (dry: false, once: false)
2024-05-16T19:24:51.299+0200 [INFO]  agent: (runner) creating watcher
2024-05-16T19:24:51.299+0200 [INFO]  agent: (runner) starting
2024-05-16T19:24:51.347+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:24:51.419+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:24:51.754+0200 [INFO]  client.driver_mgr.nomad-driver-podman: Completely started container: driver=podman @module=podman container=4e213b233753ad26a521001cc646f92fb0e1ca796ee4a59d453fb568e33b37b9 ip=172.26.64.116 taskID=3a19ad33-e832-7205-167e-936dc7307a94/nginx/55e0c66a timestamp="2024-05-16T19:24:51.753+0200"
2024-05-16T19:24:51.757+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Started msg="Task started by client" failed=false
2024-05-16T19:25:25.847+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:25.895+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:26.863+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:29.775+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:30.951+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:35.858+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:35.941+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:38.577+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:40.997+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:43.301+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:45.995+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:49.458+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:51.051+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:25:52.674+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:25:56.052+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:25:56.056+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Signaling msg="Template re-rendered" failed=false
2024-05-16T19:26:01.109+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test1.txt"
2024-05-16T19:26:04.697+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Killing msg="Sent interrupt. Waiting 5s before force killing" failed=false
2024-05-16T19:26:04.699+0200 [INFO]  client.driver_mgr.nomad-driver-podman: Stopping task: driver=podman signal="" taskID=3a19ad33-e832-7205-167e-936dc7307a94/nginx/55e0c66a @module=podman timestamp="2024-05-16T19:26:04.699+0200"
2024-05-16T19:26:04.715+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=04eea9f3-7518-cec6-4808-1dbcb9020c37 task=nginx type=Received msg="Task received by client" failed=false
2024-05-16T19:26:05.811+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Terminated msg="Exit Code: 0" failed=false
2024-05-16T19:26:05.836+0200 [WARN]  client.driver_mgr.nomad-driver-podman: Could not remove container: driver=podman container=4e213b233753ad26a521001cc646f92fb0e1ca796ee4a59d453fb568e33b37b9 error="cannot delete container, status code: 200" @module=podman timestamp="2024-05-16T19:26:05.835+0200"
2024-05-16T19:26:05.836+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Killed msg="Task successfully killed" failed=false
2024-05-16T19:26:06.102+0200 [INFO]  client.alloc_runner.task_runner: Task event: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx type=Killing
    msg=
    | Template failed to send signals [hangup]: 1 error occurred:
    | \t* Task not running
    |
    failed=true
    2024-05-16T19:26:06.108+0200 [INFO]  agent: (runner) rendered "(dynamic)" => "/opt/nomad/data/alloc/3a19ad33-e832-7205-167e-936dc7307a94/nginx/local/test.txt"
2024-05-16T19:26:09.841+0200 [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx @module=logmon timestamp="2024-05-16T19:26:09.841+0200"
2024-05-16T19:26:09.848+0200 [INFO]  client.alloc_runner.task_runner.task_hook.logmon: plugin process exited: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94 task=nginx plugin=/usr/bin/nomad id=1227172
2024-05-16T19:26:09.849+0200 [INFO]  agent: (runner) stopping
2024-05-16T19:26:09.849+0200 [INFO]  agent: (runner) received finish
2024-05-16T19:26:09.849+0200 [INFO]  client.gc: marking allocation for GC: alloc_id=3a19ad33-e832-7205-167e-936dc7307a94
pkazmierczak commented 1 month ago

Hi @faryon93, thanks for reporting this. I was able to reproduce, and not only with podman but also with docker driver. Indeed there's something fishy happening with the taskrunner here, and we'll look into it.