hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Task Interpolation behavior change in 0.9.0 #5809

Open notnoop opened 5 years ago

notnoop commented 5 years ago

Nomad version

Nomad v0.9.0 (18dd59056ee1d7b2df51256fe900a98460d3d6b9)

Also affects v0.9.1 and v0.9.2

Operating system and Environment details

Linux linux 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Also affects macOS installations.

Issue

Nomad 0.9.0 started interpolating variables that appear anywhere in the task driver config, and the task fails if Nomad is unable to interpolate one of them. Nomad 0.8 interpolated variables where it could and left the interpolation text unprocessed where it couldn't. As a result, Nomad 0.8 jobs that use shell interpolation with the "${ENV_VAR}" syntax stop working on 0.9.x.

Operators can work around this behavior by referencing the environment variable differently: either without braces (e.g. $ENV_VAR) or by escaping the reference with a doubled dollar sign (e.g. $${ENV_VAR}).
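As a sketch of both workarounds applied to the job file from the reproduction below, the `config` stanza could be rewritten as follows. This is illustrative rather than a tested fix: on 0.9.x, `$${MYENV}` should reach the shell as `${MYENV}`, and `$MYENV` contains no `${` at all, so Nomad should leave it alone. The escaped form depends on Nomad un-escaping `$$`, so check it against the exact Nomad version in use before relying on it across a mixed 0.8/0.9 fleet.

```
      config {
        image   = "redis:3.2"
        command = "/bin/sh"

        # Workaround 1: "$${MYENV}" escapes Nomad's interpolation, so the shell
        # should receive ${MYENV} and expand it itself.
        # Workaround 2: "$MYENV" has no braces, so Nomad never treats it as an
        # interpolation.
        args = ["-c", <<EOF
MYENV="from envvar"
echo "hello $${MYENV}"
echo "hello $MYENV"
EOF
        ]
      }
```

If the workarounds behave as described, both `echo` lines should print the same greeting that 0.8.7 produces for the original job.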

Reproduction steps

On 0.8.7, run the following:

nomad job run <path/to/job_file>
nomad alloc status <created_alloc_id>
nomad alloc logs <created_alloc_id>

Then run the same job against a 0.9.x release.

Job file (if appropriate)

job "example-envvar" {
  datacenters = ["dc1"]
  type = "batch"

  group "cache" {
    task "redis" {
      driver = "docker"

      config {
        image = "redis:3.2"
        command = "/bin/sh"
        args = ["-c", <<EOF
MYENV="from envvar"
echo "hello ${MYENV}"
EOF
]
      }

      resources {
        cpu    = 500
        memory = 256
      }
    }
  }
}

Observed

Note how the task completes successfully on 0.8.7 but fails to start at all on 0.9.2:

On Nomad 0.8.7

vagrant@linux:/tmp$ nomad job run /tmp/example-envvar.nomad
==> Monitoring evaluation "8628a5a5"
    Evaluation triggered by job "example-envvar"
    Allocation "07315953" created: node "e75e799a", group "cache"
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "8628a5a5" finished with status "complete"
vagrant@linux:/tmp$ nomad alloc status 07315953
ID                  = 07315953
Eval ID             = 8628a5a5
Name                = example-envvar.cache[0]
Node ID             = e75e799a
Node Name           = <none>
Job ID              = example-envvar
Job Version         = 0
Client Status       = complete
Client Description  = <none>
Desired Status      = run
Desired Description = <none>
Created             = 9s ago
Modified            = 3s ago

Task "redis" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
500 MHz  256 MiB  300 MiB

Task Events:
Started At     = 2019-06-11T12:27:07Z
Finished At    = 2019-06-11T12:27:11Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type        Description
2019-06-11T12:27:11Z  Terminated  Exit Code: 0
2019-06-11T12:27:07Z  Started     Task started by client
2019-06-11T12:27:06Z  Task Setup  Building Task Directory
2019-06-11T12:27:06Z  Received    Task received by client
vagrant@linux:/tmp$ nomad alloc logs 07315953
hello from envvar

On Nomad 0.9.2:

vagrant@linux:/tmp$ nomad job run /tmp/example-envvar.nomad
==> Monitoring evaluation "b91b7c61"
    Evaluation triggered by job "example-envvar"
    Allocation "f688d217" created: node "96861c74", group "cache"
    Allocation "f688d217" status changed: "pending" -> "failed" (Failed tasks)
    Evaluation status changed: "pending" -> "complete"
==> Evaluation "b91b7c61" finished with status "complete"
vagrant@linux:/tmp$ nomad alloc status f688d217
ID                   = f688d217
Eval ID              = b91b7c61
Name                 = example-envvar.cache[0]
Node ID              = 96861c74
Node Name            = linux
Job ID               = example-envvar
Job Version          = 0
Client Status        = failed
Client Description   = Failed tasks
Desired Status       = run
Desired Description  = <none>
Created              = 6s ago
Modified             = 2s ago
Replacement Alloc ID = 7b5bb5f8

Task "redis" is "dead"
Task Resources
CPU      Memory   Disk     Addresses
500 MHz  256 MiB  300 MiB

Task Events:
Started At     = N/A
Finished At    = 2019-06-11T12:27:57Z
Total Restarts = 0
Last Restart   = N/A

Recent Events:
Time                  Type            Description
2019-06-11T12:27:57Z  Killing         Sent interrupt
2019-06-11T12:27:57Z  Not Restarting  Error was unrecoverable
2019-06-11T12:27:57Z  Driver Failure  failed to decode driver config: [pos 37]: readContainerLen: Unrecognized descriptor byte: hex: d4, decimal: 212
2019-06-11T12:27:57Z  Task Setup      Building Task Directory
2019-06-11T12:27:57Z  Received        Task received by client
vagrant@linux:/tmp$ nomad alloc logs f688d217
Error reading file: Unexpected response code: 404 (task "redis" not started yet. No logs available)
stale[bot] commented 5 years ago

Hey there

Since this issue hasn't had any activity in a while - we're going to automatically close it in 30 days. If you're still seeing this issue with the latest version of Nomad, please respond here and we'll keep this open and take another look at this.

Thanks!

stale[bot] commented 5 years ago

This issue will be auto-closed because there hasn't been any activity for a few months. Feel free to open a new one if you still experience this problem :+1: