hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad CLI Monitor Dispatched Jobs #11601

Open evandam opened 2 years ago

evandam commented 2 years ago

Proposal

Since the Nomad CLI supports monitoring deployments with `nomad job run`, it would be great to see similar behavior with `nomad job dispatch`.

Ideally it would dispatch the job and poll the dispatched job while it is running, and then exit 0/non-zero based on the exit status of the job (maybe some consideration with idempotency tokens needed).

Use-cases

Use case is mostly around a CI environment where we use dispatched jobs to prepare an environment, seed a database, etc.

Being able to shell out to the Nomad CLI rather than custom scripting with the Nomad API would be great.

Attempted Solutions

Current solution is pretty clunky, unless there's a better way to do it.

tgross commented 2 years ago

Thanks for opening this issue @evandam! That makes sense to me, so we'll look into getting it on the roadmap.

lgfa29 commented 1 year ago

#3753 also mentions tailing log output, which could be a useful feature as well.

lgfa29 commented 1 year ago

#16898 also describes a use case for batch jobs with the `nomad job run` command.

tgross commented 3 months ago

Adding some context for an internal discussion we're having around this issue. Currently we have monitoring on the `nomad job dispatch` command, which monitors the creation of the evaluation and allocations. It looks something like this:

jobspec

```hcl
job "example" {
  type = "batch"

  parameterized {
    payload = "required"
  }

  group "group" {
    task "task" {
      driver = "docker"

      config {
        image   = "busybox:1"
        command = "/bin/sh"
        args    = ["-c", "cat local/payload.txt; sleep 300"]
      }

      dispatch_payload {
        file = "payload.txt"
      }

      resources {
        cpu    = 64
        memory = 128
      }
    }
  }
}
```

$ echo 'foo' | nomad job dispatch example -
Dispatched Job ID = example/dispatch-1723223743-25a6aca7
Evaluation ID     = 00045600

==> 2024-08-09T13:15:43-04:00: Monitoring evaluation "00045600"
    2024-08-09T13:15:43-04:00: Evaluation triggered by job "example/dispatch-1723223743-25a6aca7"
    2024-08-09T13:15:44-04:00: Allocation "647f8c26" created: node "9308bb31", group "group"
    2024-08-09T13:15:44-04:00: Evaluation status changed: "pending" -> "complete"
==> 2024-08-09T13:15:44-04:00: Evaluation "00045600" finished with status "complete"

So effectively here the command hits the Dispatch Job API and then polls the Read Evaluation API (typically only once) to get the allocations. This is less complicated than monitoring a service deployment because we don't have to hit the Deployment API.
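
For anyone scripting this against the HTTP API in the meantime, the two-step flow described above can be roughly sketched in Python. This is not how the CLI is implemented, only an illustration of the same sequence of calls. The address, the lack of an ACL token, and the helper names `http_json` and `dispatch_and_wait` are assumptions made for the example.

```python
import base64
import json
import time
import urllib.parse
import urllib.request

NOMAD_ADDR = "http://127.0.0.1:4646"  # assumption: local dev agent, no ACL token


def http_json(method, path, body=None):
    """Send a request to the Nomad HTTP API and decode the JSON reply."""
    data = json.dumps(body).encode() if body is not None else None
    req = urllib.request.Request(NOMAD_ADDR + path, data=data, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def dispatch_and_wait(job_id, payload, call=http_json, interval=1.0, max_polls=30):
    """Dispatch a parameterized job, then poll its evaluation until it settles.

    Returns (dispatched_job_id, evaluation_status, allocations).
    `call` is injectable so the flow can be exercised without a cluster.
    """
    quoted = urllib.parse.quote(job_id, safe="")  # job IDs may contain "/"
    dispatched = call(
        "POST",
        f"/v1/job/{quoted}/dispatch",
        {"Payload": base64.b64encode(payload).decode()},  # payload is base64 in the API
    )
    eval_id = dispatched["EvalID"]
    for _ in range(max_polls):
        evaluation = call("GET", f"/v1/evaluation/{eval_id}")
        if evaluation["Status"] != "pending":
            # Evaluation settled; fetch the allocations it created.
            allocs = call("GET", f"/v1/evaluation/{eval_id}/allocations")
            return dispatched["DispatchedJobID"], evaluation["Status"], allocs
        time.sleep(interval)
    raise TimeoutError(f"evaluation {eval_id} is still pending")
```

Calling `dispatch_and_wait("example", b"foo\n")` against a running agent would mirror the `echo 'foo' | nomad job dispatch example -` session above, minus the formatted monitor output.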

What this issue would hypothetically add is hitting the List Job Allocations API and then presenting the task states in some sensible way. As noted in https://github.com/hashicorp/nomad/issues/11601#issuecomment-1416432567 we could also hit the Stream Logs API but presenting that in a sensible way is a little harder given you may have multiple allocations.
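
As a rough illustration of what that could look like from a script, here is a Python sketch that polls the List Job Allocations API, prints one line per task state, and derives an exit code. The output format, the function names, and the exit-code policy (any `failed` or `lost` allocation means non-zero, ignoring rescheduling) are assumptions for the example, not a proposed design for the CLI.

```python
import json
import time
import urllib.parse
import urllib.request

NOMAD_ADDR = "http://127.0.0.1:4646"  # assumption: local dev agent, no ACL token


def list_job_allocations(job_id):
    """Fetch the allocation stubs for a job from the Nomad HTTP API."""
    quoted = urllib.parse.quote(job_id, safe="")  # dispatched job IDs contain "/"
    with urllib.request.urlopen(f"{NOMAD_ADDR}/v1/job/{quoted}/allocations") as resp:
        return json.load(resp)


def summarize(allocs):
    """Return (lines, exit_code); exit_code is None while work is still in flight."""
    lines = []
    done = bool(allocs)  # no allocations yet means nothing has finished
    failed = False
    for alloc in allocs:
        status = alloc["ClientStatus"]
        if status in ("failed", "lost"):
            failed = True
        elif status != "complete":
            done = False
        for name, state in sorted((alloc.get("TaskStates") or {}).items()):
            mark = " (failed)" if state.get("Failed") else ""
            lines.append(f'{alloc["ID"][:8]} {name}: {state["State"]}{mark}')
    if failed:
        return lines, 1  # simplification: does not wait for reschedules
    return lines, (0 if done else None)


def monitor(job_id, fetch=list_job_allocations, interval=2.0):
    """Poll until every allocation is terminal, printing task states as they change."""
    seen = None
    while True:
        lines, code = summarize(fetch(job_id))
        if lines != seen:
            print("\n".join(lines))
            seen = lines
        if code is not None:
            return code
        time.sleep(interval)
```

A CI step could then call `sys.exit(monitor("example/dispatch-1723223743-25a6aca7"))` after dispatching. With multiple allocations the per-task lines interleave, which is the same presentation problem that makes streaming logs harder.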