hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad plan why-not #17511

Open valodzka opened 1 year ago

valodzka commented 1 year ago

Proposal

Implement a command (separate or as part of the job plan) that explains why a particular group cannot be placed on a particular node (something distantly similar to aptitude why). Possible syntax:

nomad job plan -why-not-group group1 -why-not-node node1  job.nomad

Output:

Allocation group1 cannot be placed on node node1 because it exhausts dimension "memory"

A possible additional option is something like -why-not-count 3 to check why 3 allocations of a group cannot be placed on a particular node.
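
For illustration, a combined invocation under the proposed flags (all hypothetical; none of these flags exist in Nomad today) might look like:

nomad job plan -why-not-group group1 -why-not-node node1 -why-not-count 3 job.nomad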

Use-cases

Periodically I stumble across situations where nomad plan shows that some allocation cannot be placed and it's not immediately clear why. Nomad's message is fairly cryptic even in verbose mode (and especially with a lot of nodes):

Task Group "nginx" (failed to place 1 allocation):
  * Class "c1": 1 nodes excluded by filter
  * Constraint "computed class ineligible": 1 nodes excluded by filter
  * Resources exhausted on 2 nodes
  * Class "c2" exhausted on 2 nodes
  * Dimension "network: reserved port collision http=8272" exhausted on 1 nodes
  * Dimension "memory" exhausted on 1 nodes

It would save time to have the ability to check why a task group cannot be placed on a specific node.

Attempted Solutions

Currently it requires checking a lot of configuration to understand why a job cannot be scheduled on a particular node. It is doable but can be quite time-consuming.
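
As a rough sketch of that manual workflow (my own illustration, not from the original report; the commands and flags below are real Nomad CLI, but which ones matter depends on the failure):

nomad job plan -verbose job.nomad       # aggregated placement-failure metrics for the job
nomad node status -verbose <node-id>    # node class, attributes, drivers, reserved resources
nomad node status -allocs               # allocation counts per node
nomad alloc status <alloc-id>           # resources consumed by an existing allocation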

tgross commented 1 year ago

Hi @valodzka! This is a really cool idea (and I like the whimsy of the suggested name :grinning: ).

There are a couple of interesting challenges here. The scheduler evaluates the entire job so that it can determine whether or not to remove or update-in-place allocations that are already running. So not all dimensions that are exhausted make sense if you try to check a single node. For example, spread blocks or distinct_hosts constraints can only be evaluated with respect to all the other allocations. And we'd probably have to extract the logic for feasibility-checking a single node from the rest of the scheduler.
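
For context, these are the kinds of jobspec blocks that can only be evaluated against the whole set of placements rather than one candidate node in isolation (a minimal illustrative snippet, not taken from the reporter's job):

group "nginx" {
  count = 3

  # Forbid two allocations of this group on the same client node;
  # feasibility for one node depends on where the others land.
  constraint {
    operator = "distinct_hosts"
    value    = "true"
  }

  # Spread allocations across datacenters; scoring a single node
  # requires knowing the datacenter of every other placement.
  spread {
    attribute = "${node.datacenter}"
    weight    = 100
  }
}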

So maybe given those two problems, the right approach here would be to run a full plan (just like we normally do, similar to nomad job plan), but then extract more detailed information about why nodes were rejected so we can filter down to one and report back to the user. Or we could add that data to the normal nomad job plan output and just make it a lot more verbose.

I'll mark this issue for further discussion and roadmapping.