hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad should allow draining all clients that match metadata values #1037

Open diptanu opened 8 years ago

diptanu commented 8 years ago

Operators could drain clusters easily if Nomad allowed them to select nodes by matching metadata values when draining. For example, operators could drain all nodes in an AWS ASG, or all nodes running a particular Nomad client version.

The implementation can be broken down into 2 phases which can be implemented independently:

  1. Batch Drain API - An alternative API to /v1/node/:nodeid/drain that accepts multiple Node IDs and updates those nodes atomically in Raft.
  2. Selection CLI/UI (possibly API) - The Batch Drain API may or may not include a selection API. Selection may be implemented purely in the CLI/UI or even in external tooling. Eventually a general selection solution would be ideal, as it would be broadly useful across Nomad (e.g. a selection DSL for list APIs). For very large clusters, sending a query instead of individual IDs is a slight optimization. However, even at 10k nodes, a request listing every node ID would take ~400 KB. A large request, but not a concern.
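As a rough illustration of phase 2 done in external tooling, and of the request-size estimate above, here is a minimal sketch. The `ID` and `Meta` field names, the `asg` metadata key, and the `NodeIDs`/`Drain` payload shape are assumptions for illustration only; this issue does not define a batch drain schema.

```python
import json
import uuid


def select_node_ids(nodes, **wanted):
    """Return the IDs of nodes whose metadata contains every wanted key/value."""
    return [
        n["ID"]
        for n in nodes
        if all(n.get("Meta", {}).get(k) == v for k, v in wanted.items())
    ]


def batch_drain_payload(node_ids):
    # Hypothetical request body: the real Batch Drain API schema is undecided.
    return json.dumps({"NodeIDs": node_ids, "Drain": True}).encode()


# Fake cluster of 10k nodes with UUID-style IDs, split across two made-up ASGs.
nodes = [
    {"ID": str(uuid.uuid4()), "Meta": {"asg": "workers-a" if i % 2 == 0 else "workers-b"}}
    for i in range(10_000)
]

selected = select_node_ids(nodes, asg="workers-a")
payload = batch_drain_payload([n["ID"] for n in nodes])

print(f"{len(selected)} nodes matched asg=workers-a")
print(f"draining all {len(nodes)} nodes needs a {len(payload) / 1000:.0f} KB request")
```

With 36-character IDs plus JSON quoting and separators, the full-cluster payload comes out at roughly 400 KB, in line with the estimate above.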

While many users have built their own solutions to phase 2, without phase 1 being solved those solutions risk races between marking nodes as draining and rescheduling work, causing lots of churn.

Imagine the hypothetical command:

nomad node-drain -enable -yes ab12 cd34 ef56

Without a batching API that could produce the following timeline:

  1. /v1/node/ab12/drain
  2. ab12 set to drain
  3. allocs on ab12 rescheduled onto cd34 ef56
  4. /v1/node/cd34/drain
  5. allocs from ab12 must be rescheduled again
  6. etc

As you can see, we may end up rescheduling allocations multiple times. While the cluster always stabilizes, a lot of unnecessary work is done, with many allocations created and replaced in a short amount of time.

With a batching API all of the nodes would be atomically updated and allocations would be rescheduled exactly once.
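The difference between one-at-a-time drains and a batch drain can be illustrated with a toy simulation. This is only a sketch: the "least loaded node, ties broken by name" placement rule is a stand-in for Nomad's real scheduler, and the extra node `gh78` and the allocation names are invented for the example.

```python
NODES = ["ab12", "cd34", "ef56", "gh78"]
INITIAL = {"web-1": "ab12", "web-2": "ab12", "api-1": "cd34", "db-1": "ef56"}


def drain(placement, nodes, draining, node_ids):
    """Mark node_ids as draining and move their allocs; return the move count."""
    draining.update(node_ids)
    moves = 0
    for alloc, node in sorted(placement.items()):
        if node not in draining:
            continue
        # Assumes at least one eligible node remains (gh78 is never drained here).
        eligible = [n for n in nodes if n not in draining]
        load = {n: 0 for n in eligible}
        for placed_on in placement.values():
            if placed_on in load:
                load[placed_on] += 1
        # Toy scheduler: least loaded eligible node, ties broken by name.
        placement[alloc] = min(eligible, key=lambda n: (load[n], n))
        moves += 1
    return moves


def simulate(batches):
    """Apply each batch of node IDs in order and count total reschedules."""
    placement, draining = dict(INITIAL), set()
    return sum(drain(placement, NODES, draining, batch) for batch in batches)


sequential = simulate([["ab12"], ["cd34"], ["ef56"]])  # three separate drain calls
batched = simulate([["ab12", "cd34", "ef56"]])         # one atomic batch drain

print(f"sequential drains: {sequential} reschedules")
print(f"batch drain:       {batched} reschedules")
```

In this toy run the sequential drains reschedule 6 times for 4 allocations, because allocations moved onto cd34 and ef56 have to move again, while the batch drain moves each allocation exactly once.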

BSick7 commented 8 years ago

Perhaps it could also be valuable to do a node-drain on a list of node ids.

nomad node-drain -enable -yes abcdefg1 abcdefg2 abcdefg3

If you run each one individually, workloads from abcdefg1 may be pushed to abcdefg2 which makes abcdefg2 take longer to clear out its workload.

Miserlou commented 6 years ago

I just want to be able to drain everything.

nomad node-drain -enable -yes -all or nomad node-drain -enable -yes *