hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

hot reconfiguration and live migration interface for task drivers #19752

Open Jamesits opened 7 months ago

Jamesits commented 7 months ago

Proposal

Some types of tasks (including both their in-memory state and their on-disk state) can be reconfigured or migrated to another node without first being shut down. We should have a way to handle this gracefully.

We might need:

Use-cases

I'm investigating whether I can manage and monitor a bunch of VMs in Nomad with a custom task driver. These VMs might be stateless, but I don't want them to be shut down during a reschedule.

Other existing needs:

We might also need: https://github.com/hashicorp/nomad/issues/15489

Attempted Solutions

Currently, Nomad can only move a task by shutting it down on the original node and then starting it on the new node. At a high level, the driver has no way of knowing that the task is being rescheduled rather than changed.
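To make the difference concrete, here is a minimal Python sketch of the two flows. Everything in it (`TaskDriver`, `LiveMigrator`, `migrate_task`, `reschedule`, and the example drivers) is hypothetical and is not Nomad's actual task driver plugin API, which is written in Go. It only illustrates one possible shape for the idea: an optional driver capability that tells the driver a reschedule is a move, with today's stop-then-start behavior as the fallback.

```python
from typing import Protocol, runtime_checkable


class TaskDriver(Protocol):
    """Hypothetical, heavily simplified driver surface (not Nomad's real plugin API)."""

    def start_task(self, task: str, node: str) -> None: ...

    def stop_task(self, task: str, node: str) -> None: ...


@runtime_checkable
class LiveMigrator(Protocol):
    """Hypothetical optional capability a driver could implement to support moves."""

    def migrate_task(self, task: str, src: str, dst: str) -> None: ...


def reschedule(driver: TaskDriver, task: str, src: str, dst: str) -> str:
    """Move `task` from node `src` to node `dst` using whatever the driver supports."""
    if isinstance(driver, LiveMigrator):
        # The driver is told this is a move, so it can keep the workload running.
        driver.migrate_task(task, src, dst)
        return "migrated"
    # Current behavior: the driver only sees an ordinary stop followed by a start.
    driver.stop_task(task, src)
    driver.start_task(task, dst)
    return "restarted"


class PlainDriver:
    """Example driver that only knows how to stop and start tasks."""

    def __init__(self) -> None:
        self.events: list[str] = []

    def start_task(self, task: str, node: str) -> None:
        self.events.append(f"start {task}@{node}")

    def stop_task(self, task: str, node: str) -> None:
        self.events.append(f"stop {task}@{node}")


class VMDriver(PlainDriver):
    """Example driver that additionally opts in to live migration."""

    def migrate_task(self, task: str, src: str, dst: str) -> None:
        self.events.append(f"migrate {task} {src}->{dst}")
```

With this split, `reschedule(PlainDriver(), "web", "node-a", "node-b")` records a stop on `node-a` followed by a start on `node-b`, while a `VMDriver` records a single migrate event. Keeping the capability optional means existing drivers would not need to change.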

Writing a remote task driver on top of an existing VM management solution (e.g. Proxmox VE) is one possible approach, but it is limited for my specific use case and does not scale well.

tgross commented 7 months ago

Hi @Jamesits! Yeah, as you've noted, a lot of this would be specific to the task drivers, so #2323 and #13785 are blockers for this. I'll keep this issue open to help tie everything together. We've been discussing this kind of thing internally a bit as something that would help Nomad replace VMware deployments.

The major architectural hurdle here is that Nomad doesn't place tasks; it places allocations, which may contain multiple tasks. And those tasks don't all need to use the same task driver! So right out of the gate, we'd need to figure out how to migrate multiple tasks simultaneously. For example, what happens if the tasks share state, or even just have ongoing network communication? We'd also need to figure out what limitations this feature would place on allocations whose tasks use different task drivers.
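One way to bound the multi-driver problem would be an all-or-nothing rule per allocation. The Python sketch below is purely illustrative and is neither something decided in this thread nor something implemented in Nomad: `plan_move`, the strategy names, and the set of "migratable" drivers are hypothetical, and driver names are treated as plain strings. It also ignores the harder questions raised above, such as shared state and in-flight network traffic between tasks.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    driver: str  # task driver name, e.g. "docker" or "qemu"


@dataclass
class Allocation:
    """Nomad places allocations, and one allocation may hold several tasks."""

    id: str
    tasks: list[Task] = field(default_factory=list)


def plan_move(alloc: Allocation, migratable_drivers: set[str]) -> tuple[str, list[str]]:
    """Pick one strategy for the whole allocation (hypothetical all-or-nothing rule).

    Returns the strategy and the names of tasks whose drivers block live migration.
    """
    blockers = [t.name for t in alloc.tasks if t.driver not in migratable_drivers]
    if blockers:
        # At least one task cannot be migrated, so every task is stopped and restarted,
        # keeping all tasks of the allocation on the same strategy.
        return "stop-and-start", blockers
    return "live-migrate", []
```

For an allocation holding a `qemu` VM task and a `docker` sidecar, where only `qemu` is assumed to support migration, `plan_move` returns `("stop-and-start", ["sidecar"])`. Forcing one strategy per allocation avoids the case where some tasks keep running while their neighbors restart underneath them, at the cost of losing live migration whenever any single driver lacks support.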