hashicorp / nomad


nomad coping with OS resources limits #6557

Open notnoop opened 5 years ago

notnoop commented 5 years ago

When aiming to run many allocations on a single client, it is easy to overlook adjusting the OS resource limits and have Nomad hit them. As of 0.10.0, Nomad may start user allocations but partially crash before persisting their state, resulting in many leaked processes that Nomad no longer manages. When a client gets into that state, the best option is to destroy the client.

Some sample flags for linux:

Hitting these limits is damaging to many other critical services, even if Nomad remains healthy. Docker has some known issues and some guidance for tweaking these values [1][2].
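As a rough illustration of how a client operator might inspect limits before scheduling many allocations, here is a minimal Python sketch that reads the soft and hard limits of the current process. The two limits shown (`RLIMIT_NOFILE` and `RLIMIT_NPROC`) are assumptions picked for illustration, not a list taken from this issue, and `read_limits` and `fmt` are hypothetical helper names. Nomad itself does not do this.

```python
import resource

# Illustrative choice of limits (an assumption, not this issue's list).
LIMITS = {
    "open files (RLIMIT_NOFILE)": resource.RLIMIT_NOFILE,
    "processes/threads (RLIMIT_NPROC)": resource.RLIMIT_NPROC,
}


def fmt(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits():
    """Return {label: (soft, hard)} for the current process."""
    return {label: resource.getrlimit(res) for label, res in LIMITS.items()}


if __name__ == "__main__":
    for label, (soft, hard) in read_limits().items():
        print(f"{label}: soft={fmt(soft)} hard={fmt(hard)}")
```

These are per-process limits inherited from whatever launched the process, so the values only describe the Nomad client if the script runs in the same context (for example, under the same service manager settings).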

We can try to address this by many means; here is a sample of possible actions:

[1] https://success.docker.com/article/how-to-reserve-resource-temporarily-unavailable-errors-due-to-tasksmax-setting
[2] https://github.com/docker/for-linux/issues/73

tgross commented 5 years ago

Another example I just documented in https://github.com/hashicorp/nomad/pull/6607 is kernel tunables for bridge networking, which took up a lot of time to figure out in https://github.com/hashicorp/nomad/issues/6580#issuecomment-548769102

In that scenario we want a particular set of tunables for network namespace / Connect support, but probably don't want to enforce them because it'll break the QEMU task driver:

net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-arptables = 1

(These aren't strictly "resource limitations" but can be set at runtime on clients and change outside of Nomad's control, so I suspect the solutions are going to be similar.)
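Since these values can drift at runtime, one option is to detect rather than enforce them. The following is a minimal Python sketch, not anything Nomad does today, that compares the three tunables above against their current values under `/proc/sys`. The function names `sysctl_path` and `check_tunables` and the `root` parameter are hypothetical, introduced only for illustration and testing.

```python
from pathlib import Path

# Expected values, copied from the tunables listed above.
EXPECTED = {
    "net.bridge.bridge-nf-call-ip6tables": "1",
    "net.bridge.bridge-nf-call-iptables": "1",
    "net.bridge.bridge-nf-call-arptables": "1",
}


def sysctl_path(key, root="/proc/sys"):
    """Map a dotted sysctl key to its file under root."""
    return Path(root) / key.replace(".", "/")


def check_tunables(expected=EXPECTED, root="/proc/sys"):
    """Return {key: actual} for every key whose value differs from expected.

    A value of None means the file could not be read, for example
    because the br_netfilter kernel module is not loaded.
    """
    mismatches = {}
    for key, want in expected.items():
        try:
            actual = sysctl_path(key, root).read_text().strip()
        except OSError:
            actual = None
        if actual != want:
            mismatches[key] = actual
    return mismatches


if __name__ == "__main__":
    for key, actual in check_tunables().items():
        print(f"warning: {key} is {actual!r}, expected {EXPECTED[key]!r}")
```

Reporting mismatches as warnings, instead of writing the values, leaves clients that rely on different settings (such as those running the QEMU task driver) untouched.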