Open groggemans opened 6 years ago
I too am eagerly awaiting the enhancements around Nomad detecting a "driver failure", which seem to be part of version 0.8.
My problem is that when the Docker daemon fails and is unable to start the assigned task, Nomad keeps scheduling the task onto the same node anyway.
There also seems to be a Docker bug behind my specific problem, which is being fixed in the upcoming 18.03 release of Docker, but overall, driver failure detection would be an awesome feature to have in Nomad itself.
This will be a feature in Nomad 0.8: if a Nomad client detects that Docker is unresponsive, tasks requiring Docker will be scheduled onto another node where Docker is healthy.
In some situations the current heartbeat check is not sufficient to detect problems with nodes running the Nomad agent. It would be nice if we could extend the heartbeat check with custom checks.
I had a few cases where an application on a node misbehaved and my node checks in Consul went into a failed state. Nomad's heartbeat check didn't detect the problem, so Nomad just kept scheduling tasks to the node. For now the only workaround is to add a Consul watch/handler that starts draining the troublesome node.
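For anyone wanting to try the same workaround, here is a rough sketch of what such a handler could look like. It assumes a Consul watch of type `checks` that passes the current check list as a JSON array on stdin, that node-level checks are the ones with an empty `ServiceID`, and that the Consul node name equals the hostname. The function names and the `DRAIN_COMMAND` value are illustrative only; the exact drain command depends on your Nomad version, so adjust it before use.

```python
#!/usr/bin/env python3
"""Hypothetical Consul watch handler that drains the local Nomad node."""
import json
import socket
import subprocess
import sys

# Assumption: command that drains the local node. The spelling differs
# between Nomad versions, so treat this as a placeholder.
DRAIN_COMMAND = ["nomad", "node", "drain", "-enable", "-self", "-yes"]


def failing_node_checks(checks, node_name):
    """Return the critical node-level checks that belong to node_name."""
    return [
        c for c in checks
        if c.get("Node") == node_name
        and c.get("Status") == "critical"
        and not c.get("ServiceID")  # service checks carry a ServiceID
    ]


def handle(payload, node_name, run=subprocess.run):
    """Drain the node if the watch payload lists a failing node check.

    Returns True when a drain was triggered, False otherwise.
    """
    checks = json.loads(payload) if payload.strip() else []
    failing = failing_node_checks(checks or [], node_name)
    if failing:
        run(DRAIN_COMMAND, check=True)
    return bool(failing)


def main():
    # Assumption: the Consul node name matches the local hostname.
    drained = handle(sys.stdin.read(), socket.gethostname())
    print("drain triggered" if drained else "node checks healthy")
```

To use it as a watch handler, the script would call `main()` at the bottom and be registered as the handler of a `checks` watch on each client node. Note that this only enables draining; nothing here re-enables the node once the checks recover, so that step stays manual.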
@schmichael indicated that there will be some improvements regarding draining and node health detection in v0.8, but no custom health checks.