tantra35 opened this issue 2 years ago (status: Open)
Thanks for raising this, @tantra35. From a quick look at the information you provided (thanks for all the details!), I suspect we're missing some clean-up in an error code path.
@lgfa29 could you please tell me whether we can expect a fix soon?
We don't have a date for a fix. I placed this into our backlog for further triaging.
I'm doing some issue cleanup and wanted to confirm that this is still the case even after some recent improvements we've made to the exec driver's process cleanup. Using the following jobspec:
We get task events like the following (as expected):
Recent Events:
Time                       Type            Description
2024-06-24T14:40:00-04:00  Not Restarting  Error was unrecoverable
2024-06-24T14:40:00-04:00  Driver Failure  driver does not allow the following capabilities: net_raw
2024-06-24T14:40:00-04:00  Task Setup      Building Task Directory
2024-06-24T14:40:00-04:00  Received        Task received by client
But after a couple of restarts, we get leaked executor processes, as described in the original report:
$ ps afx
...
1997 ? Ssl 0:01 /usr/local/bin/nomad agent -config /etc/nomad.d
2131 ? Ssl 0:00 \_ /usr/local/bin/nomad executor {"LogFile":"/var/nomad/data/alloc/91bdfcf2-9972-5985-8cd7-62a5d566e193/sleep/executor.out
2166 ? Ssl 0:00 \_ /usr/local/bin/nomad executor {"LogFile":"/var/nomad/data/alloc/7599c82e-831f-7699-33f4-c6ab8da2655f/sleep/executor.out
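One way to spot leaked executors like the two above on a client is to scan ps output for nomad executor command lines and pull the allocation ID out of the LogFile path. The Python below is a minimal sketch, not part of Nomad: the helper names find_executors and leaked_executors are hypothetical, it assumes the LogFile path keeps the <data_dir>/alloc/<alloc-id>/<task>/executor.out layout seen in the output above, and it assumes you supply the set of allocation IDs that should still have a running executor.

```python
import re

# Assumption: executor command lines carry a LogFile path shaped like
# <data_dir>/alloc/<alloc-id>/<task>/executor.out, as in the ps output above.
EXECUTOR_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s.*\bnomad executor\b"
    r".*?/alloc/(?P<alloc>[0-9a-f-]{36})/(?P<task>[^/]+)/executor\.out"
)


def find_executors(ps_lines):
    """Hypothetical helper: return (pid, alloc_id, task) for each
    'nomad executor' line found in ps output."""
    found = []
    for line in ps_lines:
        match = EXECUTOR_RE.search(line)
        if match:
            found.append(
                (int(match.group("pid")), match.group("alloc"), match.group("task"))
            )
    return found


def leaked_executors(ps_lines, live_alloc_ids):
    """Hypothetical helper: executors whose allocation ID is not in
    live_alloc_ids. Gathering live_alloc_ids (the allocations that should
    still be running on this client) is left to the caller."""
    return [e for e in find_executors(ps_lines) if e[1] not in live_alloc_ids]
```

On a client, the input lines could come from subprocess.run(["ps", "axww"], capture_output=True, text=True).stdout.splitlines(); the ww flags ask procps ps not to truncate long command lines, which would otherwise cut off the LogFile path. An executor whose allocation has already failed, as in the Driver Failure case above, would then appear in the result.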
I'm going to retitle this slightly and mark it for roadmapping. From a quick look at the code, I'd also note that this almost certainly affects the java driver and possibly the raw_exec driver as well, but I haven't tested that.
Nomad version
Output from
Nomad v1.1.10 (2f08fe230da05e1b179710ebe0e2582249599a4b+CHANGES)
Operating system and Environment details
Ubuntu 20.04
Issue
If a task requests capabilities that the exec driver does not allow, then after the allocation fails we end up with leaked nomad executor processes.
Reproduction steps
For example, if we add the net_raw capability, which the exec driver does not allow by default, the allocation on the node fails with the following task state (which is absolutely expected behavior). On the client node, however, we then get leaked nomad executor processes (here we show some output of ps axuf).
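The jobspec referenced in the follow-up comment is not included in this thread. As a hypothetical stand-in, the Python below builds a JSON job payload with an exec task that adds net_raw. Only the exec driver, the cap_add setting, and the task name sleep come from the report; the job ID, datacenter, group name, and sleep command are assumptions chosen for illustration. With the exec driver's default allow_caps list, the client would be expected to reject this task with the Driver Failure event shown earlier.

```python
import json


def repro_job(job_id="exec-net-raw-leak", datacenter="dc1"):
    """Build a {"Job": {...}} payload for an exec task requesting net_raw.

    Assumptions: job_id, datacenter, the group name, and the command/args
    are illustrative placeholders, not values taken from the issue.
    """
    task = {
        "Name": "sleep",
        "Driver": "exec",
        "Config": {
            "command": "/bin/sleep",
            "args": ["3600"],
            # net_raw is not in the exec driver's default allow_caps list,
            # so the client should fail the task instead of starting it.
            "cap_add": ["net_raw"],
        },
    }
    return {
        "Job": {
            "ID": job_id,
            "Name": job_id,
            "Type": "service",
            "Datacenters": [datacenter],
            "TaskGroups": [{"Name": "sleep", "Count": 1, "Tasks": [task]}],
        }
    }


# Serialized body, e.g. for submitting the job over Nomad's HTTP API.
payload = json.dumps(repro_job(), indent=2)
```

The payload could be sent as the body of a POST to Nomad's /v1/jobs endpoint; after the allocation has failed and been rescheduled a few times, ps on the client should show whether nomad executor processes accumulate.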