Closed schmichael closed 2 days ago
Having the new intermediary process call Exec directly to replace itself with the task's command, instead of forking a third time and exiting, seems preferable to triple-forking and is functionally the best approach. That being said, I think it's considerably more effort for an EOL subsystem (cgroup v1) than merely having the executor leave the custom cgroups after forking the user process.
Use-cases
Nomad v1.8.0 (#20481) added the cgroup_v1_override and corresponding v2 parameters to allow placing task processes in specific cgroups.
The goal of this feature is to give users with precise cgroup requirements absolute control over the cgroups their tasks run in.
cgroup v2 uses the clone3(2) CLONE_INTO_CGROUP flag to spawn only the task process in the custom cgroups.
cgroup v1 does not support CLONE_INTO_CGROUP and uses the traditional double-fork approach:
The executor sets up (potentially custom) cgroups and forks the task command (fork(2)).
This has the unfortunate side effect of leaving the executor process in the custom cgroups with the task process, which prevents users from having full control over their custom cgroups.
Proposal
The cgroup v1 behavior should match the cgroup v2 behavior: the executor should not be part of the custom cgroup.
A straightforward, but imperfect, approach would be for the executor to detach from the custom cgroups after forking the child process. Writing the executor's pid to the root cgroups after forking the task process would remove it from the task's cgroups. However, there would be a window of time in which both the task and the executor are running in the custom cgroup.
An alternative that avoids the race condition may be possible but would significantly complicate Nomad's executor: we could triple fork, where the new intermediary process handles setting up and entering cgroups but exits after forking the user process.
The executor treats subprocesses exiting as the task exiting, so significant code changes would be required to support this new flow just for cgroup v1 override support.
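To make the simpler "detach after fork" option concrete, here is a minimal sketch in Python. It is not Nomad's code (the executor is written in Go). The function names, the `CGROUP_V1_ROOT` constant, and the assumption that each controller is mounted at `/sys/fs/cgroup/<controller>` are all illustrative.

```python
import os
import subprocess
from pathlib import Path

# Assumption: the conventional cgroup v1 mount point; real hosts may differ.
CGROUP_V1_ROOT = Path("/sys/fs/cgroup")


def root_procs_files(controllers, root=CGROUP_V1_ROOT):
    """Return the cgroup.procs file of each controller's root cgroup."""
    return [Path(root) / c / "cgroup.procs" for c in controllers]


def detach_from_custom_cgroups(controllers, pid=None, root=CGROUP_V1_ROOT):
    """Move pid (default: this process) into each controller's root cgroup.

    Writing a pid to a cgroup's cgroup.procs file migrates that process
    into the cgroup, which removes it from the cgroup it was in before.
    """
    pid = os.getpid() if pid is None else pid
    for procs in root_procs_files(controllers, root):
        procs.write_text(f"{pid}\n")


def spawn_then_detach(argv, controllers, root=CGROUP_V1_ROOT):
    """Hypothetical executor flow: start the task, then leave its cgroups."""
    # The child inherits the cgroups the parent is in when it is created.
    proc = subprocess.Popen(argv)
    # Race window: until the writes below finish, parent and task share
    # the custom cgroups.
    detach_from_custom_cgroups(controllers, root=root)
    return proc
```

The window between `Popen` returning and the last `cgroup.procs` write is the race described above. The sketch narrows that window but cannot remove it.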
It may be possible for the new intermediary process to avoid the third fork and exiting in favor of calling Exec directly to replace itself with the task's command.
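The exec-based idea can be sketched as follows, again in Python for brevity rather than in the executor's Go. All function names are hypothetical, and `cgroup_dirs` is assumed to be a list of already-created cgroup directories. The intermediary process enters the cgroups itself and then replaces its own image with the task command, so the pid the parent sees is the task's pid and the parent never enters the custom cgroups.

```python
import os
from pathlib import Path


def enter_cgroups(cgroup_dirs):
    """Move the calling process into each cgroup directory."""
    pid = os.getpid()
    for d in cgroup_dirs:
        (Path(d) / "cgroup.procs").write_text(f"{pid}\n")


def enter_and_exec(cgroup_dirs, argv):
    """Run in the intermediary process: enter cgroups, then become the task."""
    enter_cgroups(cgroup_dirs)
    # exec keeps the pid and cgroup membership; it returns only on failure,
    # in which case it raises OSError.
    os.execvp(argv[0], argv)


def spawn_in_cgroups(cgroup_dirs, argv):
    """Run in the parent: fork an intermediary that execs the task command."""
    pid = os.fork()
    if pid == 0:
        try:
            enter_and_exec(cgroup_dirs, argv)
        finally:
            # Reached only if cgroup setup or exec failed in the child.
            os._exit(127)
    return pid
```

Because the intermediary becomes the task instead of exiting, the parent still has a single child whose exit is the task's exit. That appears to sidestep the "subprocess exiting means task exiting" problem of the triple-fork variant, though a Go implementation would likely need to re-execute a helper binary rather than fork in-process.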