rootless-containers / rootlesskit

Linux-native "fake root" for implementing rootless containers
Apache License 2.0

Sockets from systemd activation not inherited by child #428

Closed charliemirabile closed 8 months ago

charliemirabile commented 8 months ago

To make it easier to use rootlesskit to run daemons as a non-root user when they normally run as root, it would be very helpful for rootlesskit to be aware of and support systemd socket activation (https://www.freedesktop.org/software/systemd/man/latest/systemd.socket.html).

The gist is that the socket(s) are created by systemd instead of by the daemon itself. When a process attempts to connect to the socket(s), the actual daemon is started and inherits them, and special environment variables are set to describe which process the socket(s) are intended for, how many file descriptors there are, and their names. The daemon (if it is aware of this feature and properly configured to work with it) can then skip setting up sockets as it normally might during startup and instead use the provided one(s).
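To make the daemon side of that handshake concrete, here is a minimal sketch of the check an activation-aware daemon performs at startup. `listen_fds_for` is a hypothetical helper name made up for illustration (it is not part of systemd or rootlesskit); it only assumes the documented meaning of `LISTEN_PID` and `LISTEN_FDS`.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: listen_fds_for <own pid> <value of LISTEN_PID> <value of LISTEN_FDS>
# Prints how many inherited sockets the caller should use. 0 means
# "nothing was passed to us, create sockets the normal way".
listen_fds_for() {
    own_pid=$1
    listen_pid=$2
    listen_fds=$3

    # The descriptors are only meant for the process named in LISTEN_PID.
    if [ -z "$listen_pid" ] || [ "$listen_pid" != "$own_pid" ]; then
        echo 0
        return 0
    fi

    # Treat a missing or non-numeric count as "no sockets passed".
    case $listen_fds in
        ''|*[!0-9]*) echo 0 ;;
        *) echo "$listen_fds" ;;
    esac
}
```

A daemon-like script could call it as `n=$(listen_fds_for "$$" "${LISTEN_PID:-}" "${LISTEN_FDS:-}")`. The PID comparison is the reason a mismatched `LISTEN_PID` makes a strict daemon ignore sockets that it did in fact inherit.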

The benefits are myriad: socket activation can cut down on startup time because daemons are only started when the first client attempts to connect to them. Furthermore, because the daemon inherits already configured sockets, all of its code can run unprivileged, relying on systemd to use its permissions to bind to low port numbers.

In terms of what is required for this feature to work in rootlesskit, the parent process would need to check for the presence of the systemd environment variables ($LISTEN_FDS, $LISTEN_PID, $LISTEN_FDNAMES) and, if it detects them, make sure that file descriptors 3, 4, ..., (3 + $LISTEN_FDS - 1) are inherited by the child. The child would in turn need to make sure that those file descriptors are inherited by the real program, and also fix up the value of LISTEN_PID to reflect the PID of the child process before actually execve'ing it.
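The descriptor range described above can be sketched as a small shell function. `listen_fd_list` is a hypothetical name used only for illustration, and the sketch assumes the count has already been validated as a non-negative integer.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Usage: listen_fd_list <value of LISTEN_FDS>
# Prints, one per line, the descriptor numbers that must stay open
# across every exec between systemd and the real daemon.
listen_fd_list() {
    count=${1:-0}
    fd=3                      # passed descriptors always start at 3
    last=$((3 + count - 1))   # so the last one is 3 + LISTEN_FDS - 1
    while [ "$fd" -le "$last" ]; do
        echo "$fd"
        fd=$((fd + 1))
    done
}
```

For example, `listen_fd_list 2` prints 3 and 4, and a count of 0 prints nothing.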

charliemirabile commented 8 months ago

It seems like what I suggested about fixing up the PID in LISTEN_PID is actually quite annoying to do in Go, because fork and exec are tightly coupled. I think this is OK though: if the daemon in question is picky about the value of LISTEN_PID (like the docker-daemon), a wrapper script that does the following can be inserted to fix up the value for the daemon. Because exec replaces the shell without changing its PID, the daemon sees its own PID in LISTEN_PID:

#!/bin/sh
export LISTEN_PID=$$
exec <real daemon> "$@"