aszlig closed this issue 1 year ago.
> You can detect it, right, so you could at least throw a giant multi-line warning for now if the bad cases happen (and maybe tell users to activate the `--do-not-unlink` flag or similar).
Unfortunately, I can only reliably detect it in the first two cases, since in the last two cases all internal data structures of `ip2unix` are separate copies.
Currently, this only consists of a regression test, but I haven't yet found a good solution to the problem.
If we were mainly to tackle the `unlink` issue (#16), we could just introduce a new flag to disable unlinking altogether. Unfortunately, this doesn't fully address the actual issue, which is about how we handle shared versus copied memory and file descriptors across processes.
For most programs in the wild this isn't an issue, because most of them don't do very complex socket operations. Occasionally, however, and especially when interacting with other subprocesses, we do run into it, and then we have a very hard time debugging what's going on.
To describe the issue in more detail, here is a simplified version of how `ip2unix` currently tracks socket file descriptors:

1. On every `socket` call, we insert the file descriptor into the registry. At this point, we can't make a final decision about how to handle the socket, because we only know the socket family (e.g. `AF_INET` or `AF_INET6`) but no details about ports or IP addresses.
2. On every `bind` or `connect` call, we look up the file descriptor in the registry and check whether we have a matching rule.
3. If a rule matches, we apply it (e.g. by replacing the socket with a Unix domain socket) and then call the real `bind`/`connect` function.

This all works fine if everything runs in order and without any multiprocessing involved, but as soon as the application invokes `clone`, things start to get ugly. Essentially, there are two `clone` flags that are problematic:

`CLONE_VM`
If set, the calling process and the child process run in the same memory space. In particular, memory writes performed by the calling process or by the child process are also visible in the other process. If not set, the child process runs in a separate copy of the memory space of the calling process at the time of the clone call. Memory writes or file mappings/unmappings performed by one of the processes do not affect the other.
`CLONE_FILES`
If set, the calling process and the child process share the same file descriptor table. Any file descriptor created by the calling process or by the child process is also valid in the other process. If not set, the child process inherits a copy of all file descriptors opened in the calling process at the time of the clone call. Subsequent operations that open or close file descriptors, or change file descriptor flags, performed by either the calling process or the child process do not affect the other process.
See the clone(2) manpage for more detailed information.
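To make the registry flow from the steps listed above concrete, here is a minimal Python model. It is a sketch under assumptions, not ip2unix's actual implementation: `Registry`, `SocketEntry`, the port-to-path rule table and the `on_*` hook names are all hypothetical, and a real wrapper would intercept the libc calls instead of being called explicitly. The important part is that all of this state lives in the process's own memory, which is exactly what the `clone` flags above can share or duplicate.

```python
import socket
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SocketEntry:
    family: int                      # the only thing known at socket() time
    unix_path: Optional[str] = None  # filled in once a rule matches


class Registry:
    """Hypothetical, simplified model of the fd registry."""

    def __init__(self, rules: Dict[int, str]):
        # rules: port -> Unix socket path (a stand-in for real rules)
        self.rules = rules
        self.entries: Dict[int, SocketEntry] = {}

    def on_socket(self, fd: int, family: int) -> None:
        # Step 1: remember the fd; no decision is possible yet.
        if family in (socket.AF_INET, socket.AF_INET6):
            self.entries[fd] = SocketEntry(family)

    def on_bind_or_connect(self, fd: int, port: int) -> Optional[str]:
        # Step 2: look up the fd and check whether a rule matches.
        entry = self.entries.get(fd)
        if entry is None:
            return None  # not an IP socket we are tracking
        path = self.rules.get(port)
        if path is not None:
            # Step 3: a wrapper would now redirect to this Unix socket
            # path and call the real bind()/connect().
            entry.unix_path = path
        return path

    def on_close(self, fd: int) -> Optional[str]:
        # Forget the fd; return the path a wrapper would unlink, if any.
        entry = self.entries.pop(fd, None)
        return entry.unix_path if entry else None


reg = Registry({8080: "/tmp/app.sock"})
reg.on_socket(3, socket.AF_INET)
print(reg.on_bind_or_connect(3, 8080))  # /tmp/app.sock
print(reg.on_close(3))                  # /tmp/app.sock
```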
Here is how these flags affect us (by the way, this also applies to syscalls such as `fork` and libc functions such as `daemon`):

:smiley: `CLONE_VM | CLONE_FILES`
Processes share memory and also share file descriptors, which essentially means that we don't need to do anything, because the registry and the file descriptor table stay in sync.
:neutral_face: `CLONE_VM`
A little more tricky: we have one registry shared across both processes, but the file descriptor tables are copied. This means that whenever a socket is closed in the second process, we must not `unlink` the socket file until the file descriptor has also been closed in the first process. This could be solved by adding a reference counter for the socket file descriptors.
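The reference-counting idea for the `CLONE_VM` case can be sketched as follows. This is a minimal Python model under stated assumptions, not ip2unix code: `RefCountedRegistry` and its methods are hypothetical, the wrapper is assumed to call `on_clone_copying_fds` whenever it sees a `clone` with `CLONE_VM` but without `CLONE_FILES`, and `unlinked` merely records the paths that would be passed to `unlink`.

```python
from typing import Dict, List


class RefCountedRegistry:
    """Hypothetical registry shared by processes with *copied* fd tables
    (the CLONE_VM-without-CLONE_FILES case)."""

    def __init__(self) -> None:
        self.paths: Dict[int, str] = {}  # fd -> Unix socket path
        self.refs: Dict[int, int] = {}   # fd -> fd tables still holding it
        self.unlinked: List[str] = []    # stand-in for real unlink() calls

    def add(self, fd: int, path: str) -> None:
        self.paths[fd] = path
        self.refs[fd] = 1

    def on_clone_copying_fds(self) -> None:
        # The child received its own copy of every descriptor, so each
        # tracked socket is now referenced from one more fd table.
        for fd in self.refs:
            self.refs[fd] += 1

    def on_close(self, fd: int) -> None:
        if fd not in self.refs:
            return  # not a socket we track
        self.refs[fd] -= 1
        if self.refs[fd] == 0:
            # Last copy closed: only now is unlinking the path safe.
            del self.refs[fd]
            self.unlinked.append(self.paths.pop(fd))


reg = RefCountedRegistry()
reg.add(3, "/tmp/app.sock")
reg.on_clone_copying_fds()
reg.on_close(3)      # process 2 closes its copy
print(reg.unlinked)  # []
reg.on_close(3)      # process 1 closes as well
print(reg.unlinked)  # ['/tmp/app.sock']
```

Keying the counter by fd number alone is a simplification: once one process closes fd 3, it may get fd 3 again for an unrelated socket while the other process still holds the old one, so a real version would need a more stable identity, such as the inode reported by `fstat`.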
CLONE_FILES
We have two registries, but one shared file descriptor table. One way to deal with this could be a
mmap
ed file descriptor that could be used to communicate between the two registries.However: How would one indicate presence of the other?
:scream: `0` (neither flag set)
Again two registries, but this time also a copy of the file descriptor table.
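What the `0` case means in practice can be reproduced with `os.fork`, which behaves like `clone` without `CLONE_VM` and `CLONE_FILES`. The following minimal Python sketch (Unix-only) assumes a naive wrapper that unlinks the socket path whenever a tracked socket is closed; the dict-based `registry` and the `demo_blackhole` helper are hypothetical, and the socket is created directly as a Unix socket to stand in for one that a rule has already converted. The child closes its copy and unlinks the path stored in its copy of the registry; afterwards the parent's socket is still open and registered, but no client can reach it.

```python
import os
import shutil
import socket
import tempfile


def demo_blackhole() -> bool:
    """Return True if the parent's socket survives but is unreachable."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "app.sock")

    # Process 1: a socket that has already been turned into a Unix socket.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen()
    registry = {server.fileno(): path}  # hypothetical per-process registry

    pid = os.fork()  # like clone() without CLONE_VM and CLONE_FILES
    if pid == 0:
        try:
            # Process 2: close *its copy* of the fd and, like a naive
            # wrapper, unlink the path from *its copy* of the registry.
            fd = server.fileno()
            server.close()
            os.unlink(registry.pop(fd))
        finally:
            os._exit(0)

    os.waitpid(pid, 0)

    # Process 1: the fd and the registry entry are both still there...
    still_open = server.fileno() in registry
    # ...but the path is gone, so no client can reach the socket.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        reachable = True
    except FileNotFoundError:
        reachable = False
    finally:
        client.close()
        server.close()
        shutil.rmtree(tmpdir, ignore_errors=True)
    return still_open and not reachable


print(demo_blackhole())  # True
```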
At first glance this might look like an obvious case of "we don't need to handle it", but consider the scenario mentioned in an earlier comment: process 1 creates a socket and `clone`s into process 2, which then closes the socket... now what? Process 1 still has to be able to operate on the socket, but it's essentially blackholed. We could use `memfd_create` in conjunction with `mmap`, or even inspect `procfs`, but all of that would make it a nightmare to port to other systems.

Of course, one way to get there would be to wrap the corresponding syscalls and do some kind of IPC between the main process and the various subprocesses, e.g. via POSIX shared memory objects. However, this is a little bit too complicated, and I'd like to avoid wrapping additional libc calls as much as possible.
Another way, less error-prone in terms of moving parts that could go south, would be to store all of our state inside a shared mapping that is bound to a file descriptor. Again `memfd_create` comes to mind, but is there a more portable way? Also, is this even feasible, or are there occasions other than `CLONE_FILES` that have an impact here?

What about `FD_CLOEXEC` and `MFD_CLOEXEC`? If either none or all of the sockets in the registry use `FD_CLOEXEC`, and we set `MFD_CLOEXEC` iff all the sockets have `FD_CLOEXEC`, everything is fine. But if that is not the case, how do we handle it?
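The `FD_CLOEXEC` rule above can be checked mechanically. The following minimal Python sketch (Unix-only) reads the close-on-exec flag of every tracked descriptor with `fcntl` and classifies the set; the `cloexec_policy` helper is hypothetical, and it only detects the problematic mixed case rather than answering how to handle it.

```python
import fcntl
import socket
from typing import Iterable


def cloexec_policy(fds: Iterable[int]) -> str:
    """Classify tracked fds by FD_CLOEXEC: 'all', 'none' or 'mixed'.
    An empty set of fds counts as 'all'."""
    flags = [bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
             for fd in fds]
    if all(flags):
        return "all"    # state memfd can be created with MFD_CLOEXEC
    if not any(flags):
        return "none"   # state memfd must survive execve() as well
    return "mixed"      # the open question: no single choice fits


a = socket.socket()  # Python sets FD_CLOEXEC on new sockets by default
b = socket.socket()
print(cloexec_policy([a.fileno(), b.fileno()]))  # all
b.set_inheritable(True)  # clears FD_CLOEXEC on b
print(cloexec_policy([a.fileno(), b.fileno()]))  # mixed
a.close()
b.close()
```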