nixcloud / ip2unix

Turn IP sockets into Unix domain sockets
GNU Lesser General Public License v3.0

Fix preliminary unlinking of socket file #23

Closed aszlig closed 1 year ago

aszlig commented 4 years ago

Currently, this consists only of a regression test; I haven't yet found a very good solution to the problem.

If we were to mainly tackle the unlink issue (#16), we could just introduce a new flag to disable unlinking altogether.

Unfortunately, this doesn't fully address the actual issue, which is about how we handle sharing versus copying of memory and file descriptors across processes.

For most programs in the wild this isn't a problem, because they don't perform very complex socket operations. Occasionally, though, and especially when a program interacts with subprocesses, we do run into it and then have a very hard time debugging what's going on.

So to describe the issue in more detail, here is a simplified version of how ip2unix currently tracks socket file descriptors:
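As a rough illustration of this kind of tracking (not ip2unix's actual implementation), the sketch below keeps a per-process registry that maps each socket file descriptor to the socket file it was bound to, and unlinks that file when the descriptor is closed. The names `SocketRegistry` and `registry_demo` are hypothetical and exist only for this example.

```python
import os
import socket
import tempfile


class SocketRegistry:
    """Hypothetical per-process registry: file descriptor -> socket path."""

    def __init__(self):
        self._paths = {}

    def bind(self, sock, path):
        # Bind the socket and remember which file belongs to this descriptor.
        sock.bind(path)
        self._paths[sock.fileno()] = path

    def close(self, sock):
        # Look up the path before closing, since fileno() is invalid afterwards.
        path = self._paths.pop(sock.fileno(), None)
        sock.close()
        # Assumption for this sketch: closing the descriptor removes the file.
        if path is not None and os.path.exists(path):
            os.unlink(path)


def registry_demo():
    """Return whether the socket file exists after bind and after close."""
    registry = SocketRegistry()
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        registry.bind(sock, path)
        existed_after_bind = os.path.exists(path)
        registry.close(sock)
        return existed_after_bind, os.path.exists(path)


if __name__ == "__main__":
    print(registry_demo())
```

Within a single process the registry and the kernel's view of the descriptor stay in sync, which is why the simple case behaves well.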

This all works fine if everything is run in order and without any multiprocessing involved, but as soon as the application invokes clone, things start to get ugly.

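Essentially we have two clone flags that are problematic: CLONE_VM, which makes parent and child share memory instead of copying it, and CLONE_FILES, which makes them share the file descriptor table instead of copying it.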

See the clone(2) manpage for more detailed information.

Here is how these flags affect us (this also applies to syscalls such as fork and libc functions such as daemon):
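The plain fork case, where both memory and the file descriptor table are copied, can be reproduced with a small sketch. It assumes a hypothetical registry (a plain dict here, not ip2unix's real data structure) that unlinks the socket file whenever a tracked descriptor is closed. The child closes its copy of the descriptor and removes the file, while the parent's copy of the registry still believes the file is there.

```python
import os
import socket
import tempfile


def fork_unlink_demo():
    """Return (parent still tracks the fd, socket file still exists)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")
        registry = {}  # hypothetical registry: fd -> socket path

        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.bind(path)
        sock.listen(1)
        registry[sock.fileno()] = path

        pid = os.fork()
        if pid == 0:
            # Child: works on private copies of the registry and of the fd.
            try:
                child_path = registry.pop(sock.fileno())
                sock.close()
                os.unlink(child_path)  # unlinks the file too early
            finally:
                os._exit(0)

        os.waitpid(pid, 0)
        # Parent: its registry copy is unchanged and its socket is still
        # open, but the file that clients would connect to is gone.
        parent_still_tracks = sock.fileno() in registry
        file_exists = os.path.exists(path)
        sock.close()
        return parent_still_tracks, file_exists


if __name__ == "__main__":
    print(fork_unlink_demo())
```

Because the child modified only its own copy, nothing in the parent's state reveals that the unlink happened.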

Of course, one way to get there would be to wrap the corresponding syscalls and do some kind of IPC between the main process and the various subprocesses, e.g. via POSIX shared memory objects. This, however, is a little too complicated, and I'd like to avoid wrapping additional libc calls as much as possible.

Another way, less error-prone because it has fewer moving parts that could go south, would be to store all of our state inside a shared mapping that is backed by a file descriptor. Again, memfd_create comes to mind, but is there a more portable way?
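A minimal sketch of the shared-mapping idea, assuming the state is a single reference counter stored at offset 0 (a hypothetical layout chosen for this example): the mapping is backed by a descriptor from memfd_create where available, with an anonymous temporary file as one possible more portable fallback. The helper names `make_state_fd` and `shared_state_demo` are made up for illustration, and the sketch does no locking.

```python
import mmap
import os
import struct
import tempfile


def make_state_fd(size):
    """Return a descriptor backing `size` bytes of shareable state."""
    fd = None
    if hasattr(os, "memfd_create"):
        try:
            fd = os.memfd_create("registry-state")  # Linux-specific
        except OSError:
            fd = None
    if fd is None:
        # Fallback assumption: an unnamed temporary file works as backing.
        with tempfile.TemporaryFile() as tmp:
            fd = os.dup(tmp.fileno())
    os.ftruncate(fd, size)
    return fd


def shared_state_demo():
    """Return the counter value the parent sees after the child's update."""
    fd = make_state_fd(mmap.PAGESIZE)
    state = mmap.mmap(fd, mmap.PAGESIZE, mmap.MAP_SHARED)
    struct.pack_into("I", state, 0, 1)  # one reference: the parent

    pid = os.fork()
    if pid == 0:
        # Child: the MAP_SHARED mapping survives fork, so this write is
        # visible to the parent even though ordinary memory is copied.
        try:
            (refs,) = struct.unpack_from("I", state, 0)
            struct.pack_into("I", state, 0, refs + 1)
        finally:
            os._exit(0)

    os.waitpid(pid, 0)
    (refs,) = struct.unpack_from("I", state, 0)
    state.close()
    os.close(fd)
    return refs


if __name__ == "__main__":
    print(shared_state_demo())
```

With a counter like this, a process could defer the unlink until the last reference goes away; concurrent updates would still need some form of synchronization, which this sketch leaves out.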

Also, is this even feasible, or are there cases other than CLONE_FILES that have an impact here?

What about (M)FD_CLOEXEC? If either none or all of the sockets in the registry use FD_CLOEXEC, and we set MFD_CLOEXEC if and only if all of the sockets have FD_CLOEXEC, everything is fine. But if only some of them have it, how do we handle this?
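The three situations described here can at least be told apart by querying each descriptor's flags with fcntl. The sketch below is only an illustration; the function names and the "all" / "none" / "mixed" labels are invented for this example.

```python
import fcntl
import socket


def set_cloexec(fd, enabled):
    """Set or clear FD_CLOEXEC on a descriptor."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    if enabled:
        flags |= fcntl.FD_CLOEXEC
    else:
        flags &= ~fcntl.FD_CLOEXEC
    fcntl.fcntl(fd, fcntl.F_SETFD, flags)


def cloexec_state(fds):
    """Classify descriptors as 'all', 'none' or 'mixed' close-on-exec."""
    flags = [bool(fcntl.fcntl(fd, fcntl.F_GETFD) & fcntl.FD_CLOEXEC)
             for fd in fds]
    if all(flags):
        return "all"    # state fd could use MFD_CLOEXEC (also if fds is empty)
    if not any(flags):
        return "none"   # state fd must survive exec as well
    return "mixed"      # the unresolved case from the question above


def cloexec_demo():
    a = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    b = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    fds = [a.fileno(), b.fileno()]
    results = []
    for a_on, b_on in ((True, True), (True, False), (False, False)):
        set_cloexec(fds[0], a_on)
        set_cloexec(fds[1], b_on)
        results.append(cloexec_state(fds))
    a.close()
    b.close()
    return tuple(results)


if __name__ == "__main__":
    print(cloexec_demo())
```

Detecting the mixed case does not answer how to handle it; it only shows where a decision would be needed.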

Profpatsch commented 4 years ago

You can detect it, right, so you could at least throw a giant multi-line warning for now if the bad cases happen (and maybe tell users to activate the --do-not-unlink flag or similar).

aszlig commented 4 years ago

> You can detect it, right, so you could at least throw a giant multi-line warning for now if the bad cases happen (and maybe tell users to activate the --do-not-unlink flag or similar).

Unfortunately, I can only reliably detect it in the first two cases, since in the last two cases all internal data structures of ip2unix are copies.