moby / vpnkit

A toolkit for embedding VPN capabilities in your application
Apache License 2.0
1.1k stars 187 forks source link

Set O_CLOEXEC on received file descriptors #537

Closed djs55 closed 3 years ago

djs55 commented 3 years ago

By default Go sets O_CLOEXEC on file descriptors it creates. This ensures that they aren't accidentally inherited over fork+exec.

When binding a port on the host we first try net.Listen. On modern macOS this works for

Unfortunately it doesn't work for privileged ports on specific IPs, such as 127.0.0.1:80. For these cases we contact a helper process which runs as root, which binds the port and then uses sendmsg to transmit it to us. Unfortunately when we recvmsg a file descriptor, the resulting descriptor does not have O_CLOEXEC set, so it is captured by future subprocesses. Even if the parent process calls close(), the port is still in use by the subprocesses and can't be re-bound. This caused the failure seen in https://github.com/docker/for-mac/issues/5649

The solution is to ensure all file descriptors have O_CLOEXEC set. On Linux there's a sendmsg flag to set the flag atomically on received fds, but this is missing on Darwin. The best we can do is to call fcntl to set the flag ourselves after receiving the fd. Unfortunately this means there's a small race window where a fork would inherit the fd, but this shouldn't happen very often in practice.

Go will call fcntl for us if we "adopt" the file descriptor using os.NewFile and then use net.FileListener and net.FilePacketConn. This allows us to remove a lot of complicated wrapping code and allows us to remove a workaround for Close hanging when blocked in Accept, caused by the state of the file descriptor not being correct. This required some minor changes to types (net.Listener rather than *net.TCPListener) but it's worth it.

It's possible to work around this problem in other ways in the client code, for example:

  1. always speculatively closing file descriptors after fork but before exec. This can be expensive with high ulimit values, if there is no API to enumerate open fds. I'm not sure it can be done easily in Go where the exec.Command API is high-level.
  2. ban running long-lived subprocesses in your program. Restructure the code so that the long-lived subprocesses are managed by a separate process which doesn't bind ports.

Signed-off-by: David Scott dave@recoil.org