SECCOMP_IOCTL_NOTIF_ADDFD (Kernel 5.9)

bypass4netns is as fast as --net=host and almost as secure as traditional slirp4netns.
The current version of bypass4netns must be used in conjunction with slirp4netns; a future version may work without it.
Workload: iperf3 -c HOST_IP from podman run

--net=host (insecure): 57.9 Gbps

bypass4netns eliminates the overhead of slirp4netns by trapping socket syscalls and executing them in the host network namespace using SECCOMP_IOCTL_NOTIF_ADDFD.
See also the talks.
Build-time requirement:
make
sudo make install
The following binaries will be installed into /usr/local/bin:
- bypass4netns: the bypass4netns binary.
- bypass4netnsd: an optional REST daemon for controlling bypass4netns processes from a non-initial network namespace. Used by nerdctl.

$ bypass4netns --ignore="127.0.0.0/8,10.0.0.0/8,auto" -p="8080:80"
--ignore=... is a list of the CIDRs that cannot be bypassed:
- 127.0.0.0/8
- 10.0.0.0/8
- auto

$ ./test/seccomp.json.sh >$HOME/seccomp.json
$ $DOCKER run -it --rm --security-opt seccomp=$HOME/seccomp.json --runtime=runc alpine
$DOCKER is either docker, podman, or nerdctl.
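The --ignore check described above can be illustrated with a small sketch. This is not bypass4netns code: the function names parse_ignore and can_bypass are hypothetical, and the auto_cidrs argument (with its example subnet) merely stands in for whatever subnets the real tool resolves for the auto keyword at runtime.

```python
import ipaddress


def parse_ignore(spec, auto_cidrs=()):
    """Parse a comma-separated --ignore value into a list of networks.

    The literal token "auto" is expanded to auto_cidrs, a placeholder
    (assumption) for subnets that the real tool would detect itself.
    """
    networks = []
    for token in spec.split(","):
        token = token.strip()
        if not token:
            continue
        if token == "auto":
            networks.extend(ipaddress.ip_network(c) for c in auto_cidrs)
        else:
            networks.append(ipaddress.ip_network(token))
    return networks


def can_bypass(dest_ip, ignored):
    """Return True when dest_ip lies outside every ignored network."""
    addr = ipaddress.ip_address(dest_ip)
    return not any(addr in net for net in ignored)


# Hypothetical subnet used in place of the "auto" detection.
ignored = parse_ignore("127.0.0.0/8,10.0.0.0/8,auto",
                       auto_cidrs=["192.168.100.0/24"])
print(can_bypass("127.0.0.1", ignored))   # False: loopback is never bypassed
print(can_bypass("192.0.2.10", ignored))  # True: outside all ignored CIDRs
```

A destination that matches any ignored CIDR stays on the regular slirp4netns path; only the remaining destinations are candidates for bypassing.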
bypass4netns is experimentally integrated into nerdctl (>= 0.17.0).
containerd-rootless-setuptool.sh install-bypass4netnsd
nerdctl run -it --rm -p 8080:80 --annotation nerdctl/bypass4netns=true alpine
NOTE: nerdctl prior to v2.0 needs --label instead of --annotation.
Also, the syntax will probably be replaced with --security-opt or something like --network-opt in a future version of nerdctl.
Accesses to host abstract sockets and host loopback IPs (127.0.0.0/8) from containers are designed to be rejected.
However, it is probably possible to connect to host loopback IPs by exploiting a TOCTOU (time-of-check to time-of-use) race on struct sockaddr * pointers.
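The general shape of such a race can be shown with a deterministic simulation. This is a generic illustration of the check-then-reread pattern, not bypass4netns code: the bytearray stands in for the container process's memory holding the address, and the tamper callback stands in for a second thread that rewrites it between the check and the use. All names here are hypothetical.

```python
import ipaddress

LOOPBACK = ipaddress.ip_network("127.0.0.0/8")


def is_allowed(addr_bytes):
    """Check step: reject any address inside the loopback range."""
    return ipaddress.ip_address(bytes(addr_bytes)) not in LOOPBACK


def racy_connect(memory, tamper=None):
    """Validate the address, then read it a second time at use time."""
    if not is_allowed(memory):        # time of check: first read
        return "rejected"
    if tamper is not None:
        tamper(memory)                # concurrent writer changes the buffer
    # time of use: second read of the same memory
    return str(ipaddress.ip_address(bytes(memory)))


def swap_to_loopback(memory):
    """Simulated attacker thread: overwrite the address with 127.0.0.1."""
    memory[:] = bytes([127, 0, 0, 1])


benign = bytearray([192, 0, 2, 10])
print(racy_connect(benign, tamper=swap_to_loopback))
```

The check passes on the benign address, yet the address read at use time is the loopback one, which is the outcome the caveat above warns about.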
-p 8080:80 cannot be connected to port 80 from other containers in the same network namespace
-p 8080:80/udp .
-p 8080:80
8080 when it handles bind(2) with target port 80.
8080 before container's process binds port 80
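As a rough illustration of the -p 8080:80 notes above, the sketch below reads the option as HOST:CONTAINER, so that a bind(2) targeting port 80 is associated with port 8080. That reading is an assumption, the function names are hypothetical, and only the plain HOST:CONTAINER form is handled.

```python
def parse_port_spec(spec):
    """Parse "HOST:CONTAINER" (e.g. "8080:80") into (host_port, container_port)."""
    host, container = spec.split(":")
    host_port, container_port = int(host), int(container)
    for port in (host_port, container_port):
        if not 0 < port < 65536:
            raise ValueError(f"port out of range: {port}")
    return host_port, container_port


def host_port_for_bind(target_port, mappings):
    """Return the mapped host port for a bind(2) on target_port, or None if unmapped."""
    return mappings.get(target_port)


host_port, container_port = parse_port_spec("8080:80")
mappings = {container_port: host_port}    # container port -> host port
print(host_port_for_bind(80, mappings))   # 8080
print(host_port_for_bind(443, mappings))  # None
```

Keying the table by container port keeps the lookup at bind(2) time to a single dictionary access; ports without a mapping are simply left alone.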