rootless-containers / bypass4netns

[Experimental] Accelerates slirp4netns using SECCOMP_IOCTL_NOTIF_ADDFD. As fast as `--net=host`.
https://medium.com/nttlabs/accelerating-rootless-container-network-29d0e908dda4
Apache License 2.0

bypass4netns: Accelerator for slirp4netns using SECCOMP_IOCTL_NOTIF_ADDFD (requires Linux kernel 5.9 or later)

bypass4netns is as fast as --net=host and almost as secure as traditional slirp4netns.

The current version of bypass4netns must be used together with slirp4netns; a future version may work without it.

Benchmark

(Oct 16, 2020)

Workload: iperf3 -c HOST_IP from podman run

How it works

bypass4netns eliminates the overhead of slirp4netns by trapping socket syscalls and executing them in the host network namespace using SECCOMP_IOCTL_NOTIF_ADDFD.

See also the talks.

Requirements
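SECCOMP_IOCTL_NOTIF_ADDFD is available from kernel 5.9, so it can be useful to check the running kernel before installing. The following is a minimal sketch, not part of bypass4netns: the helper name kernel_at_least is hypothetical, and it assumes a sort that supports -V (for example GNU coreutils).

```shell
#!/bin/sh
# kernel_at_least REQUIRED [CURRENT]
# Hypothetical helper: succeeds if CURRENT (default: `uname -r`) is >= REQUIRED.
# Assumes `sort -V` (version sort) is available.
kernel_at_least() {
    required="$1"
    current="${2:-$(uname -r)}"
    # Drop distro suffixes such as "-91-generic".
    current="${current%%-*}"
    # Version-sort both values; the requirement is met when REQUIRED sorts first.
    lowest="$(printf '%s\n%s\n' "$required" "$current" | sort -V | head -n 1)"
    [ "$lowest" = "$required" ]
}

if kernel_at_least 5.9; then
    echo "kernel is new enough for SECCOMP_IOCTL_NOTIF_ADDFD"
else
    echo "kernel is older than 5.9"
fi
```

The optional second argument only exists so the comparison can be tried against a version string other than the running kernel.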

Build-time requirement:

Compile

make
sudo make install

The following binaries will be installed into /usr/local/bin:

Usage

Hard way (docker|podman|nerdctl)

$ bypass4netns --ignore="127.0.0.0/8,10.0.0.0/8,auto" -p="8080:80"

--ignore=... is a list of the CIDRs that cannot be bypassed:

$ ./test/seccomp.json.sh >$HOME/seccomp.json
$ $DOCKER run -it --rm --security-opt seccomp=$HOME/seccomp.json --runtime=runc alpine

$DOCKER is either docker, podman, or nerdctl.
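To illustrate what the --ignore list means, here is a rough sketch of matching an IPv4 destination against a comma-separated CIDR list. It is not how bypass4netns is implemented; the function names ip_to_int, in_cidr, and is_ignored are hypothetical. The sketch skips the special value auto, whose meaning is not defined in this section, and it handles IPv4 only.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only (IPv4 only).

# ip_to_int A.B.C.D: print the address as a 32-bit integer.
ip_to_int() {
    old_ifs="$IFS"; IFS=.
    set -- $1
    IFS="$old_ifs"
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# in_cidr IP NET/BITS: succeed if IP lies inside the CIDR.
in_cidr() {
    net="${2%/*}"
    bits="${2#*/}"
    if [ "$bits" -eq 0 ]; then
        mask=0
    else
        mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    fi
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

# is_ignored IP LIST: succeed if IP falls in any CIDR of the comma-separated LIST.
is_ignored() {
    target="$1"
    old_ifs="$IFS"; IFS=,
    set -- $2
    IFS="$old_ifs"
    for cidr in "$@"; do
        # "auto" is skipped here; this sketch does not model it.
        [ "$cidr" = "auto" ] && continue
        if in_cidr "$target" "$cidr"; then
            return 0
        fi
    done
    return 1
}

for dest in 127.0.0.1 10.1.2.3 192.168.1.5; do
    if is_ignored "$dest" "127.0.0.0/8,10.0.0.0/8,auto"; then
        echo "$dest: in --ignore list, not bypassed"
    else
        echo "$dest: eligible for bypass"
    fi
done
```

With the list from the example command, 127.0.0.1 and 10.1.2.3 fall inside an ignored CIDR, while 192.168.1.5 does not.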

Easy way (nerdctl)

bypass4netns is experimentally integrated into nerdctl (>= 0.17.0).

containerd-rootless-setuptool.sh install-bypass4netnsd
nerdctl run -it --rm -p 8080:80 --annotation nerdctl/bypass4netns=true alpine

NOTE: nerdctl prior to v2.0 needs --label instead of --annotation. Also, the syntax will probably be replaced with --security-opt or something like --network-opt in a future version of nerdctl.

:warning: Caveats :warning:

Accesses to host abstract sockets and host loopback IPs (127.0.0.0/8) from containers are designed to be rejected.

However, it is probably possible to connect to host loopback IPs by exploiting a TOCTOU (time-of-check to time-of-use) race on struct sockaddr * pointers.

TODOs

Publications