rootless-containers / bypass4netns

[Experimental] Accelerates slirp4netns using SECCOMP_IOCTL_NOTIF_ADDFD. As fast as `--net=host`.
https://medium.com/nttlabs/accelerating-rootless-container-network-29d0e908dda4
Apache License 2.0

multinode: watch Kubernetes service resources instead of depending on etcd #56

Open AkihiroSuda opened 6 months ago

AkihiroSuda commented 6 months ago

Deploying etcd is hard, so maybe we should just watch Kubernetes services instead.

cc @naoki9911

AkihiroSuda commented 6 months ago

IIUC, the multi-node bypass will only work for Pod-to-NodePort communications, so we will probably just need to watch `.spec.podCIDR` on each node and add it to the bypass4netnsd `--ignore` list?
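A rough sketch of the idea, not a proposal for the actual implementation: given a node list as returned by the Kubernetes API (for example the JSON from `kubectl get nodes -o json`), collect every pod CIDR into a deduplicated list that could be fed to the ignore list. The function name `pod_cidrs` is made up for illustration, and how the result is passed to bypass4netnsd (flag format, reload behavior) is left open here.

```python
import ipaddress


def pod_cidrs(node_list: dict) -> list[str]:
    """Collect the pod CIDRs of all nodes from a Kubernetes NodeList object.

    `node_list` is the parsed JSON of a NodeList. Both the single-stack
    field (.spec.podCIDR) and the dual-stack field (.spec.podCIDRs) are read.
    """
    cidrs = set()
    for node in node_list.get("items", []):
        spec = node.get("spec", {})
        candidates = list(spec.get("podCIDRs") or [])
        if spec.get("podCIDR"):
            candidates.append(spec["podCIDR"])
        for cidr in candidates:
            # Normalize and validate; raises ValueError on malformed input.
            cidrs.add(str(ipaddress.ip_network(cidr, strict=False)))
    return sorted(cidrs)


if __name__ == "__main__":
    example = {
        "items": [
            {"spec": {"podCIDR": "10.244.0.0/24", "podCIDRs": ["10.244.0.0/24"]}},
            {"spec": {"podCIDR": "10.244.1.0/24"}},
        ]
    }
    print(",".join(pod_cidrs(example)))
```

A real watcher would re-run this whenever a node is added or removed, since the set of pod CIDRs changes with cluster membership.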

For Pod-to-Pod communications, we will need an (existing) userspace service mesh that multiplexes pod IPs to a single NodePort as a proxy. This will cause some overhead, but with bypass4netns it will still be better than slirp4netns+TAP+VXLAN.

@naoki9911 Let me know if I'm missing something.

naoki9911 commented 6 months ago

Theoretically, the multi-node bypass can handle pod-to-pod communications without the proxy. bypass4netns can expose the pod's ports on the node by bypassing the socket when bind(2) is called, and other pods can then connect to the exposed ports. However, I don't think this approach is elegant. We would need to handle all connect(2) calls and rewrite their destination addresses to the node's IP and ports. This approach also consumes many ports on the node, and hosts other than pods can connect to the exposed ports.
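To make the bookkeeping concrete, here is a minimal sketch of the table such an approach would need. It is not bypass4netns code: the class name `ExposedPortTable`, its methods, and the port range are all hypothetical. Each pod bind(2) takes one node port, and each connect(2) to a known pod address is rewritten to the node's address.

```python
class ExposedPortTable:
    """Maps (pod IP, pod port) to a port exposed on the node (illustrative only)."""

    def __init__(self, node_ip: str, first_port: int = 30000, last_port: int = 32767):
        # The port range is an arbitrary choice for this sketch.
        self.node_ip = node_ip
        self._free = iter(range(first_port, last_port + 1))
        self._map: dict[tuple[str, int], int] = {}

    def on_bind(self, pod_ip: str, pod_port: int) -> tuple[str, int]:
        """Called when a pod binds a port; returns the node-side address."""
        key = (pod_ip, pod_port)
        if key not in self._map:
            try:
                self._map[key] = next(self._free)
            except StopIteration:
                raise RuntimeError("node port range exhausted") from None
        return self.node_ip, self._map[key]

    def rewrite_connect(self, dst_ip: str, dst_port: int) -> tuple[str, int]:
        """Called on connect(2); rewrites known pod destinations, passes others through."""
        node_port = self._map.get((dst_ip, dst_port))
        if node_port is None:
            return dst_ip, dst_port
        return self.node_ip, node_port
```

Two pods that both bind port 80 end up on two different node ports, which shows how quickly the node's port space is used up. In a multi-node setup this table would also have to be shared between nodes.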

I think your multiplexing proxy approach is better. The following procedure would enable it for SOCK_STREAM: when the pod's connect(2) is handled, the handler connects to the proxy and sends the destination information. The multiplexing proxy then reads the destination information and connects to the destination pod.
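The SOCK_STREAM procedure can be sketched as a small framing step. The header layout below (address family version, 16-byte padded address, port) is an assumption made up for this example, not a format defined by bypass4netns or any existing service mesh; `encode_dest`, `decode_dest`, and `recv_exact` are hypothetical helper names.

```python
import ipaddress
import socket
import struct

# Hypothetical header: IP version (4 or 6), address padded to 16 bytes, port.
HEADER = struct.Struct("!B16sH")


def encode_dest(ip: str, port: int) -> bytes:
    """Handler side: build the header sent to the proxy right after connecting."""
    addr = ipaddress.ip_address(ip)
    return HEADER.pack(addr.version, addr.packed.ljust(16, b"\0"), port)


def decode_dest(data: bytes) -> tuple[str, int]:
    """Proxy side: recover the original destination from the header."""
    version, raw, port = HEADER.unpack(data)
    size = 4 if version == 4 else 16
    return str(ipaddress.ip_address(raw[:size])), port


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes from a stream socket."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before the header was complete")
        buf += chunk
    return buf


if __name__ == "__main__":
    handler_side, proxy_side = socket.socketpair()
    # The handler sends the destination first, then the application data follows.
    handler_side.sendall(encode_dest("10.244.1.7", 8080) + b"hello")
    dest = decode_dest(recv_exact(proxy_side, HEADER.size))
    print(dest, recv_exact(proxy_side, 5))
```

After reading the header, the proxy would open its own connection to the decoded pod address and relay bytes in both directions. That relaying is where the extra overhead comes from.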

However, applying the same approach to SOCK_DGRAM will be difficult: the multiplexer would cause a huge performance degradation.

AkihiroSuda commented 6 months ago

> bypass4netns can expose the pod's ports on the node by bypassing the socket when bind(2) called, and other pods can connect to the exposed ports.

Yes, but it might be insecure to directly expose bare pod ports to other nodes, and it is hard to handle port number conflicts across pods.