AkihiroSuda opened 8 months ago
IIUC, the multi-node bypass will only work for Pod-to-NodePort communications, so probably we will just need to watch .spec.podCIDR and add it to the bypass4netnsd --ignore list?
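For illustration, the consuming side of such an --ignore list might look like the minimal Go sketch below. The type and method names are hypothetical, and the Kubernetes watch loop that would feed .spec.podCIDR values into it is omitted:

```go
package main

import (
	"fmt"
	"net"
)

// ignoreList holds CIDRs (e.g. each node's .spec.podCIDR) whose
// destinations should stay on the slow path, mirroring what a
// `bypass4netnsd --ignore` list would contain. Hypothetical sketch.
type ignoreList struct {
	cidrs []*net.IPNet
}

func (l *ignoreList) add(cidr string) error {
	_, n, err := net.ParseCIDR(cidr)
	if err != nil {
		return err
	}
	l.cidrs = append(l.cidrs, n)
	return nil
}

// contains reports whether dst falls inside any ignored pod CIDR,
// i.e. whether the connection must NOT be bypassed.
func (l *ignoreList) contains(dst string) bool {
	ip := net.ParseIP(dst)
	for _, n := range l.cidrs {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	var l ignoreList
	// Example pod CIDRs as they might appear in Node objects' .spec.podCIDR.
	l.add("10.244.0.0/24")
	l.add("10.244.1.0/24")
	fmt.Println(l.contains("10.244.1.5"))   // pod-to-pod destination: do not bypass
	fmt.Println(l.contains("192.168.0.10")) // external destination: eligible for bypass
}
```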
For Pod-to-Pod communications, we will need an (existing) userspace service mesh that multiplexes pod IPs to a single NodePort as a proxy. This will cause some overhead, but with bypass4netns it will still be better than slirp4netns+TAP+VXLAN.
@naoki9911 Let me know if I'm missing something.
Theoretically, the multi-node bypass can handle pod-to-pod communications without the proxy.
bypass4netns can expose the pod's ports on the node by bypassing the socket when bind(2) is called, and other pods can connect to the exposed ports.
But, I think this approach is not elegant.
We need to handle all connect(2) calls to rewrite their destination address to the node's IP and port. Also, this approach consumes many ports on the node, and clients other than pods (including other nodes) can connect to the exposed ports.
I think your multiplexing proxy approach is better.
The following procedure will enable the approach for SOCK_STREAM.
When the pod's connect(2) is handled, the handler connects to the proxy and sends the destination information.
The multiplexing proxy reads the destination information and connects to the destination pod.
But applying the same approach to SOCK_DGRAM will be difficult, and the multiplexer will cause significant performance degradation.
> bypass4netns can expose the pod's ports on the node by bypassing the socket when bind(2) is called, and other pods can connect to the exposed ports.
Yes, but it might be insecure to directly expose bare pod ports to other nodes, and it is hard to handle port number conflicts across pods.
Deploying etcd is hard, so maybe we should just watch Kubernetes services instead.
cc @naoki9911