vsoch closed this 4 months ago
Some experiments to accelerate VXLAN were made here, but they are not integrated into flannel yet: https://github.com/naoki9911/bypass4netns/commit/42890b632fdb48cc3f4209718fdd359df3b401bc
@AkihiroSuda so you would say the slowness isn't the MTU, but still something related to slirp4netns (could you explain the issue in layman's terms)? I am putting together a talk and want to mention why the network was slow, at least at a high level. If it's just the MTU value I could also try the experiments again. I'm trying to understand why it's slow.
It is still worth trying to adjust the MTU, but it probably won't reach 10 Gbps, as the usermode TCP/IP stack (libslirp) is quite slow by nature.
You may also refer to https://pibvt.net/IPSJ-OS22156009.pdf to understand the design of slirp4netns, and how its overhead can be eliminated with bypass4netns. This revision does not support VXLAN though.
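As a rough illustration of why a larger MTU helps but may not be enough, the sketch below estimates how many packets per second a stack must process to sustain a target throughput. It is a back-of-the-envelope calculation, not a measurement: it assumes every packet carries exactly one MTU of data and ignores header overhead, and the function name `packets_per_second` is made up for this example.

```python
def packets_per_second(target_bps: float, mtu_bytes: int) -> float:
    """Packets per second needed to move target_bps bits per second,
    assuming each packet carries mtu_bytes bytes (headers ignored)."""
    return target_bps / (mtu_bytes * 8)


TARGET_BPS = 10e9  # 10 Gbps, the figure mentioned in the thread

# 1450 and 1500 appear in the thread; 9000 is an illustrative jumbo-frame size.
for mtu in (1450, 1500, 9000):
    pps = packets_per_second(TARGET_BPS, mtu)
    print(f"MTU {mtu:>5}: about {pps:,.0f} packets/s")
```

Each of those packets has to pass through the usermode TCP/IP stack, so the per-packet cost there limits throughput regardless of the link speed.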
Okay, confirmed the default is indeed 1450:
# cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
Trying to understand how to change it next.
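For checking the value from a script, here is a minimal sketch that parses the `KEY=VALUE` lines shown above and extracts `FLANNEL_MTU`. It works on an inline copy of the output rather than reading `/run/flannel/subnet.env` directly, and the helper name `parse_env` is an assumption for illustration, not part of flannel.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and comment lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


# Inline copy of the subnet.env contents quoted in the thread.
SAMPLE = """\
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
"""

env = parse_env(SAMPLE)
mtu = int(env["FLANNEL_MTU"])
print(f"flannel MTU: {mtu}")
```

On a real node, the same function could be fed the contents of the file instead of `SAMPLE`.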
@AkihiroSuda could it be mtu related if we are also using the network with mtu 1500 for the comparison case? For example, see the middle one here:
And then that flux is using it for the overlay network (see the name under default bind)
If they both are 1500 (but flux is still much faster) that probably can't be the causative factor.
You may instead try decreasing this MTU: https://github.com/moby/moby/blob/afc7e581e601d53d3b346828bd94ac1fed1e226d/contrib/dockerd-rootless.sh#L13
Should be good here, thanks for the help!
Hey @AkihiroSuda ! I was doing some tests on my tiny cluster and noticed quite a bit of slowness with usernetes vs. without - here is a simple run of iperf3 for two metrics:
I had this assumption in my head that slirp4netns was to blame - I read something somewhere about it (that I can't find now) saying it would be a bottleneck (is that true)? But then I was looking here: https://github.com/flannel-io/flannel/blob/e8fb8108622bb9646dc7de84df19adbae319acb8/pkg/subnet/subnet.go#L104 and it seems the default MTU value is 1450 - could that explain the slowness (or something else)? Is there a way to make the usernetes network performance equal to bare metal? Thank you!
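For context on the 1450 figure, the sketch below works through the arithmetic under one assumption: that the value reflects standard VXLAN-over-IPv4 encapsulation on a 1500-byte underlay link. The header sizes are the standard ones; the function name `overlay_mtu` is hypothetical, and whether flannel derives its default exactly this way is not confirmed here.

```python
# Standard header sizes in bytes for VXLAN over IPv4.
OUTER_IPV4 = 20      # outer IPv4 header, no options
OUTER_UDP = 8        # outer UDP header
VXLAN_HEADER = 8     # VXLAN header
INNER_ETHERNET = 14  # encapsulated inner Ethernet header

VXLAN_OVERHEAD = OUTER_IPV4 + OUTER_UDP + VXLAN_HEADER + INNER_ETHERNET


def overlay_mtu(underlay_mtu: int) -> int:
    """MTU left for the overlay after VXLAN encapsulation (assumed IPv4 underlay)."""
    return underlay_mtu - VXLAN_OVERHEAD


underlay = 1500
overlay = overlay_mtu(underlay)
print(f"overhead: {VXLAN_OVERHEAD} bytes, overlay MTU: {overlay}")
print(f"payload share of each frame: {overlay / underlay:.1%}")
```

Under that assumption the encapsulation costs only a few percent of each frame, so the gap between 1450 and 1500 on its own would not account for a large throughput difference.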