microsoft / Freeflow

High performance container overlay networks on Linux. Enabling RDMA (on both InfiniBand and RoCE) and accelerating TCP to bare metal performance. Freeflow requires zero modification on application code/binary.
MIT License

Why doesn't the bandwidth change? #1

Closed: gangliao closed this issue 6 years ago

gangliao commented 6 years ago

Freeflow

TCP Physical Bandwidth

ethtool eth0 | grep Speed
#   Speed: 1000Mb/s

demo test

Baseline with Flannel

10.141.162.80:

sudo docker run -it --entrypoint /bin/bash --name iperf networkstatic/iperf3
ip addr show  # 172.30.81.4
iperf3 -s

10.141.170.36:

sudo docker run -it --entrypoint /bin/bash --name iperf networkstatic/iperf3
iperf3 -c 172.30.81.4
Connecting to host 172.30.81.4, port 5201
[  4] local 172.30.64.3 port 34259 connected to 172.30.81.4 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   111 MBytes   932 Mbits/sec    0   1.58 MBytes
[  4]   1.00-2.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   2.00-3.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   3.00-4.00   sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
[  4]   4.00-5.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   5.00-6.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   6.00-7.00   sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
[  4]   7.00-8.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   8.00-9.00   sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
[  4]   9.00-10.00  sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.06 GBytes   911 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.06 GBytes   909 Mbits/sec                  receiver

FreeFlow

10.141.186.119:

sudo docker run -d -it --privileged --net=host -v /freeflow:/freeflow -e "HOST_IP_PREFIX=10.141.184.0/21" --name freeflow freeflow/freeflow:tcp

sudo docker run -it --entrypoint /bin/bash -v /freeflow:/freeflow -e "VNET_PREFIX=172.30.92.0/24" -e "LD_PRELOAD=/freeflow/libfsocket.so" --name iperf networkstatic/iperf3

ip addr show  # 172.30.92.18
iperf3 -s

10.141.186.118:

sudo docker run -d -it --privileged --net=host -v /freeflow:/freeflow -e "HOST_IP_PREFIX=10.141.184.0/21" --name freeflow freeflow/freeflow:tcp

sudo docker run -it --entrypoint /bin/bash -v /freeflow:/freeflow -e "VNET_PREFIX=172.30.108.0/24" -e "LD_PRELOAD=/freeflow/libfsocket.so" --name iperf networkstatic/iperf3

iperf3 -c 172.30.92.18
[  4] local 172.30.108.12 port 38826 connected to 172.30.92.18 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   111 MBytes   933 Mbits/sec    0   1.50 MBytes
[  4]   1.00-2.00   sec   109 MBytes   912 Mbits/sec    0   1.50 MBytes
[  4]   2.00-3.00   sec   108 MBytes   902 Mbits/sec    0   1.50 MBytes
[  4]   3.00-4.00   sec   109 MBytes   912 Mbits/sec    0   1.50 MBytes
[  4]   4.00-5.00   sec   108 MBytes   902 Mbits/sec    0   1.50 MBytes
[  4]   5.00-6.00   sec   109 MBytes   912 Mbits/sec    0   1.50 MBytes
[  4]   6.00-7.00   sec   108 MBytes   902 Mbits/sec    0   1.57 MBytes
[  4]   7.00-8.00   sec   109 MBytes   912 Mbits/sec    0   1.57 MBytes
[  4]   8.00-9.00   sec   108 MBytes   902 Mbits/sec    0   1.57 MBytes
[  4]   9.00-10.00  sec   109 MBytes   912 Mbits/sec    0   1.57 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.06 GBytes   910 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.06 GBytes   908 Mbits/sec                  receiver
danyangz commented 6 years ago

Freeflow improves bandwidth only when the bottleneck is in the operating system kernel. In your case, it seems that you have an external rate limit set at 1Gbps. Freeflow will not improve the throughput.

On the other hand, you should observe lower network latency and lower CPU utilization even when throughput is capped at 1Gbps.
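
For example, the CPU utilization numbers come straight from iperf3 when it is run with -V, and a latency check can be run alongside the bandwidth test; qperf is one option for that, assuming it is installed in both containers (172.30.81.4 is the server container's address from the run above):

iperf3 -c 172.30.81.4 -V      # -V adds a "CPU Utilization" line for sender and receiver to the summary
qperf                         # on the server container: listen for tests
qperf 172.30.81.4 tcp_lat     # on the client container: measure TCP latency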

gangliao commented 6 years ago

@danyangz Thanks. You are right!

Docker with Flannel:

CPU Utilization: local/sender 2.4% (0.2%u/2.2%s), remote/receiver 3.5% (0.4%u/3.2%s)

FreeFlow:

CPU Utilization: local/sender 1.8% (0.1%u/1.7%s), remote/receiver 0.0% (0.0%u/0.0%s)
bobzhuyb commented 6 years ago

Danyang is right -- Freeflow can't help you exceed your hardware limit. Get a pair of servers with a 10Gbps or 40Gbps connection, and you'll start to see the difference.

By the way, VNET_PREFIX should include both sender and receiver, i.e., cover both 172.30.92.18 and 172.30.108.12 in your example. You may just use 172.30.0.0/16.

gangliao commented 6 years ago

@bobzhuyb

Thanks. After changing the VNET_PREFIX, I can see a little bit of improvement using a 1Gbps connection.

Freeflow

TCP Physical Bandwidth

ethtool eth0 | grep Speed
#   Speed: 1000Mb/s

demo test

bare metal

10.141.162.80:

yum install -y iperf3
iperf3 -s

10.141.170.36:

yum install -y iperf3
iperf3 -c 10.141.162.80 -V
Connecting to host 10.141.162.80, port 5201
[  4] local 10.141.170.36 port 51292 connected to 10.141.162.80 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   114 MBytes   958 Mbits/sec    0    397 KBytes
[  4]   1.00-2.00   sec   112 MBytes   942 Mbits/sec    0    397 KBytes
[  4]   2.00-3.00   sec   112 MBytes   942 Mbits/sec    0    397 KBytes
[  4]   3.00-4.00   sec   112 MBytes   942 Mbits/sec    0    417 KBytes
[  4]   4.00-5.00   sec   111 MBytes   935 Mbits/sec    0    417 KBytes
[  4]   5.00-6.00   sec   113 MBytes   944 Mbits/sec    0    417 KBytes
[  4]   6.00-7.00   sec   113 MBytes   945 Mbits/sec    0    417 KBytes
[  4]   7.00-8.00   sec   112 MBytes   942 Mbits/sec    0    417 KBytes
[  4]   8.00-9.00   sec   111 MBytes   935 Mbits/sec    0    443 KBytes
[  4]   9.00-10.00  sec   112 MBytes   941 Mbits/sec    0    443 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver
CPU Utilization: local/sender 2.2% (0.1%u/2.1%s), remote/receiver 3.1% (0.2%u/2.8%s)

Baseline with Flannel

10.141.162.80:

sudo docker run -it --entrypoint /bin/bash --name iperf networkstatic/iperf3
ip addr show  # 172.30.81.4
iperf3 -s

10.141.170.36:

sudo docker run -it --entrypoint /bin/bash --name iperf networkstatic/iperf3
iperf3 -c 172.30.81.4 -V
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec   112 MBytes   936 Mbits/sec    0   1.58 MBytes
[  4]   1.00-2.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   2.00-3.00   sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
[  4]   3.00-4.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   4.00-5.00   sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
[  4]   5.00-6.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   6.00-7.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   7.00-8.00   sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
[  4]   8.00-9.00   sec   109 MBytes   912 Mbits/sec    0   1.58 MBytes
[  4]   9.00-10.00  sec   108 MBytes   902 Mbits/sec    0   1.58 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  1.06 GBytes   910 Mbits/sec    0             sender
[  4]   0.00-10.00  sec  1.06 GBytes   909 Mbits/sec                  receiver
CPU Utilization: local/sender 2.9% (0.1%u/2.8%s), remote/receiver 0.8% (0.1%u/0.7%s)

FreeFlow

10.141.186.119:

sudo docker run -d -it --privileged --net=host -v /freeflow:/freeflow -e "HOST_IP_PREFIX=10.141.184.0/21" --name freeflow freeflow/freeflow:tcp

sudo docker run -it --entrypoint /bin/bash -v /freeflow:/freeflow -e "VNET_PREFIX=172.30.0.0/16" -e "LD_PRELOAD=/freeflow/libfsocket.so" --name iperf networkstatic/iperf3

ip addr show  # 172.30.92.18
iperf3 -s

10.141.186.118:

sudo docker run -d -it --privileged --net=host -v /freeflow:/freeflow -e "HOST_IP_PREFIX=10.141.184.0/21" --name freeflow freeflow/freeflow:tcp

sudo docker run -it --entrypoint /bin/bash -v /freeflow:/freeflow -e "VNET_PREFIX=172.30.0.0/16" -e "LD_PRELOAD=/freeflow/libfsocket.so" --name iperf networkstatic/iperf3

iperf3 -c 172.30.92.18 -V
Starting Test: protocol: TCP, 1 streams, 131072 byte blocks, omitting 0 seconds, 10 second test
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  5]   0.00-1.00   sec   114 MBytes   954 Mbits/sec    0   13.7 KBytes
[  5]   1.00-2.00   sec   112 MBytes   943 Mbits/sec    0   13.7 KBytes
[  5]   2.00-3.00   sec   113 MBytes   946 Mbits/sec    0   13.7 KBytes
[  5]   3.00-4.00   sec   112 MBytes   936 Mbits/sec    0   13.7 KBytes
[  5]   4.00-5.00   sec   113 MBytes   948 Mbits/sec    0   13.7 KBytes
[  5]   5.00-6.00   sec   112 MBytes   943 Mbits/sec    0   13.7 KBytes
[  5]   6.00-7.00   sec   112 MBytes   938 Mbits/sec    0   13.7 KBytes
[  5]   7.00-8.00   sec   111 MBytes   935 Mbits/sec    0   13.7 KBytes
[  5]   8.00-9.00   sec   113 MBytes   948 Mbits/sec    0   13.7 KBytes
[  5]   9.00-10.00  sec   112 MBytes   939 Mbits/sec    0   13.7 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
Test Complete. Summary Results:
[ ID] Interval           Transfer     Bandwidth       Retr
[  5]   0.00-10.00  sec  1.10 GBytes   943 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.10 GBytes   941 Mbits/sec                  receiver
CPU Utilization: local/sender 1.6% (0.2%u/1.5%s), remote/receiver 3.6% (0.4%u/3.2%s)
bobzhuyb commented 6 years ago

This is the correct behavior. Freeflow TCP should give you exactly the same performance as bare metal. The improvement over Flannel will be larger if you have faster network hardware.

gangliao commented 6 years ago

@bobzhuyb Any plan to support a general RDMA solution under Kubernetes? I found it's hardcoded now. It's impossible to deploy it on Kubernetes as-is, since pods can be re-scheduled onto a different machine and the pod IPs will change in that case.

gangliao commented 6 years ago

Benchmarks:

freeflow-tcp-intra-node.md

freeflow-tcp-2-nodes.md

lampson0505 commented 6 years ago

@gangliao Could you be more specific? Do you mean the VIP-to-host-IP mapping will change? You can maintain this mapping in a KV store like ZooKeeper. Of course, we do have a plan to add this feature, but your contribution would also be welcome.
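
For example, a minimal sketch of that mapping using ZooKeeper's bundled zkCli shell (the /freeflow/vip/... key layout here is only an illustration, not something Freeflow defines; the host IPs are the ones from the benchmarks above):

create /freeflow ""                                  # parent znodes must exist first
create /freeflow/vip ""
create /freeflow/vip/172.30.92.18 10.141.186.119     # VIP -> current host IP
get /freeflow/vip/172.30.92.18                       # resolve the VIP to its host
set /freeflow/vip/172.30.92.18 10.141.186.118        # update after the pod moves to another host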

bobzhuyb commented 6 years ago

@gangliao Freeflow aims to be a general solution for all kinds of container orchestrators (k8s, mesos, yarn, etc.) and overlay solutions built on different key-value stores (etcd, ZooKeeper, etc.). This is part of the reason why we haven't integrated with a specific solution so far.

We may add an example implementation in the future, but it's not possible to cover all the combinations that different developers pick. So we expect developers to integrate Freeflow with their own choice of frameworks/services, and contribute back to Freeflow if possible :)

gangliao commented 6 years ago

I will be happy to. Thanks.