weaveworks / weave

Simple, resilient multi-host containers networking and more.
https://www.weave.works
Apache License 2.0

Very poor performance on Kubernetes with 100GbE network. #3966

Closed: zkryakgul closed this issue 1 year ago

zkryakgul commented 2 years ago

Hi, we are using ConnectX-5 100GbE Ethernet cards in our servers, which are connected to each other through a Mellanox switch, and we are running the Weave Net CNI plugin on our Kubernetes cluster. When we test host to host with the iperf tool, using the commands below, we get close to the full 100 Gbps link speed.

# server host
host1 $ iperf -s -P8
# client host
host2 $ iperf -c <host_ip> -P8
Result: 98.8 Gbps transfer speed

We also get the same result when we run the same test between two Docker containers, one on each of the same hosts.

# server host
host1 $ docker run -it -p 5001:5001 ubuntu:latest-with-iperf iperf -s -P8
# client host
host2 $ docker run -it -p 5001:5001 ubuntu:latest-with-iperf iperf -c <host_ip> -P8
Result: 98.8 Gbps transfer speed

But when we create two deployments pinned to the same hosts (host1, host2) with the same image, and run the same test through the Service IP (we created a Kubernetes Service with the manifest shown below), which redirects traffic to the server pod, we only get about 2 Gbps. We also ran the same test against the pod IP and the Service's cluster DNS name, but the results are the same.

kubectl create deployment iperf-server --image=ubuntu:latest-with-iperf  # afterwards we add node affinity (host1) and a containerPort to the deployment YAML
kubectl create deployment iperf-client --image=ubuntu:latest-with-iperf  # afterwards we add node affinity (host2) and a containerPort to the deployment YAML
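
For completeness, the pinning step can also be expressed as a patch. The sketch below uses a nodeSelector on the standard kubernetes.io/hostname label, which gives the same placement as the node affinity mentioned above; the node names host1/host2 are assumptions.

# pin the server to host1 and the client to host2 (assumes the nodes are named host1/host2)
kubectl patch deployment iperf-server --type merge -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"host1"}}}}}'
kubectl patch deployment iperf-client --type merge -p '{"spec":{"template":{"spec":{"nodeSelector":{"kubernetes.io/hostname":"host2"}}}}}'
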
kind: Service
apiVersion: v1
metadata:
  name: iperf-server
  namespace: default
spec:
  ports:
    - name: iperf
      protocol: TCP
      port: 5001
      targetPort: 5001
  selector:
    app: iperf-server
  clusterIP: 10.104.10.230
  type: ClusterIP
  sessionAffinity: None
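
For reference, the pod-to-pod measurement can be driven from inside the client pod. This is a minimal sketch, assuming the default app=<name> labels that kubectl create deployment applies:

# look up the client pod created by the iperf-client deployment
CLIENT_POD=$(kubectl get pods -l app=iperf-client -o jsonpath='{.items[0].metadata.name}')
# run the client against the Service DNS name (the cluster IP 10.104.10.230 or the server pod IP can be substituted)
kubectl exec -it "$CLIENT_POD" -- iperf -c iperf-server.default.svc.cluster.local -p 5001 -P8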

TL;DR, the scenarios we tested:

- host to host with iperf: ~98.8 Gbps
- Docker container to Docker container across the same two hosts: ~98.8 Gbps
- pod to pod through the Service cluster IP or its cluster DNS name: ~2 Gbps
- pod to pod directly against the pod IP: ~2 Gbps

We need the full 100 Gbps on pod-to-pod communication, so what could be causing this?
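
One way to narrow this down is to check which data path Weave has negotiated for each peer connection and whether it is encrypted: encrypted traffic (ESP fastdp or, worse, the userspace sleeve fallback) costs far more per packet than plain fastdp at these speeds. A sketch of the check, assuming the weave-net pods carry the name=weave-net label from the standard manifest:

# ask any weave-net pod for its peer connection status
WEAVE_POD=$(kubectl get pods -n kube-system -l name=weave-net -o jsonpath='{.items[0].metadata.name}')
kubectl exec -n kube-system "$WEAVE_POD" -c weave -- /home/weave/weave --local status connections
# each peer line shows fastdp or sleeve, and whether the connection is encrypted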

zkryakgul commented 1 year ago

We figured this out by disabling encryption; rebooting the servers afterwards is what finally did the trick.
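
For anyone hitting the same wall: in the Kubernetes addon, Weave Net encryption is normally driven by the WEAVE_PASSWORD environment variable on the weave-net DaemonSet (populated from a secret). A minimal sketch of turning it off, assuming that setup; verify against your own manifest before applying anything:

# remove the password env var so peers reconnect unencrypted
kubectl -n kube-system set env daemonset/weave-net WEAVE_PASSWORD-
# roll the weave-net pods so the change takes effect (the reporter also had to reboot the hosts)
kubectl -n kube-system rollout restart daemonset/weave-net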