
Loadbalancer example stops working when the deployment is scaled up. #12068

Open demian91 opened 1 month ago

demian91 commented 1 month ago

Related to vl3-lb feature: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/features/vl3-lb

Expected Behavior

Traffic between the server and the client should not be affected by the addition of a new server (e.g., scaling up the finance deployment). I would also expect that switching to iperf traffic (i.e., changing the server image from http-echo to nettools) has no effect, as long as the ports stay the same and match the network service description.
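
For reference, the port/protocol expectation lives in the example's NetworkService definition. A quick way to inspect it (assuming the networkservicemesh.io CRDs installed by the basic setup):

    kubectl get networkservices.networkservicemesh.io -A -o yaml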

Current Behavior

TCP traffic between the server and client stops when the deployment is scaled up and a new server is added.

Failure Information (for bugs)

Same as above: an established TCP connection between the client and one of the servers stalls as soon as the deployment is scaled up and a new server pod starts.

Steps to Reproduce

  1. Install Spire:

    kubectl apply -k https://github.com/networkservicemesh/deployments-k8s/examples/spire/single_cluster?ref=a070c14f871cb904f375727aa61f4cd031abc8a7
    kubectl apply -f https://raw.githubusercontent.com/networkservicemesh/deployments-k8s/a070c14f871cb904f375727aa61f4cd031abc8a7/examples/spire/single_cluster/clusterspiffeid-template.yaml
    kubectl apply -f https://raw.githubusercontent.com/networkservicemesh/deployments-k8s/a070c14f871cb904f375727aa61f4cd031abc8a7/examples/spire/base/clusterspiffeid-webhook-template.yaml

    link: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/spire/single_cluster
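
    A readiness check I run before moving on (the namespace and label selectors are my assumption based on the example manifests):

    kubectl wait -n spire --timeout=2m --for=condition=ready pod -l app=spire-server
    kubectl wait -n spire --timeout=2m --for=condition=ready pod -l app=spire-agent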

  2. Install NSM:

    kubectl apply -k https://github.com/networkservicemesh/deployments-k8s/examples/basic?ref=a070c14f871cb904f375727aa61f4cd031abc8a7

    link: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/basic
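
    The same kind of hedged readiness check for NSM (namespace and labels assumed from what the basic example deploys):

    kubectl wait -n nsm-system --timeout=2m --for=condition=ready pod -l app=nsmgr
    kubectl wait -n nsm-system --timeout=2m --for=condition=ready pod -l app=forwarder-vpp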

  3. [working example] Install the load balancer feature example:

    kubectl apply -k https://github.com/networkservicemesh/deployments-k8s/examples/features/vl3-lb?ref=a070c14f871cb904f375727aa61f4cd031abc8a7

    link: https://github.com/networkservicemesh/deployments-k8s/tree/main/examples/features/vl3-lb
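
    Before checking the output, I wait for the example pods to come up (label selectors assumed from the deployment names):

    kubectl wait -n ns-vl3-lb --timeout=2m --for=condition=ready pod -l app=finance-server
    kubectl wait -n ns-vl3-lb --timeout=2m --for=condition=ready pod -l app=finance-client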

Output as expected:

$ kubectl exec deployments/finance-client -n ns-vl3-lb -- curl -s finance.vl3-lb:8080 | grep "Hello! I'm finance-server"
Defaulted container "nettools" out of: nettools, cmd-nsc, cmd-nsc-init (init)
Hello! I'm finance-server-6b9d477dbc-rmsmd

$ kubectl exec deployments/finance-client -n ns-vl3-lb -- curl -s finance.vl3-lb:8080 | grep "Hello! I'm finance-server"
Defaulted container "nettools" out of: nettools, cmd-nsc, cmd-nsc-init (init)
Hello! I'm finance-server-6b9d477dbc-xg75j

$ kubectl exec deployments/finance-client -n ns-vl3-lb -- curl -s finance.vl3-lb:8080 | grep "Hello! I'm finance-server"
Defaulted container "nettools" out of: nettools, cmd-nsc, cmd-nsc-init (init)
Hello! I'm finance-server-6b9d477dbc-rmsmd
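
A quick way to see the balancing across replicas is the same curl in a loop (sketch):

    for i in $(seq 1 10); do
      kubectl exec deployments/finance-client -n ns-vl3-lb -- curl -s finance.vl3-lb:8080
    done | grep "Hello!" | sort | uniq -c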

Scaling up and down the server deployment was successful.
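
For the record, scaling up and down was done with kubectl scale (replica counts illustrative):

    kubectl scale deployment finance-server -n ns-vl3-lb --replicas=3
    kubectl scale deployment finance-server -n ns-vl3-lb --replicas=1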

  4. [not working example] Install the load balancer feature example with iperf3 instead of http-echo. The ports are unchanged, as is the traffic type (TCP); the goal was to have a longer-lived connection.
$ git diff
diff --git a/examples/features/vl3-lb/finance-server.yaml b/examples/features/vl3-lb/finance-server.yaml
index b7a38386186..31e8aef8534 100644
--- a/examples/features/vl3-lb/finance-server.yaml
+++ b/examples/features/vl3-lb/finance-server.yaml
@@ -18,14 +18,16 @@ spec:
     spec:
       containers:
         - name: nginx
-          image: hashicorp/http-echo:1.0
+          #image: hashicorp/http-echo:1.0
+          image: aeciopires/nettools:1.0.0
+          command: ["/bin/bash", "-c", "iperf3 -s -p 8081"]
           env:
           - name: POD_NAME
             valueFrom:
               fieldRef:
                 fieldPath: metadata.name
-          args:
-            - "-text=Hello! I'm $(POD_NAME)"
-            - -listen=:8081
+          #args:
+          #  - "-text=Hello! I'm $(POD_NAME)"
+          #  - -listen=:8081
           ports:
             - containerPort: 8081

When I try running the iperf3 client against the servers, I get mixed results. Nine times out of ten, iperf3 establishes a connection but doesn't actually send any data and just hangs (see the not-working output below). Occasionally it does connect properly, and I get the expected ~1 Mbit/s (see the working output below).
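
For reference, the client shell in the outputs below was opened roughly like this (container name taken from the "Defaulted container" message above):

    kubectl exec -it deployments/finance-client -n ns-vl3-lb -c nettools -- /bin/sh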

[not working output]

finance-client-765fdddd77-jx6hl:/# iperf3 -c finance.vl3-lb -p 8080 -b 1M -t 1000
Connecting to host finance.vl3-lb, port 8080
[  5] local 172.16.0.6 port 48586 connected to 172.16.0.2 port 8080

^C[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-13891.32 sec  0.00 Bytes  0.00 bits/sec    0   87.4 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-13891.32 sec  0.00 Bytes  0.00 bits/sec    0             sender
[  5]   0.00-13891.32 sec  0.00 Bytes  0.00 bits/sec                  receiver
iperf3: interrupt - the client has terminated

[working output]

finance-client-765fdddd77-jx6hl:/# iperf3 -c finance.vl3-lb -p 8080 -b 1M -t 1000
Connecting to host finance.vl3-lb, port 8080
[  5] local 172.16.0.6 port 49280 connected to 172.16.0.2 port 8080
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]   1.00-2.00   sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]   2.00-3.00   sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]   3.00-4.00   sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]   4.00-5.00   sec   128 KBytes  1.05 Mbits/sec    0    149 KBytes
^C[  5]   5.00-5.60   sec   128 KBytes  1.76 Mbits/sec    0    140 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-5.60   sec   768 KBytes  1.12 Mbits/sec    0             sender
[  5]   0.00-5.60   sec  0.00 Bytes  0.00 bits/sec                  receiver

4.1 Scaling while traffic is running. I scaled up the deployment while traffic was flowing correctly between one of the finance-servers and the finance-client, using the command below (going from 3 to 4 servers).

kubectl scale deployment finance-server -n ns-vl3-lb --replicas=4

The moment the 4th finance-server pod started running, iperf3 reported 0 bits/sec (see below).

finance-client-765fdddd77-jx6hl:/# iperf3 -c finance.vl3-lb -p 8080 -b 1M -t 1000
Connecting to host finance.vl3-lb, port 8080
[  5] local 172.16.0.6 port 58198 connected to 172.16.0.2 port 8080
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]   1.00-2.00   sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]   2.00-3.00   sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]   3.00-4.00   sec   128 KBytes  1.05 Mbits/sec    0    149 KBytes
[  5]   4.00-5.00   sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]   5.00-6.00   sec   128 KBytes  1.04 Mbits/sec    0    140 KBytes
[  5]   6.00-7.00   sec   128 KBytes  1.05 Mbits/sec    0    149 KBytes
[  5]   7.00-8.00   sec   128 KBytes  1.05 Mbits/sec    0    149 KBytes
[  5]   8.00-9.00   sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]   9.00-10.00  sec   128 KBytes  1.05 Mbits/sec    0    140 KBytes
[  5]  10.00-11.00  sec   128 KBytes  1.05 Mbits/sec    1   8.74 KBytes
[  5]  11.00-12.00  sec   128 KBytes  1.05 Mbits/sec    2   8.74 KBytes
[  5]  12.00-13.00  sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes
[  5]  13.00-14.00  sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes
[  5]  14.00-15.00  sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes
[  5]  15.00-16.00  sec  0.00 Bytes  0.00 bits/sec    0   8.74 KBytes
[  5]  16.00-17.00  sec  0.00 Bytes  0.00 bits/sec    1   8.74 KBytes
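
To narrow down where the flow stalls, capturing on the client side while it hangs could help (sketch; assumes tcpdump is available in the nettools image):

    kubectl exec deployments/finance-client -n ns-vl3-lb -c nettools -- tcpdump -ni any tcp port 8080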

Context

The setup is a single-node Kubernetes cluster; these are the only deployments running in it.
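
Environment summary commands, for completeness (illustrative):

    kubectl get nodes -o wide
    kubectl get pods -A -o wide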