kubeslice / worker-operator

KubeSlice Worker Operator open-source repository: The KubeSlice Worker Operator is a Kubernetes operator that manages the lifecycle of KubeSlice worker clusters.
Apache License 2.0

Bug: Exported services suffer significant traffic loss even when some backend service instances are still available #384

Closed: mridulgain closed this issue 1 month ago

mridulgain commented 1 month ago

📜 Description

For services whose pods are managed by an HPA (HorizontalPodAutoscaler), there is a delay after the service is scaled up or down before the change propagates to the whole slice. During that window we see HTTP 503 errors because the IP returned by cmd-nsc is no longer valid.

👟 Reproduction steps

  1. Create a slice with overlay network deployment mode set to "single-network"
  2. Onboard the application namespace
  3. Export a service from that application namespace
  4. Scale the backend pods up or down

👍 Expected behavior

The service export reconciler should react to changes in the application pods as soon as they happen.
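Reacting to pod changes means mapping each pod event to the service exports that select that pod and queueing only those for reconciliation. In a controller-runtime based operator this mapping is typically supplied through a watch on Pods (for example with `handler.EnqueueRequestsFromMapFunc`). The stdlib-only sketch below shows just the mapping step. The `ServiceExport` and `Pod` types are simplified, hypothetical stand-ins rather than the operator's real API types, and the `iperfns/iperf-server` names are taken from the log below for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// ServiceExport is a hypothetical, simplified stand-in for the operator's
// ServiceExport resource: a namespaced name plus a pod label selector.
type ServiceExport struct {
	Namespace string
	Name      string
	Selector  map[string]string
}

// Pod is a hypothetical, simplified pod: only namespace and labels.
type Pod struct {
	Namespace string
	Labels    map[string]string
}

// matches reports whether every key/value pair in selector is present in
// labels (equality-based selection only). An empty selector matches nothing.
func matches(selector, labels map[string]string) bool {
	if len(selector) == 0 {
		return false
	}
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// exportsForPod returns the sorted "namespace/name" keys of the exports whose
// selector matches the pod, i.e. the exports to reconcile when that pod is
// created, updated, or deleted.
func exportsForPod(pod Pod, exports []ServiceExport) []string {
	var keys []string
	for _, se := range exports {
		if se.Namespace == pod.Namespace && matches(se.Selector, pod.Labels) {
			keys = append(keys, se.Namespace+"/"+se.Name)
		}
	}
	sort.Strings(keys)
	return keys
}

func main() {
	exports := []ServiceExport{
		{Namespace: "iperfns", Name: "iperf-server", Selector: map[string]string{"app": "iperf-server"}},
		{Namespace: "iperfns", Name: "other", Selector: map[string]string{"app": "other"}},
	}
	pod := Pod{Namespace: "iperfns", Labels: map[string]string{"app": "iperf-server", "pod-template-hash": "abc"}}
	fmt.Println(exportsForPod(pod, exports)) // [iperfns/iperf-server]
}
```

Only the export backed by the changed pod is queued, so a scale event does not force a reconcile of every export in the slice.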

👎 Actual Behavior

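When the service is scaled up or down, there is a delay before the change propagates to the whole slice. During that window traffic to the exported service fails: in the log below, the connection to 10.190.0.3 is reset and the next attempt to the same IP times out, while a later connection to 10.190.0.5 succeeds.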

🐚 Relevant log output

Client connecting to iperf-server.iperfns.svc.slice.local, TCP port 5201
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 10.190.16.4 port 60034 connected with 10.190.0.3 port 5201
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-1.00 sec  1.37 MBytes  11.5 Mbits/sec
[  1] 1.00-2.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 2.00-3.00 sec  1.25 MBytes  10.5 Mbits/sec
write failed: Connection reset by peer
shutdown failed: Socket not connected
[  1] 3.00-3.60 sec   767 KBytes  10.5 Mbits/sec
[  1] 0.00-3.60 sec  4.62 MBytes  10.8 Mbits/sec
------------------------------------------------------------
Client connecting to iperf-server.iperfns.svc.slice.local, TCP port 5201
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
tcp connect failed: Operation timed out
[  1] local 0.0.0.0 port 0 connected with 10.190.0.3 port 5201
------------------------------------------------------------
Client connecting to iperf-server.iperfns.svc.slice.local, TCP port 5201
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 10.190.16.4 port 55180 connected with 10.190.0.5 port 5201
[ ID] Interval       Transfer     Bandwidth
[  1] 0.00-1.00 sec  1.37 MBytes  11.5 Mbits/sec
[  1] 1.00-2.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 2.00-3.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 3.00-4.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 4.00-5.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 5.00-6.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 6.00-7.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 7.00-8.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 8.00-9.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 9.00-10.00 sec  1.25 MBytes  10.5 Mbits/sec
[  1] 0.00-10.14 sec  12.7 MBytes  10.5 Mbits/sec
------------------------------------------------------------

Version

No response

🖥️ What operating system are you seeing the problem on?

Linux

✅ Proposed Solution

Currently the service export reconciler runs every 30 seconds. It should be triggered by events, such as changes to the backend pods, instead of running periodically.
