Thanks for reaching out, @YHJ94.
I checked the logs and found no errors; it looks like a data-path issue that needs further investigation.

It seems pod-A@gw-node@cluster1 -> vxlan-tunnel -> cluster2 is OK, but after that, I have no idea what went wrong. Also, pod-A@non-gw-node@cluster1 -> gw-node -> vxlan-tunnel is NOT OK: I cannot detect any packets passing through vxlan-tunnel.
We seem to have two different segments to troubleshoot.

For '1', the packet:
a. is VXLAN-encapsulated via the vx-submariner interface (UDP port 4800) to reach the GW node
b. is VXLAN-decapsulated via vx-submariner
c. is VXLAN-encapsulated via the vxlan-tunnel interface (UDP port 4500) to reach the GW node in the remote cluster

Tcpdumping vx-submariner and vxlan-tunnel can help us understand the root cause here.
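Something along these lines on the cluster1 GW node should cover both hops (a sketch, not from the original thread; vx-submariner and vxlan-tunnel are the default Submariner interface names, and the port-9090 filter assumes the service port used in these tests):

```bash
# Hops a/b: traffic from the pod on the non-GW node arriving over vx-submariner (UDP 4800 outer)
sudo tcpdump -ni vx-submariner port 9090

# Hop c: traffic leaving towards the remote GW node over vxlan-tunnel (UDP 4500 outer)
sudo tcpdump -ni vxlan-tunnel port 9090
```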
For '2', the packet:
A. is VXLAN-decapsulated via vxlan-tunnel (UDP port 4500)
B. should be forwarded by Calico to pod@non_gw_node

Here too, tcpdumping the traffic on the GW node can point us to the root cause.
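For example, on the cluster2 GW node (again a sketch; UDP 4789 is Calico's default VXLAN port):

```bash
# Step A: inner traffic arriving over the inter-cluster tunnel
sudo tcpdump -ni vxlan-tunnel port 9090

# Step B: the Calico VXLAN hop towards the pod on the non-GW node
sudo tcpdump -ni any udp port 4789
```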
Do you have any security groups in your infra/OpenStack cloud that might block inter-cluster traffic or Submariner intra-cluster traffic (port 4800)? Do you have network policies defined in your clusters?
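For instance, checks along these lines (a sketch; 'my-secgroup' is a placeholder for your actual security group name):

```bash
# Look for NetworkPolicies in both clusters
kubectl get networkpolicies -A --context cluster1
kubectl get networkpolicies -A --context cluster2

# List the OpenStack security group rules (replace 'my-secgroup' with yours)
openstack security group rule list my-secgroup
```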
Thanks for your support, @yboaron.
First, there are no network policies in my clusters, and my security group already has bidirectional rules for UDP 4500 and 4800. (Actually, I allowed all TCP and UDP traffic between the two clusters.)
And I tcpdumped as much as I could; here's what I got.
cluster1
```
$ kc get nodes -o wide --context cluster1
NAME                             STATUS   ROLES    AGE     VERSION   INTERNAL-IP    EXTERNAL-IP
cluster1-default-worker-node-0   Ready    <none>   4d16h   v1.29.3   192.168.0.6    <none>
cluster1-default-worker-node-1   Ready    <none>   3d15h   v1.29.3   192.168.0.22   <none>
```
```
$ kc get pods -o wide --context cluster1
NAME        READY   STATUS    RESTARTS   AGE     IP              NODE
curl-pod    1/1     Running   0          3d22h   10.100.111.12   cluster1-default-worker-node-0   # simple busybox pod used to run 'curl'
curl-pod2   1/1     Running   0          3d12h   10.100.79.16    cluster1-default-worker-node-1
```
cluster2
```
$ kc get nodes -o wide --context cluster2
NAME                             STATUS   ROLES    AGE     VERSION   INTERNAL-IP     EXTERNAL-IP
cluster2-default-worker-node-0   Ready    <none>   4d16h   v1.29.3   192.168.0.9     <none>
cluster2-default-worker-node-1   Ready    <none>   3d15h   v1.29.3   192.168.0.106   <none>
```
```
$ kc get svc,pod -o wide --context cluster2
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)
service/hello-world   ClusterIP   10.255.210.51   <none>        9090/TCP   # service port 9090 -> container port 8080

NAME                               READY   STATUS    RESTARTS   AGE   IP              NODE
pod/hello-world-66c79b8cf7-dpgcp   1/1     Running   0          99m   10.110.216.74   cluster2-default-worker-node-1   # simple 'hello world' pod that responds with some text
pod/hello-world-66c79b8cf7-lsxbq   1/1     Running   0          99m   10.110.33.72    cluster2-default-worker-node-0
```
curl-pod2@non-gw-node@cluster1 --> hello-world-service(10.255.210.51:9090)
Results: NOT OK

tcpdump

> a. is VXLAN-encapsulated via the vx-submariner interface (UDP port 4800) to reach the GW node

```
$ sudo tcpdump -i vx-submariner port 9090
09:52:55.884832 IP cluster1-default-worker-node-1.60587 > 10.255.210.51.9090: Flags [S], seq 2522249798, win 65280, options [mss 1360,sackOK,TS val 673371372 ecr 0,nop,wscale 7], length 0
09:52:56.907027 IP cluster1-default-worker-node-1.60587 > 10.255.210.51.9090: Flags [S], seq 2522249798, win 65280, options [mss 1360,sackOK,TS val 673372395 ecr 0,nop,wscale 7], length 0
09:52:58.922983 IP cluster1-default-worker-node-1.60587 > 10.255.210.51.9090: Flags [S], seq 2522249798, win 65280, options [mss 1360,sackOK,TS val 673374411 ecr 0,nop,wscale 7], length 0

09:52:55.886489 IP 240.168.0.22.60587 > 10.255.210.51.9090: Flags [S], seq 2522249798, win 65280, options [mss 1360,sackOK,TS val 673371372 ecr 0,nop,wscale 7], length 0
09:52:56.908019 IP 240.168.0.22.60587 > 10.255.210.51.9090: Flags [S], seq 2522249798, win 65280, options [mss 1360,sackOK,TS val 673372395 ecr 0,nop,wscale 7], length 0
09:52:58.924002 IP 240.168.0.22.60587 > 10.255.210.51.9090: Flags [S], seq 2522249798, win 65280, options [mss 1360,sackOK,TS val 673374411 ecr 0,nop,wscale 7], length 0
```
```
$ sudo tcpdump -i any udp port 4800
10:15:00.544936 eth0 Out IP cluster1-default-worker-node-1.41404 > 192.168.0.6.4800: UDP, length 82
10:15:00.547087 eth0 In  IP 192.168.0.22.41404 > cluster1-default-worker-node-0.4800: UDP, length 82
```
- I assume this step is working properly.
> b. is VXLAN-decapsulated via vx-submariner
> c. is VXLAN-encapsulated via the vxlan-tunnel interface (UDP port 4500) to reach the GW node in the remote cluster
```
$ sudo tcpdump -i vxlan-tunnel port 9090
0 packets captured
0 packets received by filter
0 packets dropped by kernel
```

```
$ sudo tcpdump -i any udp port 4500
10:19:41.560211 eth0 In  IP 192.168.0.9.33681 > cluster1-default-worker-node-0.ipsec-nat-t: UDP-encap: ESP(spi=0x08000000,seq=0x3e800), length 74
10:19:41.560331 eth0 Out IP cluster1-default-worker-node-0.49365 > 192.168.0.9.ipsec-nat-t: UDP-encap: ESP(spi=0x08000000,seq=0x3e800), length 74
...
```
- There are no actual packets for the requested port.
- When I dump UDP 4500, I get a bunch of these 'ipsec-nat-t' encapsulated packets, but I don't think they mean anything.
- **No packets pass through vxlan-tunnel. Thus, there is no ingress on cluster2.**
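To narrow this down further, it might help to confirm on the cluster1 GW node that the vxlan-tunnel device exists and that routes to the remote cluster's CIDRs point at it (generic iproute2 checks, not Submariner-specific tooling):

```bash
# Inspect the Submariner inter-cluster VXLAN device
ip -d link show vxlan-tunnel

# Look for routes over it in any routing table (Submariner may use a dedicated table)
ip route show table all | grep vxlan-tunnel
```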
curl-pod@gw-node@cluster1 --> hello-world-service(10.255.210.51:9090)
Results: OK
tcpdump
> A. is VXLAN-decapsulated via vxlan-tunnel (UDP port 4500)
```
$ sudo tcpdump -i vxlan-tunnel port 9090
10:32:16.666199 IP cluster1-default-worker-node-0.54890 > 10.255.210.51.9090: Flags [S], seq 4098193096, win 65280, options [mss 1360,sackOK,TS val 1634855176 ecr 0,nop,wscale 7], length 0
10:32:16.666961 IP 10.255.210.51.9090 > cluster1-default-worker-node-0.54890: Flags [S.], seq 1818495669, ack 4098193097, win 64704, options [mss 1360,sackOK,TS val 3091215469 ecr 1634855176,nop,wscale 7], length 0

10:32:16.666243 IP 241.168.0.6.54890 > 10.255.210.51.9090: Flags [S], seq 4098193096, win 65280, options [mss 1360,sackOK,TS val 1634855176 ecr 0,nop,wscale 7], length 0
10:32:16.666551 IP 10.255.210.51.9090 > 241.168.0.6.54890: Flags [S.], seq 1818495669, ack 4098193097, win 64704, options [mss 1360,sackOK,TS val 3091215469 ecr 1634855176,nop,wscale 7], length 0
```
```
$ sudo tcpdump -i any udp port 4500
10:37:16.915030 eth0 In  IP 192.168.0.9.41008 > cluster1-default-worker-node-0.ipsec-nat-t: UDP-encap: ESP(spi=0x08000000,seq=0x3e800), length 259
10:37:16.914657 eth0 Out IP cluster2-default-worker-node-0.41008 > 192.168.0.6.ipsec-nat-t: UDP-encap: ESP(spi=0x08000000,seq=0x3e800), length 259
```
- The encapsulated packets contain the requested data payload, so I think this step works fine.
> B. should be forwarded by Calico to pod@non_gw_node
```
$ sudo tcpdump -i any host 192.168.0.106
10:44:26.606851 eth0 Out IP cluster2-default-worker-node-0.37899 > 192.168.0.106.4789: VXLAN, flags [I] (0x08), vni 4096
IP cluster2-default-worker-node-0.65474 > 10.110.216.74.http-alt: Flags [P.], seq 1:83, ack 1, win 510, options [nop,nop,TS val 1635585116 ecr 769177139], length 82: HTTP: GET / HTTP/1.1
```
```
$ sudo tcpdump -i any host 192.168.0.9
10:44:26.606735 eth0 In IP 192.168.0.9.37899 > cluster2-default-worker-node-1.4789: VXLAN, flags [I] (0x08), vni 4096
IP 10.110.33.0.65474 > 10.110.216.74.http-alt: Flags [.], ack 1, win 510, options [nop,nop,TS val 1635585116 ecr 769177139], length 0
```
- Packet forwarding via Calico VXLAN is working properly.
Sorry for the late response.
Can you please try setting the cable driver to Libreswan and see if that helps?
You can add the flags below [1] to the subctl join command to set the cable driver to Libreswan.

[1] --cable-driver libreswan --force-udp-encaps
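A minimal sketch of the full join command, assuming the default broker-info.subm file and the cluster ID from the original deployment:

```bash
subctl join broker-info.subm --clusterid cluster1 \
  --cable-driver libreswan --force-udp-encaps
```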
@YHJ94, any update on this issue?
@yboaron, so sorry. I was not able to test it due to my current situation. I will try it out and let you know as soon as possible. Thanks.
@YHJ94 - closing this issue for now. Feel free to reopen if you still need any help.
Background
I have two clusters:
- CNI: Calico VXLAN
- Submariner cable driver: VXLAN
What happened
I successfully set up Submariner between two clusters, each with a single worker node. It works fine when pods are located on the same node as the gateway.
The problem occurred after I scaled out my worker nodes.
Here are some examples of what I'm facing.
It's very weird that I can access non-gw pods only when calling the service domain (service discovery) from gw pods.
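For illustration, the kind of check this involves (a sketch; it assumes the hello-world service was exported with `subctl export service` in the default namespace, so it resolves under the usual clusterset domain):

```bash
# From curl-pod (on the cluster1 gateway node) via service discovery -- this works
kubectl --context cluster1 exec curl-pod -- \
  curl -s http://hello-world.default.svc.clusterset.local:9090

# The same request from curl-pod2 (on the non-gateway node) -- this fails
kubectl --context cluster1 exec curl-pod2 -- \
  curl -s http://hello-world.default.svc.clusterset.local:9090
```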
What you expected to happen
Connections should succeed between all pods, regardless of whether I use a service domain or pod IP.
Anything else we need to know?
I have troubleshot as much as I could.
So I suspect that the traffic between the non-gw pod and the actual gateway is not working well.
Environment
- Diagnose information (use `subctl diagnose all`):
- Gather information (use `subctl gather`): cluster1.zip cluster2.zip
- Firewall check:

```
subctl verify --context cluster1 --tocontext cluster2 --only service-discovery,connectivity --verbose
```
- Cloud provider or hardware configuration:
- Install tools:
- Others: