SkalaNetworks opened this issue 1 month ago
Has anyone been able to reproduce the issue? I can't pinpoint what's causing it; the behaviour is extremely strange and variable. If anyone has any clues about where I could look for problems during the ping outages, I'll gladly take them.
Does your nat gw pod have the iptables rules after it restarted?
Hi @bobz965
These are the iptables rules before the restart of the GW. Note there is one SNAT rule and one floating IP, therefore 3 rules.
When I kill the gateway pod and wait for it to restart, the ping stops working.
Here are the rules after:
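For reference, this is roughly how the rules can be dumped from inside the gateway pod (the kube-system namespace is an assumption; the pod name follows the vpc-nat-gw-&lt;gateway&gt;-0 convention for the gw1 gateway used here):

```bash
# Sketch: dump the NAT rules inside the gateway pod.
kubectl -n kube-system exec -it vpc-nat-gw-gw1-0 -- iptables -t nat -S
# With one SNAT rule plus the DNAT/SNAT pair of a floating IP, 3 rules are expected.
```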
Do you have any other natgw pod, one which has nothing (no DNAT/SNAT/FIP) in it? Just delete that pod.
Hi @bobz965
The entire cluster only has one NAT GW. I checked with `crictl ps` to look for "zombie" containers not tracked by K8s, and I don't see any other gateway present.
I also just checked for zombie processes with `ps aux | grep "sleep 10000"` (because that's what the GW is doing all day long), and none came up apart from my single and only NAT-GW.
Right now I have a test cluster where that gateway simply doesn't route any traffic.
Here are the iptables rules:
iptables seemed to make kube-ovn not work correctly
@SkalaNetworks What's the problem? Could you please provide some more details?
@zhangzujian I honestly didn't dig further, but when my cluster (k0s) was installed with kube-proxy in iptables mode (it is now IPVS), I couldn't get ANY of the pods to ping each other; the CNI was entirely broken, and I just switched to see if IPVS would work better. It immediately resolved my issues, except for the one I'm writing about right now. I don't know if it might be some kind of symptom.
I couldn't get ANY of the pods to ping each other
What address did you use? Did you ping the service IP or a pod IP?
I tried pinging the IPs of each pod in a custom VPC; basically they couldn't reach each other in any direction. I vaguely remember a lot of the kube-ovn components not being ready. I could try switching kube-proxy to iptables again and see if it all breaks; it doesn't cost me much to do, since the NAT gateways are broken anyway.
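For anyone reproducing this, the active proxy mode can be double-checked on a node without reinstalling anything; a quick sketch (10249 is kube-proxy's default metrics port, and ipvsadm has to be installed on the node):

```bash
# Sketch: confirm which mode kube-proxy is actually running in.
curl -s http://127.0.0.1:10249/proxyMode; echo
# In IPVS mode, cluster services should also show up as virtual servers:
ipvsadm -Ln | head
```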
Observe the ping stop on the pod that was running it within the VPC
The problem is related to conntrack:
vpc-nat-gw-gw1-0:/kube-ovn# conntrack -p icmp -L
icmp 1 12 src=10.0.1.2 dst=192.168.73.1 type=8 code=0 id=63 [UNREPLIED] src=192.168.73.1 dst=10.0.1.2 type=0 code=0 id=63 mark=0 use=1
icmp 1 29 src=10.0.1.2 dst=192.168.73.1 type=8 code=0 id=64 src=192.168.73.1 dst=172.19.0.11 type=0 code=0 id=36062 mark=0 use=1
Packets of the stopped ping hit the first conntrack entry, which does not do SNAT (its reply tuple still points back at the pod address 10.0.1.2 rather than the external address).
Once you start a new ping, a new conntrack entry with SNAT (the second one above) will be created and work as expected.
There are two possible methods to fix it:
- Prevent serving traffic before routes and iptables rules are configured;
- Flush conntrack entries without SNAT/DNAT after routes and iptables rules are configured.
@bobz965 What do you think?
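A rough sketch of the second option, assuming conntrack-tools is available inside the gateway image (the listing above suggests it is); the targeted filter is an assumption:

```bash
# Sketch: once routes, the EIP and the iptables rules are in place, drop the
# conntrack entries that were created before SNAT was configured, so that the
# next packets of existing flows get NATed again.
conntrack -F                                # coarse: flush the whole table
conntrack -D -p icmp --orig-src 10.0.1.2    # targeted: the stale flow shown above
```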
Do you think this might be related to my problem with the NAT-GW?
Seems the conntrack entry is performing SNAT. Does the ping still not receive replies?
I wish it did. Out of 1892 packets, only 27 got a reply; the behaviour is extremely erratic. If you've got commands to debug what OVN is doing, I'd be glad to run them.
NOTE: Ping works great directly from the GW
How could you know the packets to 1.1.1.1 are lost in the vpc-nat-gw pod?
I can see them in the tcpdump on the GW (see above); something is happening between the gateway and the node. I don't know what, whether it's bridge related, OVN related, or iptables related, but the packets coming back from 1.1.1.1 are either:
I doubt the problem is somewhere else, as pinging directly from the GW works all the time. So I fail to see how it could be a connectivity issue beyond kube-ovn.
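For completeness, this is roughly how the flow can be captured on both ends to see where the replies disappear (the pod/namespace names and the node-side interface are assumptions; net1 is the gateway's external NIC):

```bash
# Sketch: capture the same ICMP flow on both sides of the path.
# 1) Inside the gateway pod, on its external interface (net1):
kubectl -n kube-system exec -it vpc-nat-gw-gw1-0 -- tcpdump -nni net1 icmp
# 2) On the node hosting the gateway, on whatever bridge/NIC carries the
#    external (underlay) network -- the name depends on the environment:
tcpdump -nni <external-bridge-or-nic> 'icmp and host 1.1.1.1'
```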
Regarding the image in step 10: I think the SNAT and DNAT of the packets are normal, so SNAT and DNAT in the vpc nat gw pod are working.
Since the DNAT happened, why could you not tcpdump the packets inside the pod? After all, we saw the DNATed and SNATed packets.
@zhangzujian, I think you are right.
The screenshot you are referring to (step 10) is from AFTER the SNAT starts working again, after some time, for seemingly no specific reason. That is why it is prefixed by this text:
Look at the (normal) traffic on the node hosting the gateway
OK, the nat gw pod disables its ARP until its routes and EIP are ready.
The thing is, it sometimes takes several minutes, and sometimes it never starts working again. Could there be faulty logic in the ARP-enabling mechanism that makes it deadlock?
Also, what do you mean by "routes ready"? Does it wait for the iptables rules to be appended in the pod and for the EIP to be added on the interface before enabling ARP?
After the nat-gw pod is deleted and restarted:
"routes ready" means the default route on the net1 NIC is configured.
Then the EIP is appended to net1 and ARP is turned on; at that point the net1 EIP is reachable via arping.
Initially, the purpose of turning off ARP on net1 is to make sure there is no ARP proxying.
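A quick way to check whether a stuck gateway ever got past that stage would be something like the sketch below (the namespace is an assumption; net1 and the arping check come from the description above):

```bash
# Sketch: has the gateway finished its setup, or is net1 still in the "ARP off" state?
POD=vpc-nat-gw-gw1-0
kubectl -n kube-system exec $POD -- ip link show net1       # NOARP flag still set?
kubectl -n kube-system exec $POD -- ip addr show net1       # is the EIP attached?
kubectl -n kube-system exec $POD -- ip route show default   # default route via net1?
# From another host on the external network, the EIP should answer once ARP is on:
arping -c 3 <eip-address>
```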
Kube-OVN Version
v1.13.0 and v1.12.x
Kubernetes Version
v1.28.3 on k0s, kube-proxy in IPVS mode (iptables seemed to make kube-ovn not work correctly)
Operation-system/Kernel Version
Debian GNU/Linux 12 (bookworm) 6.1.0-18-amd64
Description
The NatGateway ceases to ingress any traffic after being restarted. Pods using the NatGateway lose external connectivity.
Steps To Reproduce
Deploy the following YAML on a cluster with kube-ovn and multus installed (an illustrative manifest is sketched after this list)
Ping 1.1.1.1 from the GW![image](https://github.com/kubeovn/kube-ovn/assets/127797154/a022068b-d4d0-42a1-97ed-3797eb1b5a7b)
Ping 1.1.1.1 from one of the 2 pods![image](https://github.com/kubeovn/kube-ovn/assets/127797154/cbb5f2b7-3a9e-44c1-9cae-7bb5640e19dc)
Observe (tcpdump) the traffic on the gateway while pinging![image](https://github.com/kubeovn/kube-ovn/assets/127797154/6cea21af-cd1a-4b2c-a577-b0545ffc4283)
Delete the nat-gateway pod and wait for the STS to terminate and then restart the pod. Pinging continues to work while the pod is Terminating (as expected)
Observe the ping stop on the pod that was running it within the VPC
Observe the tcpdump on the gateway
No response is received
Pinging 1.1.1.1 from the gateway directly still works
SSH to the K8S node hosting the gateway pod and run a tcpdump![image](https://github.com/kubeovn/kube-ovn/assets/127797154/1e0f1a04-f58c-4d7c-a2ff-6d891a72f31d)
Wait a variable amount of time (10 minutes? sometimes 1 hour?) and rerun the ping from one of the test pods![image](https://github.com/kubeovn/kube-ovn/assets/127797154/8517433b-4250-4d12-8dc4-21138910d46f)
Look at the (normal) traffic on the node hosting the gateway![image](https://github.com/kubeovn/kube-ovn/assets/127797154/4c5c4362-f42c-4dfc-ba0f-0135eb86d008)
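The original YAML is not pasted above; as a rough illustration, a minimal set of resources matching this topology (one VPC, one subnet, one NAT gateway named gw1, one EIP with an SNAT rule) could look like the sketch below. Names, CIDRs and the external subnet here are placeholders, not the exact manifest from this issue.

```bash
# Sketch only -- an illustrative manifest, NOT the one used in this issue.
# Assumes the "ovn-vpc-external-network" underlay subnet / net-attach-def
# already exists for the gateway's net1 interface.
cat <<'EOF' | kubectl apply -f -
apiVersion: kubeovn.io/v1
kind: Vpc
metadata:
  name: vpc1
spec: {}
---
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: vpc1-subnet
spec:
  vpc: vpc1
  cidrBlock: 10.0.1.0/24
  protocol: IPv4
---
apiVersion: kubeovn.io/v1
kind: VpcNatGateway
metadata:
  name: gw1
spec:
  vpc: vpc1
  subnet: vpc1-subnet
  lanIp: 10.0.1.254
  externalSubnets:
    - ovn-vpc-external-network
  selector:
    - "kubernetes.io/os: linux"
---
apiVersion: kubeovn.io/v1
kind: IptablesEIP
metadata:
  name: eip1
spec:
  natGwDp: gw1
---
apiVersion: kubeovn.io/v1
kind: IptablesSnatRule
metadata:
  name: snat1
spec:
  eip: eip1
  internalCIDR: 10.0.1.0/24
EOF
# The issue also uses a floating IP (IptablesFIPRule); it is omitted here for brevity.
```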
Current Behavior
The NatGateway ceases to work for random amounts of time (sometimes indefinitely) when restarted. Deleting everything (subnet, vpc, pods, gateway...) doesn't always fix the problem.
My tests are done WITHOUT the gateway moving from one node to another on restart (it's pinned to a node)
Expected Behavior
Restarting the pod leads to a connection downtime equal to the downtime of the pod.