ac-hibbert opened this issue 4 years ago
Related: #3327
I found similar log entries (but for calico), googled for a solution, and found this issue.
I was able to resolve the problem by changing the network ranges. We were using the 192.168.x range; in the case reported here, the 100.120.0.0/10 range is used. Both ranges are special IP ranges...
They are "special" in the sense that they don't route on the public Internet, which makes them good for use inside Weave Net.
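(For anyone who wants to try the same experiment with Weave Net rather than Calico: the pod allocation range is controlled by the IPALLOC_RANGE setting on the weave container of the weave-net DaemonSet. A minimal sketch, assuming the standard kube-system manifest; the CIDR below is only an example, and changing the range on an existing cluster usually also means clearing Weave's persisted IPAM state.)

```sh
# Sketch: move Weave Net's pod range to something that does not overlap
# the node/VPC network. Assumes the standard weave-net DaemonSet in kube-system;
# the CIDR is just an example value.
kubectl -n kube-system set env daemonset/weave-net -c weave \
  IPALLOC_RANGE=10.32.0.0/12
```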
Sure, I didn't dig deeper because I was happy that it finally worked. I have installed Cisco Container Platform 8.0.0 on a nested vSphere environment and just tried this trick. All of a sudden I had no more issues with martian packets, so I stopped researching the root cause.
If you have a better explanation for this behavior, I would be grateful to learn more!
Could you say what exactly worked? You said you changed the range, and what it was before, but not what you changed it to.
We don't have any explanation, which is why this issue and #3327 are open.
Before, we had the following settings:

pod network CIDR: 192.169.0.0/16
node network: "192.168.200.0/24"
gateway_ip: "192.168.200.254"

Now we have:

pod network: 192.168.0.0/16
node network: "10.98.0.0/20"
gateway_ip: "10.98.0.1"

But I have to double-check the first pod network CIDR.
As you can see, we also use a private IP range as the container network. I am not sure whether it happens because both ranges are private... maybe this routing suppression is also active at the OS level...
These are just some thoughts off the top of my head. Maybe you can easily invalidate this explanation by switching the network ranges in your tests, as I did.
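If you want to check the OS-level side of this: the "martian source" lines come from the kernel's reverse-path filter, and they are only printed when log_martians is enabled. A quick way to inspect the relevant sysctls on a node (just a sketch; the effective rp_filter is the higher of the "all" and per-interface values, so the per-interface settings matter too):

```sh
# Inspect reverse-path filtering and martian logging on a node.
# rp_filter=1 (strict) drops packets whose source fails the reverse-path check;
# log_martians=1 makes the kernel print the "martian source" messages.
sysctl net.ipv4.conf.all.rp_filter \
       net.ipv4.conf.default.rp_filter \
       net.ipv4.conf.all.log_martians

# Per-interface values:
sysctl -a 2>/dev/null | grep -E 'rp_filter|log_martians'
```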
Any updates to this issue?
We intermittently see similar entries in the EC2 instance's syslog, and networking doesn't work correctly within the pods (we usually notice because name resolution stops working).
When this happens for us on EKS, we end up having to rotate the node (sometimes going through several new instances) before traffic works in the pods again. A new EC2 instance seems to be the only way to get a fully functioning node.
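Rotating a node here is nothing more than draining it and terminating the backing EC2 instance so the node group brings up a replacement; a rough sketch (the node name and instance ID are placeholders):

```sh
# Rough sketch of rotating a misbehaving node on EKS.
# <node-name> and <instance-id> are placeholders for the affected node.
kubectl cordon <node-name>
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Terminate the backing instance; the managed node group / ASG launches a replacement.
aws ec2 terminate-instances --instance-ids <instance-id>
```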
What you expected to happen?
Not having the kernel logs filled with martian source messages
What happened?
Kernel logs have martian source messages
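(For reference, the entries can be pulled out of the kernel log with something like the following; just a sketch.)

```sh
# Look for the "martian source" entries in the kernel log.
dmesg -T | grep -i martian
# or, on systemd hosts:
journalctl -k | grep -i martian
```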
ip addr output
sysctl output
How to reproduce it?
Anything else we need to know?
EKS
Versions:
Logs:
or, if using Kubernetes:
Network: