a-sorokin-sdg closed this issue 2 months ago
> killing the calico-node pod immediately fixes the problem

What changes after killing the pod? Could you share your routing table before/after?

Did you have CTLB disabled before the upgrade?
> Did you have CTLB disabled before the upgrade?

Nope. Can you tell me which additional parameters I need to set up? I think it is something with bpfConnectTimeLoadBalancingEnabled, bpfConnectTimeLoadBalancing, or bpfHostNetworkedNATWithoutCTLB. None of them is currently set.
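For reference, a minimal sketch of how these fields could be inspected and set on the default FelixConfiguration, assuming the Calico API server is installed (otherwise calicoctl would be used instead of kubectl); the values shown are only an illustration, not a recommendation for this cluster:

```bash
# Show whether any of the CTLB-related fields are currently set
# (no output means they are unset and the version defaults apply).
kubectl get felixconfiguration default -o yaml | grep -iE 'ctlb|connecttime'

# Illustration only: set the fields explicitly (allowed values depend on the
# Calico version; check the FelixConfiguration reference for your release).
kubectl patch felixconfiguration default --type merge -p \
  '{"spec":{"bpfConnectTimeLoadBalancing":"TCP","bpfHostNetworkedNATWithoutCTLB":"Enabled"}}'
```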
> killing the calico-node pod immediately fixes the problem
>
> What changes after killing the pod? Could you share your routing table before/after?

The routes almost did not change. I am checking via DNS resolution against the kube-dns service, and the route is there both after a reboot (when it is not working) and after killing the pod (when it is working):

```
10.243.0.10 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
```
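A small sketch of how the per-service route can be inspected on the host; 10.243.0.10 is the kube-dns service IP from this cluster, so adjust it as needed:

```bash
# Which route the host would pick for the kube-dns service IP
ip route get 10.243.0.10

# All service routes that point at Calico's BPF devices
ip route show | grep -E 'bpfin\.cali|bpfout\.cali'
```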
> 10.243.0.10 169.254.1.1 255.255.255.255 UGH 0 0 0 bpfin.cali
That route is correct. Since, iirc, 3.27 we route traffic from the host to UDP services via that device by default. I wonder if some route caching is in play. Could you dump `ip route show cached`?
> I wonder if some route caching is in play. Could you dump `ip route show cached`?

`ip route show cached` is empty both before and after killing the pod.
BTW, apparently I have the problem only with the services pinned in the route table, and they are all UDP services (DNS services). There are no TCP services in the route table, and TCP services work fine (tested the kube API service via its service IP, and the nginx ingress).
Just to confirm, do you see the same problem from host-networked pods/processes, or from regular pods as well?

I tried to reproduce the issue: I created a cluster in GCP with kubeadm, installed Calico 3.26.4, upgraded to 3.28, and my DNS worked just fine.

Would you be able to tcpdump whether your traffic is reaching the service, and what kind of packets are exiting from bpfout.cali on your test node?

If your cluster is not a production cluster, we could dig deeper by enabling BPF logging to get some more useful logs. Ideally we could sync on the Calico Users Slack.
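One possible way to capture that, assuming tcpdump is available on the node and that the uplink interface is named eth0 (adjust the names for your environment):

```bash
# What Calico's BPF NAT device hands back to the host stack for a host-networked DNS client
tcpdump -ni bpfout.cali -w bpfout.pcap udp port 53 &

# In parallel, whether the request ever reaches the uplink at all
tcpdump -ni eth0 -w uplink.pcap udp port 53 &

# ...reproduce the failing lookup from the host, then stop both captures
kill %1 %2
```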
> Just to confirm, do you see the same problem from host-networked pods/processes, or from regular pods as well?

Regular pods don't have the problem; they work fine (tested to be sure). So only host-networked pods/processes have the problem.
> Would you be able to tcpdump whether your traffic is reaching the service, and what kind of packets are exiting from bpfout.cali on your test node?

A fresh VM node with a single interface, joined to the cluster. I tested with

```
nslookup openebs-api-rest.openebs.svc.l8s.local. 10.243.0.10
```

When it's working well after killing the calico-node pod: bpfout.workin.pcap.gz
That doesn't seem to be a problem :arrow_up: Do you see packets returning to the client in both cases? Is 10.243.0.10 a local pod or remote?

You could also enable :arrow_down: in the default felixconfiguration and provide BPF logs from the node using `bpftool prog tracelog > log.txt` for the case when it does not work. That should give us good insight.

```yaml
bpfLogLevel: Debug
bpfLogFilters:
- all: host 172.24.1.29 and udp port 53
```
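A sketch of one way to apply this to the default FelixConfiguration and collect the trace; it assumes the Calico API server is available (otherwise use calicoctl patch) and that bpfLogFilters takes a map from device pattern to a pcap-style filter, which may differ between Calico versions (see the exchange below):

```bash
# Set BPF debug logging and a filter on the default FelixConfiguration (sketch only;
# field support depends on the Calico version running in the cluster).
kubectl patch felixconfiguration default --type merge -p \
  '{"spec":{"bpfLogLevel":"Debug","bpfLogFilters":{"all":"host 172.24.1.29 and udp port 53"}}}'

# On the affected node, collect the BPF trace while reproducing the failing lookup
bpftool prog tracelog > log.txt
```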
> Is 10.243.0.10 a local pod or remote?

It's a service IP; the pods behind that IP are remote. hostNetwork -> pod IP has no issue.
> bpfLogLevel: Debug
> bpfLogFilters:
> - all: host 172.24.1.29 and udp port 53

That form does not work. These changes have been accepted by the API:

```yaml
bpfLogLevel: Debug
bpfLogFilters:
  all: host 172.24.1.29 and udp port 53
```

However, the bpfLogFilters property then disappeared from the object, so here is a log anyway, probably not filtered.
BTW, I found a repeated error from the tigera operator, probably not related to this issue:

```json
{"level":"error","ts":"2024-06-12T08:07:07Z","logger":"controller_ippool","msg":"Cannot update an IP pool not owned by the operator","Request.Namespace":"","Request.Name":"periodic-5m0s-reconcile-event","reason":"ResourceValidationError","stacktrace":"github.com/tigera/operator/pkg/controller/status.(*statusManager).SetDegraded\n\t/go/src/github.com/tigera/operator/pkg/controller/status/status.go:356\ngithub.com/tigera/operator/pkg/controller/ippool.(*Reconciler).Reconcile\n\t/go/src/github.com/tigera/operator/pkg/controller/ippool/pool_controller.go:291\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:118\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:314\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/go/pkg/mod/sigs.k8s.io/controller-runtime@v0.15.3/pkg/internal/controller/controller.go:226"}
```
Thanks for the logs, helpful. It seems like the packets from bpfout.cali do not make it to any other device (perhaps worth verifying with tcpdump); they seem to be eaten by the host network stack. Either they have a wrong checksum (unlikely, as that would not get fixed by restarting calico-node), or they get dropped by RPF (could you check the value in /proc/sys/net/ipv4/conf/bpfout.cali/rp_filter? it could be strict and then get fixed after the restart), or they get dropped by iptables. Do you have a default route? I will give it a few more tries to reproduce it.
```
# after reboot (not working)
cat /proc/sys/net/ipv4/conf/bpfout.cali/rp_filter
1

# after killing calico-node (working)
cat /proc/sys/net/ipv4/conf/bpfout.cali/rp_filter
0
```

So it is 1 when it is broken vs 0 after the calico-node restart.
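A quick way to compare the RPF mode on all of the relevant keys at once (0 = off, 1 = strict, 2 = loose):

```bash
# Print the reverse-path-filter setting for the template keys and Calico's BPF devices
for dev in all default bpfin.cali bpfout.cali; do
  printf '%s: ' "$dev"
  cat "/proc/sys/net/ipv4/conf/$dev/rp_filter"
done
```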
That is the problem. Something sets it to 1 (strict), and when calico-node restarts, it sets it back to 0. That something is probably your systemd, which applies configuration when a new device is added. The issue seems to be present with systemd 245+. What is your Linux distro (which I should have asked a while ago)?
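One way to hunt for the source, assuming it comes from a packaged or local sysctl snippet that gets (re)applied when the bpf*.cali devices appear (the paths below are the standard sysctl locations):

```bash
# Any rp_filter rule that could be applied to new devices
grep -rn 'rp_filter' /etc/sysctl.conf /etc/sysctl.d /run/sysctl.d /usr/lib/sysctl.d 2>/dev/null

# The template value new devices inherit at creation time
cat /proc/sys/net/ipv4/conf/default/rp_filter
```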
> What is your Linux distro (which I should have asked a while ago)?
```
# uname -r
5.14.0-452.el9.x86_64
# cat /etc/os-release
NAME="CentOS Stream"
VERSION="9"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="9"
PLATFORM_ID="platform:el9"
PRETTY_NAME="CentOS Stream 9"
ANSI_COLOR="0;31"
LOGO="fedora-logo-icon"
CPE_NAME="cpe:/o:centos:centos:9"
HOME_URL="https://centos.org/"
BUG_REPORT_URL="https://issues.redhat.com/"
REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux 9"
REDHAT_SUPPORT_PRODUCT_VERSION="CentOS Stream"
```
> That something is probably your systemd

In the sysctl settings it is set as `net.ipv4.conf.all.rp_filter=0`:
```
cat /etc/sysctl.conf
fs.inotify.max_user_instances=1048576
fs.inotify.max_user_watches=1048576
fs.inotify.max_queued_events=16384
fs.aio-max-nr=1048576
vm.max_map_count=262144
net.ipv4.ip_nonlocal_bind=1
net.ipv4.ip_forward=1
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv4.neigh.default.gc_thresh1=8192
net.ipv4.neigh.default.gc_thresh2=12228
net.ipv4.neigh.default.gc_thresh3=24456
net.core.somaxconn=65535
net.ipv4.conf.all.rp_filter=0
net.ipv4.conf.all.accept_local=1
kernel.panic=30
kernel.panic_on_oops=1
vm.overcommit_memory=2
vm.panic_on_oom=0
```
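If the culprit turns out to be a default or glob rp_filter rule that is applied when the device is created, one possible counter-measure is a sysctl drop-in that sorts last and pins Calico's BPF devices to 0. This is only a sketch under that assumption; the '/'-separated key form keeps the dot in the interface name (see sysctl.d(5)), and whether it is re-applied to hot-plugged devices depends on the systemd version mentioned above:

```bash
# Pin reverse-path filtering off for Calico's BPF NAT devices
cat <<'EOF' > /etc/sysctl.d/99-calico-bpf-rpf.conf
net/ipv4/conf/bpfin.cali/rp_filter = 0
net/ipv4/conf/bpfout.cali/rp_filter = 0
EOF

# Re-apply now; -e ignores keys for devices that do not exist yet
sysctl -e --system
```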
@a-sorokin-sdg do you still see the issue? Have you figured out what is changing the RPF setting? Closing now, but feel free to reopen if you have any new info.
Yes, I still have the issue. I have tried to catch what changes it via audit, without success.
host to service network not working after reboot/join after upgrade from v3.26.3 to v3.27.3 (eBPF dataplane / VXLAN / no kube-proxy / DSR); killing the calico-node pod immediately fixes the problem
Expected Behavior
Host-to-service networking works after a node reboot/join.
Current Behavior
Host-to-kube-service networking does not work after a node reboot/join; it only starts working once you kill the calico-node pod.
Possible Solution
Kill the calico-node pod on the restarted or newly joined node.
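As a stop-gap, something like the following restarts calico-node on just the affected node; the namespace is typically calico-system for operator-based installs (kube-system for manifest installs), and <node-name> is a placeholder:

```bash
# Delete (and let the DaemonSet recreate) the calico-node pod on one node
kubectl -n calico-system delete pod -l k8s-app=calico-node \
  --field-selector spec.nodeName=<node-name>
```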
Steps to Reproduce (for bugs)
Context
Any pods with hostNetwork would fail to start after rebooting/joining a node. The cluster (pod) network works fine.
Your Environment
- calico-node log after new node join
- calico-node install-cni log after new node join
- calico-node log after killing the pod
- calico-node install-cni log after killing the pod