Closed blampe closed 2 months ago
One of my nodes is emitting a warning about incompatible nft rules and then panic-looping while failing to log something.
Not quite. Felix is trying to get the iptables rules from your system using iptables-nft-save, but that command is failing. After retrying, Felix gives up and falls on its sword in an attempt to recover. See https://github.com/projectcalico/calico/blob/master/felix/iptables/table.go#L750.
So the question is, what are the "incompatible entries" in the filter table that iptables-nft-save doesn't like? Has felix chosen to use nft-tables on your system incorrectly? And what is creating the incompatible entries?
Can you get the dump of iptables rules from that system and add them here please?
What is the version of iptables-nft-save on your system? Have you installed any rules manually?
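The retry-then-give-up behaviour described in this comment can be sketched roughly as follows. This is a Python illustration, not Felix's Go code: the name `save_with_retries`, the retry count, and the doubling delay are all assumptions made for the example, and the command runner is injected so the sketch runs without iptables installed.

```python
import time
from typing import Callable, Optional


class SaveFailed(RuntimeError):
    """Raised when every attempt to dump the table has failed."""


def save_with_retries(
    run_save: Callable[[], str],
    retries: int = 3,
    first_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call run_save until it succeeds or the retry budget is used up.

    run_save stands in for invoking iptables-nft-save: it returns the dump
    as text or raises an exception on failure. All defaults are assumptions.
    """
    delay = first_delay
    last_error: Optional[Exception] = None
    for attempt in range(retries + 1):
        try:
            return run_save()
        except Exception as err:  # the sketch treats any failure as retryable
            last_error = err
            if attempt < retries:
                sleep(delay)
                delay *= 2  # back off before the next attempt
    # Giving up: the caller is expected to crash and be restarted.
    raise SaveFailed(f"save failed after {retries} retries") from last_error
```

If the table really does contain rules that the save command cannot represent, every attempt fails the same way, which is why the process ends up restarting in a loop.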
We are also facing the same issue in the environment below.
Operating System : Debian Bookworm 12.1
Kernel version : 6.1.0-12-amd64
Canal Version : v3.24.5
Canal Iptables version : v1.8.6 (nf_tables)
RKE2 Version : v1.26.0+rke2r2
Debian Bookworm Host Iptables Version : 1.8.9
Please let us know if there is any resolution.
One observation: if we flush the iptables rules, we no longer see this issue.
The nft list ruleset command output looks like the below.
Calico uses iptables 1.8.4 and it may lead to incompatibility with the other versions of iptables in the system. It needs some investigation. Could you build a calico-node image with 1.8.9 and test it out perhaps?
https://github.com/projectcalico/calico/blob/release-v3.24/node/Dockerfile.amd64#L16C18-L16C26
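To compare the iptables build inside the calico-node container with the one on the host, it can help to parse the version strings from both sides. A minimal sketch, assuming the strings look like `iptables v1.8.8 (nf_tables)`, `v1.8.6 (nf_tables)`, or a bare `1.8.9` as quoted in this thread; the function names are hypothetical.

```python
import re
from typing import NamedTuple, Optional, Tuple


class IptablesVersion(NamedTuple):
    version: Tuple[int, ...]
    backend: Optional[str]  # e.g. "nf_tables" or "legacy"; None if not printed


_VERSION_RE = re.compile(r"v?(\d+(?:\.\d+)+)(?:\s+\(([^)]+)\))?")


def parse_iptables_version(text: str) -> IptablesVersion:
    """Parse a string such as 'iptables v1.8.8 (nf_tables)' or '1.8.9'."""
    match = _VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised iptables version string: {text!r}")
    numbers = tuple(int(part) for part in match.group(1).split("."))
    return IptablesVersion(numbers, match.group(2))


def describe_mismatch(container: str, host: str) -> str:
    """Summarise how the container's iptables differs from the host's."""
    c, h = parse_iptables_version(container), parse_iptables_version(host)
    notes = []
    if c.version < h.version:
        notes.append("container iptables is older than the host's")
    elif c.version > h.version:
        notes.append("container iptables is newer than the host's")
    if c.backend and h.backend and c.backend != h.backend:
        notes.append(f"backends differ ({c.backend} vs {h.backend})")
    return "; ".join(notes) or "no difference detected"
```

For the Debian Bookworm report above, `describe_mismatch("v1.8.6 (nf_tables)", "1.8.9")` reports that the container's iptables is older than the host's. A version gap alone does not prove incompatibility; it only narrows down where to look.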
We have a k8s cluster running on RHEL 9.2, using nftables, and canal (image calico 3.26.1/flannel 0.21.4). The canal daemon-set attempts to have 2 ready for each worker node, but you will see that it can only have 1 of 2 running. As an FYI, iptables is deprecated in RHEL 9, and Canal and firewalld don't play well together.
The following in my ruleset for nft is the issue regardless of syntax. Runs great without it.
add rule ip filter INPUT ct state vmap { established : accept, related : accept, invalid : drop }
or
add rule ip filter INPUT ct state vmap { established | related : accept, invalid : drop }
The rule causes the exact behavior described above.
Best regards
I will have to update this. Over the weekend, without the rule, 3 nodes turned "Not Ready", so this doesn't appear to be caused by a specific rule. It also must be random, as there are 5 other nodes working just fine.
My environment: Ubuntu 22.04 LTS (GNU/Linux 5.15.0-56-generic x86_64), v1.28.3+k3s1 in Docker (host network), one node. Strangely, it seems that the network can still be accessed. Sometimes it will display success, and then after a few seconds it will return to 0/1.
2023-11-02 02:21:45.846 [ERROR][118812] felix/table.go 857: iptables-save failed because there are incompatible nft rules in the table. Remove the nft rules to continue. ipVersion=0x4 table="filter"
2023-11-02 02:21:45.846 [WARNING][118812] felix/table.go 806: Killing iptables-nft-save process after a failure error=iptables-save failed because there are incompatible nft rules in the table
2023-11-02 02:21:45.847 [WARNING][118812] felix/table.go 765: iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
2023-11-02 02:21:45.847 [PANIC][118812] felix/table.go 771: iptables-nft-save command failed after retries ipVersion=0x4 table="filter"
panic: (*logrus.Entry) 0xc0001dc310
goroutine 195 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc00007b730, 0x0, {0xc00074c6c0, 0x2e})
/go/pkg/mod/github.com/sirupsen/logrus@v1.9.0/entry.go:260 +0x491
github.com/sirupsen/logrus.(*Entry).Log(0xc00007b730, 0x0, {0xc00098aa08?, 0x1?, 0x1?})
/go/pkg/mod/github.com/sirupsen/logrus@v1.9.0/entry.go:304 +0x48
github.com/sirupsen/logrus.(*Entry).Logf(0xc00007b730, 0x0, {0x2f2210f?, 0x6?}, {0xc00098aad0?, 0xc0005b4a80?, 0xc0001250d0?})
/go/pkg/mod/github.com/sirupsen/logrus@v1.9.0/entry.go:349 +0x7c
github.com/sirupsen/logrus.(*Entry).Panicf(...)
/go/pkg/mod/github.com/sirupsen/logrus@v1.9.0/entry.go:387
github.com/projectcalico/calico/felix/iptables.(*Table).getHashesAndRulesFromDataplane(0xc0007c7200)
/go/src/github.com/projectcalico/calico/felix/iptables/table.go:771 +0x3db
github.com/projectcalico/calico/felix/iptables.(*Table).loadDataplaneState(0xc0007c7200)
/go/src/github.com/projectcalico/calico/felix/iptables/table.go:608 +0x192
github.com/projectcalico/calico/felix/iptables.(*Table).Apply(0xc0007c7200)
/go/src/github.com/projectcalico/calico/felix/iptables/table.go:992 +0x392
github.com/projectcalico/calico/felix/dataplane/linux.(*InternalDataplane).apply.func4(0xc0004f8960?)
/go/src/github.com/projectcalico/calico/felix/dataplane/linux/int_dataplane.go:2120 +0x4c
created by github.com/projectcalico/calico/felix/dataplane/linux.(*InternalDataplane).apply in goroutine 105
/go/src/github.com/projectcalico/calico/felix/dataplane/linux/int_dataplane.go:2119 +0x12e6
W1102 02:21:45.914422 118901 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
# in calico-node pod
[root@VM-24-12-ubuntu /]# calico-node -felix-ready
W1102 02:27:14.887857 140384 feature_gate.go:241] Setting GA feature gate ServiceInternalTrafficPolicy=true. It will be removed in a future release.
calico/node is not ready: felix is not ready: readiness probe reporting 503
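When scanning many nodes for this failure, it can be convenient to pick the relevant lines out of the Felix log programmatically. The sketch below assumes the line layout seen in the log above (timestamp, `[LEVEL][pid]`, source file, line number, message); the regular expression and function names are illustrative, not part of Calico.

```python
import re
from typing import Dict, Optional

_FELIX_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<level>[A-Z]+)\]\[(?P<pid>\d+)\] "
    r"(?P<file>\S+) (?P<line>\d+): (?P<message>.*)$"
)


def parse_felix_line(line: str) -> Optional[Dict[str, str]]:
    """Split one Felix log line into named fields, or return None."""
    match = _FELIX_LINE.match(line.strip())
    return match.groupdict() if match else None


def is_incompatible_nft_panic(line: str) -> bool:
    """True for the PANIC line that ends the iptables-nft-save retry loop."""
    fields = parse_felix_line(line)
    return (
        fields is not None
        and fields["level"] == "PANIC"
        and "iptables-nft-save command failed after retries" in fields["message"]
    )
```

Lines in other formats, such as the `W1102 ...` feature-gate warning, do not match the pattern and are skipped.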
Solved: I ran apt upgrade and rebooted, and now it is running. Maybe I only needed to reboot the system?
We are having the same issue after a cluster upgrade.
NAME="Flatcar Container Linux by Kinvolk"
ID=flatcar
ID_LIKE=coreos
VERSION=3760.1.1
VERSION_ID=3760.1.1
BUILD_ID=2023-12-11-2212
SYSEXT_LEVEL=1.0
PRETTY_NAME="Flatcar Container Linux by Kinvolk 3760.1.1 (Oklo)"
ANSI_COLOR="38;5;75"
HOME_URL="https://flatcar.org/"
BUG_REPORT_URL="https://issues.flatcar.org"
FLATCAR_BOARD="amd64-usr"
CPE_NAME="cpe:2.3:o:flatcar-linux:flatcar_linux:3760.1.1:*:*:*:*:*:*:*"
iptables -V
iptables v1.8.8 (nf_tables)
iptables rules on each node:
cat /var/lib/iptables/rules-save
*filter
-F INPUT
-P INPUT DROP
-A INPUT -i lo -j ACCEPT
-A OUTPUT -o lo -j ACCEPT
-A INPUT -i br0 -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -s 224.0.0.0/4 -j DROP
-A INPUT -s 240.0.0.0/5 -j DROP
-A INPUT -s 255.255.255.255 -j DROP
-A INPUT -d 0.0.0.0 -j DROP
-A INPUT -s 0.0.0.0/8 -j DROP
-A INPUT -s 169.254.0.0/16 -j DROP
-A INPUT -s 192.0.2.0/24 -j DROP
-A INPUT -s 224.0.0.0/3 -j DROP
-A INPUT -s 10.0.0.0/8 -j ACCEPT
-A INPUT -s 172.16.0.0/12 -j ACCEPT
-A INPUT -s 192.168.0.0/16 -j ACCEPT
-A INPUT -i br0 -p tcp --dport 22 -j ACCEPT
-A INPUT -i br0 -p tcp --dport 80 -j ACCEPT
-A INPUT -i br0 -p tcp --dport 443 -j ACCEPT
-A INPUT -i br0 -p tcp --dport 30001:32767 -j ACCEPT
-A INPUT -i br0 -p icmp --icmp-type 0 -j ACCEPT
-A INPUT -i br0 -p icmp --icmp-type 3 -j ACCEPT
-A INPUT -i br0 -p icmp --icmp-type 11 -j ACCEPT
-I INPUT ! -s 10.0.0.0/8 -p tcp --dport 22 -i br0 -m state --state NEW -m recent --set
-I INPUT -p tcp --dport 22 -i br0 -m state --state NEW -m recent --update --seconds 60 --hitcount 2 -j REJECT
-A FORWARD -p tcp --syn -m limit --limit 1/s -j ACCEPT
-A FORWARD -p tcp --tcp-flags SYN,ACK,FIN,RST RST -m limit --limit 1/s -j ACCEPT
-A FORWARD -p icmp --icmp-type echo-request -m limit --limit 1/s -j ACCEPT
Installed with Kubespray from the master branch at commit aea150e: https://github.com/kubernetes-sigs/kubespray/commit/aea150e5dc244e933c6d5e2aee35ffb7ffe614a9
This installs Calico with settings:
---
# see roles/network_plugin/calico/defaults/main.yml
# the default value of name
# @note By default, it should be "k8s-pod-network",
# however, ours is `cni0`.
# @see `cat /etc/cni/net.d/calico.conflist.template`
# @see [https://github.com/kubernetes-sigs/kubespray/issues/8810]
calico_cni_name: cni0
## With calico it is possible to distributed routes with border routers of the datacenter.
## Warning : enabling router peering will disable calico's default behavior ('node mesh').
## The subnets of each nodes will be distributed by the datacenter router
# peer_with_router: false
# Enables Internet connectivity from containers
# nat_outgoing: true
# Enables Calico CNI "host-local" IPAM plugin
# calico_ipam_host_local: true
# add default ippool name
# calico_pool_name: "default-pool"
# add default ippool blockSize (defaults kube_network_node_prefix)
calico_pool_blocksize: 24
# add default ippool CIDR (must be inside kube_pods_subnet, defaults to kube_pods_subnet otherwise)
# calico_pool_cidr: 1.2.3.4/5
# add default ippool CIDR to CNI config
# calico_cni_pool: true
# Add default IPV6 IPPool CIDR. Must be inside kube_pods_subnet_ipv6. Defaults to kube_pods_subnet_ipv6 if not set.
# calico_pool_cidr_ipv6: fd85:ee78:d8a6:8607::1:0000/112
# Add default IPV6 IPPool CIDR to CNI config
# calico_cni_pool_ipv6: true
# Global as_num (/calico/bgp/v1/global/as_num)
# global_as_num: "64512"
# If doing peering with node-assigned asn where the globas does not match your nodes, you want this
# to be true. All other cases, false.
# calico_no_global_as_num: false
# You can set MTU value here. If left undefined or empty, it will
# not be specified in calico CNI config, so Calico will use built-in
# defaults. The value should be a number, not a string.
# calico_mtu: 1500
# Configure the MTU to use for workload interfaces and tunnels.
# - If Wireguard is enabled, subtract 60 from your network MTU (i.e 1500-60=1440)
# - Otherwise, if VXLAN or BPF mode is enabled, subtract 50 from your network MTU (i.e. 1500-50=1450)
# - Otherwise, if IPIP is enabled, subtract 20 from your network MTU (i.e. 1500-20=1480)
# - Otherwise, if not using any encapsulation, set to your network MTU (i.e. 1500)
# calico_veth_mtu: 1440
# Advertise Cluster IPs
# calico_advertise_cluster_ips: true
# Advertise Service External IPs
# calico_advertise_service_external_ips:
# - x.x.x.x/24
# - y.y.y.y/32
# Advertise Service LoadBalancer IPs
# calico_advertise_service_loadbalancer_ips:
# - x.x.x.x/24
# - y.y.y.y/16
# Choose data store type for calico: "etcd" or "kdd" (kubernetes datastore)
# @see [https://github.com/kubernetes-sigs/kubespray/issues/8917#issuecomment-1200224234]
calico_datastore: "etcd"
# Choose Calico iptables backend: "Legacy", "Auto" or "NFT"
# calico_iptables_backend: "Auto"
# Use typha (only with kdd)
# typha_enabled: false
# Generate TLS certs for secure typha<->calico-node communication
# typha_secure: false
# Scaling typha: 1 replica per 100 nodes is adequate
# Number of typha replicas
# typha_replicas: 1
# Set max typha connections
# typha_max_connections_lower_limit: 300
# Set calico network backend: "bird", "vxlan" or "none"
# bird enable BGP routing, required for ipip and no encapsulation modes
# @note We stay here for better compatibility. This shall be upgraded later.
calico_network_backend: bird
# IP in IP and VXLAN is mutualy exclusive modes.
# set IP in IP encapsulation mode: "Always", "CrossSubnet", "Never"
# @note We stay here for better compatibility. This shall be upgraded later.
calico_ipip_mode: 'Always'
# set VXLAN encapsulation mode: "Always", "CrossSubnet", "Never"
# @note We stay here for better compatibility. This shall be upgraded later.
calico_vxlan_mode: 'Never'
# set VXLAN port and VNI
# calico_vxlan_vni: 4096
# calico_vxlan_port: 4789
# Enable eBPF mode
# calico_bpf_enabled: false
# If you want to use non default IP_AUTODETECTION_METHOD, IP6_AUTODETECTION_METHOD for calico node set this option to one of:
# * can-reach=DESTINATION
# * interface=INTERFACE-REGEX
# see https://docs.projectcalico.org/reference/node/configuration
# calico_ip_auto_method: "interface=eth.*"
# calico_ip6_auto_method: "interface=eth.*"
# Set FELIX_MTUIFACEPATTERN, Pattern used to discover the host’s interface for MTU auto-detection.
# see https://projectcalico.docs.tigera.io/reference/felix/configuration
# calico_felix_mtu_iface_pattern: "^((en|wl|ww|sl|ib)[opsx].*|(eth|wlan|wwan).*)"
# Choose the iptables insert mode for Calico: "Insert" or "Append".
# calico_felix_chaininsertmode: Insert
# If you want use the default route interface when you use multiple interface with dynamique route (iproute2)
# see https://docs.projectcalico.org/reference/node/configuration : FELIX_DEVICEROUTESOURCEADDRESS
# calico_use_default_route_src_ipaddr: false
# Enable calico traffic encryption with wireguard
# calico_wireguard_enabled: false
# Under certain situations liveness and readiness probes may need tunning
# calico_node_livenessprobe_timeout: 10
# calico_node_readinessprobe_timeout: 10
# Calico apiserver (only with kdd)
# calico_apiserver_enabled: false
Calico v3.26.4.
2024-01-01 15:52:18.217 [ERROR][628323] felix/table.go 857: iptables-save failed because there are incompatible nft rules in the table. Remove the nft rules to continue. ipVersion=0x4 table="filter"
2024-01-01 15:52:18.217 [WARNING][628323] felix/table.go 806: Killing iptables-nft-save process after a failure error=iptables-save failed because there are incompatible nft rules in the table
2024-01-01 15:52:18.217 [WARNING][628323] felix/table.go 765: iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
2024-01-01 15:52:18.321 [ERROR][628323] felix/table.go 857: iptables-save failed because there are incompatible nft rules in the table. Remove the nft rules to continue. ipVersion=0x4 table="filter"
2024-01-01 15:52:18.321 [WARNING][628323] felix/table.go 806: Killing iptables-nft-save process after a failure error=iptables-save failed because there are incompatible nft rules in the table
2024-01-01 15:52:18.322 [WARNING][628323] felix/table.go 765: iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
2024-01-01 15:52:18.525 [ERROR][628323] felix/table.go 857: iptables-save failed because there are incompatible nft rules in the table. Remove the nft rules to continue. ipVersion=0x4 table="filter"
2024-01-01 15:52:18.525 [WARNING][628323] felix/table.go 806: Killing iptables-nft-save process after a failure error=iptables-save failed because there are incompatible nft rules in the table
2024-01-01 15:52:18.525 [WARNING][628323] felix/table.go 765: iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
2024-01-01 15:52:18.928 [ERROR][628323] felix/table.go 857: iptables-save failed because there are incompatible nft rules in the table. Remove the nft rules to continue. ipVersion=0x4 table="filter"
2024-01-01 15:52:18.929 [WARNING][628323] felix/table.go 806: Killing iptables-nft-save process after a failure error=iptables-save failed because there are incompatible nft rules in the table
2024-01-01 15:52:18.929 [WARNING][628323] felix/table.go 765: iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
2024-01-01 15:52:18.929 [PANIC][628323] felix/table.go 771: iptables-nft-save command failed after retries ipVersion=0x4 table="filter"
panic: (*logrus.Entry) 0xc00046ae70
goroutine 314 [running]:
github.com/sirupsen/logrus.(*Entry).log(0xc00036fe30, 0x0, {0xc00113ac60, 0x2e})
/go/pkg/mod/github.com/sirupsen/logrus@v1.9.0/entry.go:260 +0x491
However, the cluster itself seems to be operational, but we are worried.
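The timestamps in the log above show how the attempts are spaced before the panic. A small sketch for measuring that spacing, assuming each Felix log line begins with a 23-character timestamp of the form `2024-01-01 15:52:18.217`; the function name is hypothetical.

```python
from datetime import datetime
from typing import List

MARKER = "iptables-save failed because there are incompatible nft rules"


def retry_gaps_ms(log_lines: List[str]) -> List[int]:
    """Return the gaps, in whole milliseconds, between successive ERROR lines."""
    stamps = [
        datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
        for line in log_lines
        if "[ERROR]" in line and MARKER in line
    ]
    return [
        round((later - earlier).total_seconds() * 1000)
        for earlier, later in zip(stamps, stamps[1:])
    ]
```

Applied to the four ERROR lines in the log above, this gives gaps of 104, 204, and 403 ms, which suggests the delay roughly doubles between attempts before Felix panics.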
We are having the same issue after a cluster upgrade.
...
However, the cluster itself seems to be operational, but we are worried.
I tried the following since I posted the issue.
- Edited the /var/lib/iptables/rules-save that I posted above. Then ran systemctl restart iptables-restore.service and then restarted the Calico node Pods. I attempted about 5 changes. Did not help.
- Changed FELIX_CHAININSERTMODE from Insert to Append. Did not help.
- Changed FELIX_IPTABLESBACKEND from Auto to Legacy. This did help.
The nodes now look stable.
When the issue persisted, I observed that:
I updated the k8s-net-calico.yml in Kubespray inventory variables so that future upgrades to the cluster will reflect the changes:
# Choose Calico iptables backend: "Legacy", "Auto" or "NFT"
# This may be set back to `Auto` once the underlying issue is fixed/found.
# @see [https://github.com/projectcalico/calico/issues/8025]
calico_iptables_backend: "Legacy"
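For clusters that are not managed through Kubespray, the same setting reaches Felix as the FELIX_IPTABLESBACKEND environment variable on the calico-node container. The sketch below builds a strategic-merge patch body that could be passed to `kubectl patch` for the DaemonSet. The container name `calico-node` is an assumption that should be checked against the actual manifest, and the accepted values are taken from the config comment above. Note that maintainers later in this thread caution that switching to Legacy is not a real fix.

```python
import json

# Values listed in the Kubespray config comment above.
VALID_BACKENDS = {"Auto", "Legacy", "NFT"}


def backend_patch(backend: str, container: str = "calico-node") -> str:
    """Return a JSON patch body that sets FELIX_IPTABLESBACKEND.

    The container name is an assumption; check it against your DaemonSet.
    """
    if backend not in VALID_BACKENDS:
        raise ValueError(f"backend must be one of {sorted(VALID_BACKENDS)}")
    patch = {
        "spec": {
            "template": {
                "spec": {
                    "containers": [
                        {
                            "name": container,
                            "env": [
                                {"name": "FELIX_IPTABLESBACKEND", "value": backend}
                            ],
                        }
                    ]
                }
            }
        }
    }
    return json.dumps(patch, indent=2)
```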
I've lost several days iterating through this problem, and it's unsolvable without a complete rewrite of the NFT support in Calico.
The underlying problem here is that instead of adding real nftables support, it was added by using the iptables emulation layer. If any other subsystem on the node makes use of nft features incompatible with iptables, calico-node breaks entirely and ceases to work.
iptables-nft-save command failed error=iptables-save failed because there are incompatible nft rules in the table ipVersion=0x4 stderr="" table="filter"
At this time the current versions of all of the following make nft-specific changes to the rules which will cause Calico to break:
- docker
- CRI
- kubernetes (specifically kube-proxy)
There is no solution, so people are being forced to switch away from Calico to restore their Kubernetes cluster networking.
Set FELIX_IPTABLESBACKEND from Auto to Legacy and you are fixed.
Apologies to all users on this thread that our documentation failed to provide the solution using FELIX_IPTABLESBACKEND. Doc/Ops team is looking at the best places (probably several) to ensure no one has to struggle with this again.
Set FELIX_IPTABLESBACKEND from Auto to Legacy and you are fixed.
That would be a very odd definition of "fixed" -- you must be referring to usage which means castrated? 😉
If my kernel is using nftables then even if I could run iptables and nftables side by side, why in the world would I want to have that confusion? And most modern distro releases don't even have the legacy option available any more.
The move to nftables is approaching a decade old. Calico needs to update away from classic iptables before there's no kernels left that support it.
Apologies to all users on this thread that our documentation failed to provide the solution using FELIX_IPTABLESBACKEND.
The documentation makes clear how AUTO and NFT options work. This isn't the problem. The problem is that your NFT support is still using iptables commands. It's not really NFT support, it's a passthrough to an emulator that tries to present the nftables in iptables output. Which fails with even simple native nft tables.
When set to NFT, you should be using nft commands, not iptables commands.
You raise good points that are being reviewed. I agree "fixed" was not the best choice of words here. As a writer, I can only help avoid churn, frustration, and time lost troubleshooting for other users until a proper solution is in place.
Oh, I took no offense to your use of the word. As a writer myself, I tried to play with the word to make it clear I was laughing so that I didn't come off too intensely critical.
Yes, the situation is complex, especially when supporting multiple generations of kernels in heterogeneous environments, and I know it's been tricky for projects to find the right balance of embracing nft while continuing to support iptables. I'm just trying to push that doing the investment in pure nftables support is necessary at this point, now that other projects have made that investment and the tables are no longer backwards compatible with iptables.
I apologize for my confusion. Could some of you elaborate on some of the deep technical points raised here?
That would be a very odd definition of "fixed" -- you must be referring to usage which means castrated? 😉
- How is setting FELIX_IPTABLESBACKEND to Legacy castration? Did I miss something here?
If my kernel is using nftables then even if I could run iptables and nftables side by side, why in the world would I want to have that confusion?
- How do you define confusion here? Is this a metaphor? Could you give examples?
And most modern distro releases don't even have the legacy option available any more.
- What does "modern" mean here?
This would significantly improve my understanding so I can be on the same level as some of you. I appreciate any help you can provide.
I apologize for my confusion. Could some of you elaborate on some of the deep technical points raised here?
That would be a very odd definition of "fixed" -- you must be referring to usage which means castrated? 😉
- How is setting FELIX_IPTABLESBACKEND to Legacy castration? Did I miss something here?
If my kernel is using nftables then even if I could run iptables and nftables side by side, why in the world would I want to have that confusion?
- How do you define confusion here? Is this a metaphor? Could you give examples?
Under the hood you would run one, but yes, it may lead to some incompatibilities (perhaps referred to here as confusion).
And most modern distro releases don't even have the legacy option available any more.
- What does "modern" mean here?
This would significantly improve my understanding so I can be on the same level as some of you. I appreciate any help you can provide.
Modern means that newer versions do not come with compatibility packages between iptables and nftables.
As @bmckercher123 said, we are looking into this issue and we will address it one way or another. Thanks for reporting the issue and apologies for the current troubles.
Sorry @tomastigera @zzvara, setting the mode to legacy is not a solution to this problem. The current best answer is to use iptables-nft for all your components until we get a proper nftables backend in place. Using a mix of legacy iptables and nftables doesn't fail (assuming your kernel supports both) but the behaviour is very counter-intuitive. nftables can "undo" the verdict made by iptables-legacy so your policy may not get properly enforced and the failures will be confusing.
I understand the desire to jump to "proper" nftables mode ASAP but please bear in mind that kubernetes nftables mode is in Alpha in v1.29. It's not ready for prod use either.
We've been relying on the iptables-nft translation layer for a long time, which has meant that we're in sync with kube-proxy. If we moved to native nftables before kube-proxy then we'd have caused the same problem for kube-proxy!
Clearly, now that kube-proxy has nftables support, we also need to add it ASAP in order to remain in sync. I for one didn't spot that nftables support was on the slate for v1.29.
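The point about nftables being able to "undo" a verdict made by iptables-legacy follows from how netfilter hooks work: an accept ends evaluation only within the chain that issued it, while a drop from any chain on the hook is final. A simplified Python model of that behaviour is sketched below. The chain names, priorities, and packet fields are invented for illustration, and real rule matching is far richer.

```python
from typing import Callable, Dict, List, Tuple

Packet = Dict[str, object]
# (name, priority, rule function returning "accept" or "drop")
Chain = Tuple[str, int, Callable[[Packet], str]]


def final_verdict(packet: Packet, chains: List[Chain]) -> Tuple[str, str]:
    """Run a packet through every base chain on one hook, lowest priority first.

    An "accept" only ends evaluation of the current chain; the packet still
    visits the remaining chains. A "drop" is final. Returns the verdict and
    the name of the chain that decided it.
    """
    for name, _priority, rules in sorted(chains, key=lambda chain: chain[1]):
        if rules(packet) == "drop":
            return "drop", name
    return "accept", "all chains"


def legacy_filter(packet: Packet) -> str:
    # Hypothetical policy programmed through iptables-legacy: allow port 443.
    return "accept" if packet.get("dport") == 443 else "drop"


def native_nft(packet: Packet) -> str:
    # Hypothetical native nftables chain on the same hook: drop everything.
    return "drop"


CHAINS: List[Chain] = [("legacy filter", 0, legacy_filter), ("native nft", 10, native_nft)]
```

Here `final_verdict({"dport": 443}, CHAINS)` returns `("drop", "native nft")`: the legacy chain accepted the packet, but the second chain still dropped it, which is the kind of counter-intuitive outcome described above.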
The current best answer is to use iptables-nft
Which does not work, as the issue reports here and as I and others have reported. iptables-nft fails when anything that iptables cannot express is in the nft tables, and every other project involved in kubernetes is now adding rules that are incompatible.
Using a mix of legacy iptables and nftables doesn't fail [...] so your policy may not get properly enforced and the failures will be confusing.
Sounds like the definition of failure to me. It works only in very limited circumstances and debugging it is as confusing as hell.
I understand the desire to jump to "proper" nftables mode ASAP but please bear in mind that kubernetes nftables mode is in Alpha in v1.29. It's not ready for prod use either.
We've been relying on the iptables-nft translation layer for a long time, which has meant that we're in sync with kube-proxy. If we moved to native nftables before kube-proxy then we'd have caused the same problem for kube-proxy!
kube-proxy 1.23+ is what is creating nft tables that iptables-nft can't parse, and causing calico to fail.
Clearly, now that kube-proxy has nftables support, we also need to add it ASAP in order to remain in sync.
Yes, this is the core problem.
Which does not work, as the issue reports here and as I and others have reported. iptables-nft fails when anything that iptables cannot express is in the nft tables, and every other project involved in kubernetes is now adding rules that are incompatible.
Yes, there are two issues here:
- We need to bump iptables version because the iptables-nft shim that kube-proxy etc is using has been updated.
- [...] nftables. That will take more work.
kube-proxy 1.23+ is what is creating nft tables that iptables-nft can't parse, and causing calico to fail.
Hopefully, this falls under needing to bump the iptables version to support the latest version of the compatibility shim, so we can get a fix for that out soon.
Unfortunately, that fix won't make kube-proxy nftables mode work.
We need to bump iptables version because the iptables-nft shim that kube-proxy etc is using has been updated.
https://github.com/projectcalico/calico/pull/8416 updates the version of the compatibility layer we include in Calico, and so should solve this first bullet point and make Calico compatible with kube-proxy when both are running in iptables-nft compatibility mode.
As @fasaxc suggested above, in order to support compatibility with other users of nftables we will likely need to stop depending on the iptables-nft compatibility layer. I'll be looking into this.
Hi team, is there any workaround for this issue?
@luniHw Yes, the workaround is to not use kube-proxy in nftables mode with Calico!
Quick update - just merged a Calico nftables dataplane implementation compatible with nftables kube-proxy here: https://github.com/projectcalico/calico/pull/8780
tech-preview support is currently scheduled for Calico v3.29.0, and a GA release will come sometime after that once we determine it to be sufficiently stable.
I'm going to close this for now. With the nftables dataplane mentioned in my previous comment arriving in Calico v3.29, you should be able to run Calico with the (now beta) nftables kube-proxy mode.
Any further compatibility issues between the two should be handled in distinct issues. Thanks all.
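The conclusion of the thread can be condensed into a small check: kube-proxy in nftables mode needs a Calico release that ships the nftables dataplane, which the comments above place at v3.29 (tech preview). The helper below is only a sketch of that rule. Its name is hypothetical, and it does not verify that Calico is actually configured to use the nftables dataplane.

```python
import re
from typing import Tuple

# First release with the nftables dataplane (tech preview), per the thread above.
NFTABLES_DATAPLANE_SINCE = (3, 29)


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from strings such as 'v3.26.4' or '3.29.0'."""
    match = re.search(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def needs_calico_upgrade(kube_proxy_mode: str, calico_version: str) -> bool:
    """True when kube-proxy runs in nftables mode on a Calico without that dataplane."""
    if kube_proxy_mode != "nftables":
        return False  # this sketch only judges the nftables kube-proxy mode
    return major_minor(calico_version) < NFTABLES_DATAPLANE_SINCE
```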
Expected Behavior
One of my nodes is emitting a warning about incompatible nft rules and then panic-looping while failing to log something.
Current Behavior
Possible Solution
The error doesn't suggest how to remove the incompatible rules. I've tried nft flush ruleset but the problem consistently comes back.
Steps to Reproduce (for bugs)