Open lowjoel opened 1 year ago
Edit 3: I can run multiple copies of conntrack -E until I start filtering on 239.255.255.250 UDP (which is the SSDP address). Running that seems to trigger the panic.
Does it happen if you remove iptables modules and stay in nftables-only world?
Unfortunately I can't remove iptables because that router uses mwan3.
I should however add that the SSDP address had special routing added by smcroute to cross vlans. The crash can be triggered when an SSDP discovery packet is sent by any device.
Edit 4: conntrack -E with --orig-dst 239.255.255.0 without smcroute seems to not cause the kernel to crash.
cc @mwarning ☝️ I'm not sure if you're able to put two and two together in this instance. There's probably some funky interaction with the newer patches in 5.10. Not sure if 5.15 is affected.
mwan3 works with iptables-nft. please check installing iptables-nft before mwan3
One bug is the almost-null pointer dereference. The other, if you can confirm it, is that the iptables dependency unnecessarily pulls in iptables-legacy.
I will try to remove all the iptables modules. I got iptables-nft in, but I still see the old iptable modules installed, need to dig why. I'll do that then return with whether I can still get it to panic.
I cannot get it to crash: I am running 5 copies of conntrack -E and miniupnpd while a Windows client explores nearby devices via multicast. Some detail is probably missing from the overall picture.
The legacy warning is not harmful as long as nothing is programmed in legacy iptables, e.g. when their frontends are not installed at all. I managed to get rid of the legacy warning here: https://github.com/openwrt/openwrt/issues/10988 (when x_tables is first loaded by iptables-nft alone, the warning is not triggered later)
did you also run smcroute? my simplified smcroute.conf:
phyint br-lan.vlan1 enable ttl-threshold 2
phyint br-lan.vlan2 enable ttl-threshold 2
mgroup from br-lan.vlan1 group 239.255.255.250
mroute from br-lan.vlan1 group 239.255.255.250 to br-lan.vlan2
(basically I'm trying to get my devices in a vlan2 to also reply to the SSDP solicitation from vlan1)
It seems I needed smcroute, conntrack -E, and the SSDP packet together to cause the panic. Any two of the three will work without crashing.
I have set up a pretend-repeater minus mwan3 and am waiting for a crash. It seems that multiple identical multicast states sometimes appear, though I don't have a baseline for how it should look.
I base my claim that the mwan3 setup is unstable on the fact that Ubuntu/Debian ask for a reboot after switching between iptables-? worlds to avoid mixing modules; they cite unpredictable behaviour reported upstream, without an exact pointer.
mwan3 needs only xt_ipset besides ip6tables-nft and iptables-nft, certainly not ip_tables kmod that triggers legacy warning in -nft tools. Basically you need to
iptables/ip6tables are nft versions:
# iptables -V
iptables v1.8.7 (nf_tables)
# ip6tables -V
ip6tables v1.8.7 (nf_tables)
I tried to remove the ip_tables and ip6_tables modules:
# for i in iptable_raw iptable_mangle iptable_filter ip_tables ip6table_mangle ip6table_filter ip6_tables; do rmmod $i; done
unloading the module failed
# lsmod | grep -E 'ip_|ip6_'
ip_set 32768 17 xt_set,ip_set_list_set,ip_set_hash_netportnet,ip_set_hash_netport,ip_set_hash_netnet,ip_set_hash_netiface,ip_set_hash_net,ip_set_hash_mac,ip_set_hash_ipportnet,ip_set_hash_ipportip,ip_set_hash_ipport,ip_set_hash_ipmark,ip_set_hash_ipmac,ip_set_hash_ip,ip_set_bitmap_port,ip_set_bitmap_ipmac,ip_set_bitmap_ip
ip_set_bitmap_ip 16384 0
ip_set_bitmap_ipmac 16384 0
ip_set_bitmap_port 16384 0
ip_set_hash_ip 32768 0
ip_set_hash_ipmac 32768 0
ip_set_hash_ipmark 32768 0
ip_set_hash_ipport 32768 0
ip_set_hash_ipportip 32768 0
ip_set_hash_ipportnet 40960 0
ip_set_hash_mac 20480 0
ip_set_hash_net 36864 5
ip_set_hash_netiface 36864 0
ip_set_hash_netnet 40960 0
ip_set_hash_netport 36864 0
ip_set_hash_netportnet 40960 0
ip_set_list_set 16384 1
ip_tunnel 20480 1 sit
ip6_tables 28672 5
nfnetlink 12288 9 nf_conntrack_netlink,nft_compat,nf_tables,ip_set
x_tables 24576 35 xt_connlimit,xt_state,xt_helper,xt_conntrack,xt_connmark,xt_connbytes,xt_CT,ipt_REJECT,xt_time,xt_tcpudp,xt_tcpmss,xt_statistic,xt_recent,xt_policy,xt_multiport,xt_mark,xt_mac,xt_limit,xt_length,xt_hl,xt_esp,xt_ecn,xt_dscp,xt_comment,xt_TCPMSS,xt_LOG,xt_HL,xt_DSCP,xt_CLASSIFY,nft_compat,ipt_ah,ipt_ECN,xt_set,ip6_tables,ip6t_REJECT
So x_tables are still using ip6tables. Is that OK? I don't know which xt??? modules to remove though, they seem to be part of the normal nftables flow?
# lsmod | grep 'xt_'
xt_CLASSIFY 12288 0
xt_CT 12288 0
xt_DSCP 12288 0
xt_HL 12288 0
xt_LOG 12288 0
xt_TCPMSS 12288 0
xt_comment 12288 18
xt_connbytes 12288 0
xt_connlimit 12288 0
xt_connmark 12288 4
xt_conntrack 12288 0
xt_dscp 12288 0
xt_ecn 12288 0
xt_esp 12288 0
xt_helper 12288 0
xt_hl 12288 0
xt_length 12288 0
xt_limit 12288 0
xt_mac 12288 0
xt_mark 12288 65
xt_multiport 12288 0
xt_policy 12288 0
xt_recent 20480 0
xt_set 16384 15
xt_state 12288 0
xt_statistic 12288 0
xt_tcpmss 12288 0
xt_tcpudp 12288 0
xt_time 12288 0
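To answer the "which module is holding which" question without reading the long lsmod lines by eye, here is a minimal sketch. It assumes the four-column lsmod layout shown above (name, size, use count, comma-separated users); module_users is a made-up helper name, not an existing tool.

```shell
#!/bin/sh
# module_users MODULE
# Reads lsmod-formatted text on stdin and prints, one per line, the modules
# listed in the "used by" column of MODULE. Prints nothing if that column
# is empty or the module is not listed.
module_users() {
    awk -v mod="$1" '
        $1 == mod && NF >= 4 {
            n = split($4, users, ",")
            for (i = 1; i <= n; i++)
                if (users[i] != "") print users[i]
        }'
}
```

On the router this would be run as: lsmod | module_users x_tables. A module can only be removed with rmmod once everything this prints has been unloaded first.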
I think I found something else, it's actually the nft command that causes the panic (and matches the stack trace):
# nft add element inet fw4 SSDP-Response { 239.255.255.250 }
Does that panic for you while running everything else?
Commands I have running (I'm going to edit while I try different combinations):
# conntrack -E --buffer-size 1048576 --proto tcp --orig-src 239.255.255.0 --orig-port-dst 1900 &
# while true; do \
nft delete element inet fw4 SSDP-Response { 239.255.255.250 } ; \
sleep 1; \
nft add element inet fw4 SSDP-Response { 239.255.255.250 } ; \
sleep 1; \
done
EDIT 5: It seems like smcroute and conntrack don't have to be running. Once there's an existing SSDP session, the kernel panics when the ipset has the SSDP address in a set. I wonder if I need to have firewall rules referencing that ipset?
/etc/config/firewall:
config ipset
	option name SSDP-Response
	option match dest_net
	option timeout 60

config rule
	option target ACCEPT
	option src vlan2
	option src_ip xxx.yyy.zzz.aaa
	option dest vlan1
	option ipset SSDP-Response
	option proto udp
Should not crash though.... Does not for me...
Textual interpretation of the ruleset: when a multicast request is sent to the (other) network, that network is added to a timed ipset, which is used to permit unicast responses back.
Just thinking aloud: I see in the ruleset that the ipset has a timeout. Maybe this has something to do with removal on timeout and (absent) locking? The problem is getting a debug version next to the issue: a VM has space for debuginfo but is too fast to hit a (possible) locking issue.
I think the ton of xt_ modules is just a cosmetic defect. Only the one around ipsets does something here.
Possibly, the best would be for me to get a debug version of nf_tables.ko but I can't figure out how to build one and replace it on the router.
Before that, I can try running this without the timeout and see if I can replicate. I did a diff between kernels 5.10.161 and 5.10.176 and the only major change is a bugfix in the rbtree implementation. But I don't have enough context to know how this is related (or not)
That would indeed isolate problem to changing ipset or not.
So disabling the timeout seems to give me much better uptime. It's been adding and removing elements to the SSDP-Response set for about 24h now without problems. With a timeout, it'll panic in less than an hour. So we're definitely onto something.
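For anyone who wants to try the same workaround, a minimal sketch follows. It assumes the SSDP-Response ipset is the first ipset section in /etc/config/firewall (index 0), which may not hold on other setups; run and drop_ipset_timeout are made-up helper names. By default it only prints the commands.

```shell
#!/bin/sh
# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# drop_ipset_timeout [INDEX]
# Removes "option timeout" from the INDEX-th ipset section (assumed: 0)
# and reloads the firewall so the set is recreated without a timeout.
drop_ipset_timeout() {
    idx="${1:-0}"
    run uci delete "firewall.@ipset[$idx].timeout"
    run uci commit firewall
    run /etc/init.d/firewall reload
}
```

Calling drop_ipset_timeout prints the three commands; calling it with DRY_RUN=0 in the environment applies them. Without a timeout, elements stay in the set until they are deleted explicitly.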
In your repeatable setup having expiring ipset triggers the elusive problem.
I am thinking about how to stress an expiring ipset with updates so that set updates, lookups and scheduled expirations collide as often as possible, to get a generic reproducer for the kernel problem.
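One possible shape for such a reproducer, as a rough sketch only: keep re-adding the element with a very short per-element timeout so that insertions and expirations overlap. It assumes the fw4 set from the config above exists and accepts per-element timeouts; run, stress_once and stress_loop are made-up names. By default nothing is executed, the nft commands are only printed.

```shell
#!/bin/sh
# Print the command when DRY_RUN is 1 (the default), execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

SET_NAME="${SET_NAME:-SSDP-Response}"   # set name taken from the firewall config
ADDR="${ADDR:-239.255.255.250}"         # SSDP multicast address

# stress_once TIMEOUT
# Adds the element with a per-element timeout such as 1s.
stress_once() {
    run nft add element inet fw4 "$SET_NAME" "{ $ADDR timeout $1 }"
}

# stress_loop COUNT [TIMEOUT]
# Repeats stress_once COUNT times; TIMEOUT defaults to 1s.
stress_loop() {
    i=0
    while [ "$i" -lt "$1" ]; do
        stress_once "${2:-1s}"
        i=$((i + 1))
    done
}
```

For example, stress_loop 3 prints three nft commands; with DRY_RUN=0 on a test device (not a production router) they are executed. Running it while SSDP traffic is matching the set should make updates and expirations meet more often than the 60 second timeout does, though whether that actually hits the bug is untested.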
I just compiled nf_tables.ko from source, and it seems to be different from the one that is provided in the debug tarball. I can't test it right now, but I'll update here if I can and get a nice stack trace out of it.
I got a build of nf_tables.ko with debug info, but the normal boot image that is distributed with every release doesn't seem to have CONFIG_KALLSYMS enabled, which means that the debug info isn't being used when printing the stack trace. I'm trying out whether I can boot a custom kernel with CONFIG_KALLSYMS (using kexec) just to trigger the crash, then go back to the normal kernel (as a failsafe)
You don't have to debug in place: somebody better versed in nftables internals can work out the function name from the (stripped) file offsets and the code pattern at that place in the file. You have found a workaround; if you get bored later, you may re-enable the timeout on a later version and see whether the crashes repeat.
Maintainer: @jow
Environment: OpenWrt 22.03.4+, Linksys E8450 (aarch64)
Description: Hello there, I have conntrack -E running on my router to have some kind of firewall-like behaviour for a protocol that's not natively supported. Ever since upgrading to OpenWrt 22.03.4 (and also OpenWrt 22.03.5), I've been getting kernel panics from nf_tables:
On 22.03.4 (https://forum.openwrt.org/t/belkin-rt3200-linksys-e8450-wifi-ax-discussion/94302/3583?u=lowjoel)
On 22.03.5
Specifically, conntrack -E --proto udp --orig-dst 239.255.255.0 together with smcroute for that broadcast (to SSDP across vlan segments), coupled with an SSDP discovery packet seems to be triggering the crash.
Edit history:
I believe it's conntrack -E because once I stop the process (running as a service), my router no longer panics (has been up for about 5+ hours; with conntrack -E running, I've had 3 panics in a span of about 30 mins). Any suggestions/ideas welcome.
Edit 1: After starting conntrack -E, it panicked within 3 minutes. I'm now trying with only one instance of conntrack -E to see if it makes a difference (they are all looking at different ports/IP address ranges)
Edit 2: I tried using the debug kernel package on the device download page and tried to extract nf_tables.ko (which is significantly bigger). I can't insmod/modprobe it though? How do I get the right debug information?