Omoeba opened 3 years ago
The issue persists in kernel 5.10.31-v8+
Try giving some information about your network architecture - this isn't affecting many people or GitHub would be flooded, so what might make your environment different?
And what is host 84:8a:8d:32:c4:19, which is sending IPv6 all-nodes multicasts? It appears to be a Cisco device.
The online decoder I found when last looking at this says we have an ICMPv6 Router Advertisement. https://hpd.gasmi.net/?data=6000000000203AFFFE80000000000000868A8DFFFE32C419FF02000000000000000000000000000186008D1540C007080036EE80000000000101848A8D32C41905010000000005DC&force=ipv6
And a couple of ICMPv6 Multicast Listener Queries (type 130): https://hpd.gasmi.net/?data=6000000000240001FE80000000000000868A8DFFFE32C419FF0200000000000000000000000000013A0005020000010082007FC02710000000000000000000000000000000000000027D0000&force=ipv6
No specific reason why these messages would flag invalid checksums.
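For reference, the checksum on that Router Advertisement can be verified by hand: per RFC 4443, the ICMPv6 checksum covers an IPv6 pseudo-header (source, destination, upper-layer length, next header) followed by the ICMPv6 message, and a packet with a correct checksum sums to 0xFFFF with the checksum field included. A minimal Python sketch, using the decoder payload from the link above:

```python
# Verify the ICMPv6 checksum of the Router Advertisement captured above.
pkt = bytes.fromhex(
    "6000000000203AFFFE80000000000000868A8DFFFE32C419"   # IPv6 hdr + src
    "FF020000000000000000000000000001"                   # dst (all-nodes)
    "86008D1540C007080036EE80000000000101848A8D32C41905010000000005DC"
)

def ones_complement_sum(data: bytes) -> int:
    if len(data) % 2:            # pad odd-length input with a zero byte
        data += b"\x00"
    total = sum(int.from_bytes(data[i:i + 2], "big")
                for i in range(0, len(data), 2))
    while total > 0xFFFF:        # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

src, dst = pkt[8:24], pkt[24:40]
icmpv6 = pkt[40:]                # payload length field 0x0020 = 32 bytes
pseudo = (src + dst + len(icmpv6).to_bytes(4, "big")
          + b"\x00\x00\x00" + bytes([58]))  # next header 58 = ICMPv6

# A packet with a correct checksum sums to 0xFFFF (checksum field included).
print(hex(ones_complement_sum(pseudo + icmpv6)))  # expected: 0xffff
```

It folds to 0xffff, which supports the observation that nothing is wrong with these packets on the wire; the complaint is about the checksum the hardware computed, not the packet itself.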
It does look like there's something odd about your network though with
IPv4: martian source 192.168.1.69 from 169.254.124.175, on dev eth1
being logged several times in there. Are you bridging between eth0 and eth1 or similar?
84:8a:8d:32:c4:19 doesn't appear to be connected to my network anymore. I am indeed forwarding traffic between eth0 and eth1, with eth1 connected to the LAN and eth0 connected to the WAN. I'll try disconnecting eth1 completely and see if the issue persists.
It would be interesting to see if the problem recurs with traffic from another device, and if so what kind of traffic it is.
The problem still occurs with eth1 completely disconnected. I'll try plugging eth0 into my pc instead of a SB8200 cable modem tomorrow and see if the issue persists.
Plugging the pi directly into my pc caused the issue to disappear despite throwing every kind of traffic I can think of at it. I’ll try the 5.4 bcmgenet driver since I don’t recall the issue occurring prior to 5.10.
The 5.4 bcmgenet driver failed to compile so I unfortunately couldn't test it. However, since I don't recall the issue occurring prior to 5.10, it's likely some change between 5.4 and 5.10 that caused the issue.
See also #4184, and what resolved the issue for me. I'm not sure if it is relevant here.
I've done some additional testing and the issue began at kernel version 5.6. No log flooding occurs when rx offloading is enabled on versions 5.4 and 5.5, while 5.6 immediately started flooding syslog with csum failure messages. I still have no idea which particular commit caused the issue, but it's narrowed down significantly.
> what might make your environment different
It does seem to be somewhat environmental: another user of my build has reported no errors. I do recall tcpdump on 5.4 indicating cksum issues on the same ICMPv6 traffic, so it seems that in the past these were non-critical and were handled/passed by the upper layers anyway.
I may have seen 'ifb' in the OP's crashdump; I'm also using an ifb via cake/sqm. Perhaps a common denominator here? (ipv6 + ifb)
Some logs:
[root@dca632 /usbstick 45°]# dmesg | grep -A15 Hardware
[ 243.210946] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 243.216798] Workqueue: mmc_complete mmc_blk_mq_complete_work
[ 243.222456] Call trace:
[ 243.224902] dump_backtrace+0x0/0x16c
[ 243.228562] show_stack+0x18/0x30
[ 243.231876] dump_stack+0xd4/0x110
[ 243.235273] netdev_rx_csum_fault.part.0+0x48/0x58
[ 243.240062] netdev_rx_csum_fault+0x3c/0x40
[ 243.244241] __skb_checksum_complete+0xdc/0xe0
[ 243.248684] icmpv6_rcv+0xec/0x570
[ 243.252082] ip6_protocol_deliver_rcu+0xe8/0x504
[ 243.256694] ip6_input+0x98/0xbc
[ 243.259917] ipv6_rcv+0xf0/0x130
[ 243.263140] __netif_receive_skb_one_core+0x48/0x60
[ 243.268013] netif_receive_skb+0x58/0x11c
[ 243.272019] br_pass_frame_up+0x134/0x190
--
[ 268.298922] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 268.304750] Call trace:
[ 268.307206] dump_backtrace+0x0/0x16c
[ 268.310867] show_stack+0x18/0x30
[ 268.314182] dump_stack+0xd4/0x110
[ 268.317580] netdev_rx_csum_fault.part.0+0x48/0x58
[ 268.322371] netdev_rx_csum_fault+0x3c/0x40
[ 268.326552] __skb_checksum_complete+0xdc/0xe0
[ 268.330995] icmpv6_rcv+0xec/0x570
[ 268.334394] ip6_protocol_deliver_rcu+0xe8/0x504
[ 268.339007] ip6_input+0x98/0xbc
[ 268.342230] ipv6_rcv+0xf0/0x130
[ 268.345454] __netif_receive_skb_one_core+0x48/0x60
[ 268.350327] netif_receive_skb+0x58/0x11c
[ 268.354334] br_pass_frame_up+0x134/0x190
[ 268.358338] br_handle_frame_finish+0x2e0/0x460
--
[ 269.526687] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 269.532509] Call trace:
[ 269.534954] dump_backtrace+0x0/0x16c
[ 269.538606] show_stack+0x18/0x30
[ 269.541913] dump_stack+0xd4/0x110
[ 269.545304] netdev_rx_csum_fault.part.0+0x48/0x58
[ 269.550086] netdev_rx_csum_fault+0x3c/0x40
[ 269.554260] __skb_checksum_complete+0xdc/0xe0
[ 269.558695] icmpv6_rcv+0xec/0x570
[ 269.562086] ip6_protocol_deliver_rcu+0xe8/0x504
[ 269.566692] ip6_input+0x98/0xbc
[ 269.569909] ipv6_rcv+0xf0/0x130
[ 269.573127] __netif_receive_skb_one_core+0x48/0x60
[ 269.577994] netif_receive_skb+0x58/0x11c
[ 269.581994] br_pass_frame_up+0x134/0x190
[ 269.585993] br_handle_frame_finish+0x2e0/0x460
--
[ 275.819060] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 275.824888] Call trace:
[ 275.827344] dump_backtrace+0x0/0x16c
[ 275.831006] show_stack+0x18/0x30
[ 275.834322] dump_stack+0xd4/0x110
[ 275.837720] netdev_rx_csum_fault.part.0+0x48/0x58
[ 275.842510] netdev_rx_csum_fault+0x3c/0x40
[ 275.846691] __skb_checksum_complete+0xdc/0xe0
[ 275.851134] icmpv6_rcv+0xec/0x570
[ 275.854533] ip6_protocol_deliver_rcu+0xe8/0x504
[ 275.859145] ip6_input+0x98/0xbc
[ 275.862368] ipv6_rcv+0xf0/0x130
[ 275.865592] __netif_receive_skb_one_core+0x48/0x60
[ 275.870464] netif_receive_skb+0x58/0x11c
[ 275.874470] br_pass_frame_up+0x134/0x190
[ 275.878475] br_handle_frame_finish+0x2e0/0x460
--
[ 304.446727] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 304.452550] Call trace:
[ 304.454997] dump_backtrace+0x0/0x16c
[ 304.458650] show_stack+0x18/0x30
[ 304.461958] dump_stack+0xd4/0x110
[ 304.465350] netdev_rx_csum_fault.part.0+0x48/0x58
[ 304.470133] netdev_rx_csum_fault+0x3c/0x40
[ 304.474307] __skb_checksum_complete+0xdc/0xe0
[ 304.478745] icmpv6_rcv+0xec/0x570
[ 304.482137] ip6_protocol_deliver_rcu+0xe8/0x504
[ 304.486743] ip6_input+0x98/0xbc
[ 304.489960] ipv6_rcv+0xf0/0x130
[ 304.493179] __netif_receive_skb_one_core+0x48/0x60
[ 304.498046] netif_receive_skb+0x58/0x11c
[ 304.502046] br_pass_frame_up+0x134/0x190
[ 304.506045] br_handle_frame_finish+0x2e0/0x460
--
[ 305.701695] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 305.707518] Call trace:
[ 305.709965] dump_backtrace+0x0/0x16c
[ 305.713618] show_stack+0x18/0x30
[ 305.716927] dump_stack+0xd4/0x110
[ 305.720319] netdev_rx_csum_fault.part.0+0x48/0x58
[ 305.725102] netdev_rx_csum_fault+0x3c/0x40
[ 305.729276] __skb_checksum_complete+0xdc/0xe0
[ 305.733716] icmpv6_rcv+0xec/0x570
[ 305.737109] ip6_protocol_deliver_rcu+0xe8/0x504
[ 305.741716] ip6_input+0x98/0xbc
[ 305.744934] ipv6_rcv+0xf0/0x130
[ 305.748152] __netif_receive_skb_one_core+0x48/0x60
[ 305.753020] netif_receive_skb+0x58/0x11c
[ 305.757020] br_pass_frame_up+0x134/0x190
[ 305.761019] br_handle_frame_finish+0x2e0/0x460
--
[ 306.932906] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 306.938735] Call trace:
[ 306.941190] dump_backtrace+0x0/0x16c
[ 306.944851] show_stack+0x18/0x30
[ 306.948165] dump_stack+0xd4/0x110
[ 306.951564] netdev_rx_csum_fault.part.0+0x48/0x58
[ 306.956354] netdev_rx_csum_fault+0x3c/0x40
[ 306.960536] __skb_checksum_complete+0xdc/0xe0
[ 306.964980] icmpv6_rcv+0xec/0x570
[ 306.968381] ip6_protocol_deliver_rcu+0xe8/0x504
[ 306.972993] ip6_input+0x98/0xbc
[ 306.976217] ipv6_rcv+0xf0/0x130
[ 306.979441] __netif_receive_skb_one_core+0x48/0x60
[ 306.984314] netif_receive_skb+0x58/0x11c
[ 306.988322] br_pass_frame_up+0x134/0x190
[ 306.992327] br_handle_frame_finish+0x2e0/0x460
--
[ 308.196567] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 308.202395] Call trace:
[ 308.204850] dump_backtrace+0x0/0x16c
[ 308.208509] show_stack+0x18/0x30
[ 308.211825] dump_stack+0xd4/0x110
[ 308.215223] netdev_rx_csum_fault.part.0+0x48/0x58
[ 308.220013] netdev_rx_csum_fault+0x3c/0x40
[ 308.224193] __skb_checksum_complete+0xdc/0xe0
[ 308.228636] icmpv6_rcv+0xec/0x570
[ 308.232035] ip6_protocol_deliver_rcu+0xe8/0x504
[ 308.236647] ip6_input+0x98/0xbc
[ 308.239870] ipv6_rcv+0xf0/0x130
[ 308.243093] __netif_receive_skb_one_core+0x48/0x60
[ 308.247966] netif_receive_skb+0x58/0x11c
[ 308.251972] br_pass_frame_up+0x134/0x190
[ 308.255977] br_handle_frame_finish+0x2e0/0x460
--
[ 338.919628] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 338.925450] Call trace:
[ 338.927895] dump_backtrace+0x0/0x16c
[ 338.931548] show_stack+0x18/0x30
[ 338.934855] dump_stack+0xd4/0x110
[ 338.938246] netdev_rx_csum_fault.part.0+0x48/0x58
[ 338.943029] netdev_rx_csum_fault+0x3c/0x40
[ 338.947202] __skb_checksum_complete+0xdc/0xe0
[ 338.951637] icmpv6_rcv+0xec/0x570
[ 338.955029] ip6_protocol_deliver_rcu+0xe8/0x504
[ 338.959635] ip6_input+0x98/0xbc
[ 338.962853] ipv6_rcv+0xf0/0x130
[ 338.966070] __netif_receive_skb_one_core+0x48/0x60
[ 338.970937] netif_receive_skb+0x58/0x11c
[ 338.974937] br_pass_frame_up+0x134/0x190
[ 338.978936] br_handle_frame_finish+0x2e0/0x460
--
[ 360.441933] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 360.447755] Call trace:
[ 360.450199] dump_backtrace+0x0/0x16c
[ 360.453852] show_stack+0x18/0x30
[ 360.457159] dump_stack+0xd4/0x110
[ 360.460550] netdev_rx_csum_fault.part.0+0x48/0x58
[ 360.465333] netdev_rx_csum_fault+0x3c/0x40
[ 360.469506] __skb_checksum_complete+0xdc/0xe0
[ 360.473942] icmpv6_rcv+0xec/0x570
[ 360.477334] ip6_protocol_deliver_rcu+0xe8/0x504
[ 360.481940] ip6_input+0x98/0xbc
[ 360.485157] ipv6_rcv+0xf0/0x130
[ 360.488374] __netif_receive_skb_one_core+0x48/0x60
[ 360.493241] netif_receive_skb+0x58/0x11c
[ 360.497240] br_pass_frame_up+0x134/0x190
[ 360.501239] br_handle_frame_finish+0x2e0/0x460
@Omoeba if you have an ipad or similar idevice on your network...
can you reproduce with all idevices disconnected?
Not sure if it's helpful, but I have a custom CM4 carrier board (with poorly routed Ethernet tracks) that causes the same problem. Moving the same CM4 module to the official carrier board fixes the problem.
My only workaround was to limit the Ethernet to 100 Mbit using: ethtool -s eth0 speed 100 duplex full
> @Omoeba if you have an ipad or similar idevice on your network...
> can you reproduce with all idevices disconnected?
I do, but the issue still occurs with eth1 completely disconnected, and eth0 is the internet-facing interface.
# ICMPv6 type 136 = neighbour advertisement; clear the DSCP field on
# those arriving on br-lan
ip6tables -t mangle -I INPUT -i br-lan \
  -p ipv6-icmp -m icmp6 --icmpv6-type 136 \
  -j DSCP --set-dscp 0x00
hope they fix this soon...
Observed this when uploading terabytes of volume data with rclone from an RPi4 to AWS S3. The RPi is directly connected via Ethernet to a Verizon Fios Optical Network Terminal exposing a 1 Gbit/s upstream connection. The transfer ran at around 400 Mbit/s.
[Thu Feb 16 02:44:30 2023] bcmgenet fd580000.ethernet eth0: hw csum failure
[Thu Feb 16 02:44:30 2023] skb len=40 headroom=144 headlen=40 tailroom=1992
mac=(130,14) net=(144,20) trans=164
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x3a5f28a1 ip_summed=2 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x0800 pkttype=0 iif=2
[Thu Feb 16 02:44:30 2023] dev name=eth0 feat=0x0000010000004829
[Thu Feb 16 02:44:30 2023] skb headroom: [REDACTED]
[Thu Feb 16 02:44:30 2023] CPU: 0 PID: 5799 Comm: rclone Tainted: G C 6.1.7-v8+ #1
[Thu Feb 16 02:44:30 2023] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[Thu Feb 16 02:44:30 2023] Call trace:
[Thu Feb 16 02:44:30 2023] dump_backtrace.part.0+0xec/0x100
[Thu Feb 16 02:44:30 2023] show_stack+0x20/0x30
[Thu Feb 16 02:44:30 2023] dump_stack_lvl+0x8c/0xb8
[Thu Feb 16 02:44:30 2023] dump_stack+0x18/0x34
[Thu Feb 16 02:44:30 2023] netdev_rx_csum_fault+0x68/0x70
[Thu Feb 16 02:44:30 2023] __skb_checksum_complete+0x10c/0x114
[Thu Feb 16 02:44:30 2023] nf_ip_checksum+0x84/0x160
[Thu Feb 16 02:44:30 2023] nf_checksum+0x54/0x64
[Thu Feb 16 02:44:30 2023] nf_conntrack_tcp_packet+0xb0c/0x1820 [nf_conntrack]
[Thu Feb 16 02:44:30 2023] nf_conntrack_in+0xec/0x850 [nf_conntrack]
[Thu Feb 16 02:44:30 2023] ipv4_conntrack_in+0x20/0x30 [nf_conntrack]
[Thu Feb 16 02:44:30 2023] nf_hook_slow+0x54/0xf4
[Thu Feb 16 02:44:30 2023] nf_hook_slow_list+0x88/0x11c
[Thu Feb 16 02:44:30 2023] ip_sublist_rcv+0x1e4/0x1f0
[Thu Feb 16 02:44:30 2023] ip_list_rcv+0xf8/0x19c
[Thu Feb 16 02:44:30 2023] __netif_receive_skb_list_core+0x190/0x220
[Thu Feb 16 02:44:30 2023] netif_receive_skb_list_internal+0x194/0x2b0
[Thu Feb 16 02:44:30 2023] napi_complete_done+0x70/0x210
[Thu Feb 16 02:44:30 2023] bcmgenet_rx_poll+0x3a0/0x43c
[Thu Feb 16 02:44:30 2023] __napi_poll+0x40/0x214
[Thu Feb 16 02:44:30 2023] net_rx_action+0x344/0x3c0
[Thu Feb 16 02:44:30 2023] __do_softirq+0x198/0x4f0
[Thu Feb 16 02:44:30 2023] ____do_softirq+0x18/0x24
[Thu Feb 16 02:44:30 2023] call_on_irq_stack+0x2c/0x60
[Thu Feb 16 02:44:30 2023] do_softirq_own_stack+0x24/0x3c
[Thu Feb 16 02:44:30 2023] __irq_exit_rcu+0xd4/0x120
[Thu Feb 16 02:44:30 2023] irq_exit_rcu+0x18/0x50
[Thu Feb 16 02:44:30 2023] el0_interrupt+0x54/0x100
[Thu Feb 16 02:44:30 2023] __el0_irq_handler_common+0x18/0x24
[Thu Feb 16 02:44:30 2023] el0t_32_irq_handler+0x10/0x20
[Thu Feb 16 02:44:30 2023] el0t_32_irq+0x190/0x194
FWIW, there was an interesting post in an OpenWrt bug report regarding this, AFAIR fingering the network "scaling" subsystems, which was my initial gut feeling on this due to its prevalence.
I tend to agree with that; however, its occurrence, or more importantly lack thereof, across similar and dissimilar hardware is pointing me towards actual core/buffer (re)allocation for the above as key here.
Just a layman's theory with nothing to substantiate it, so grain of salt and all of that.
Is this the right place for my bug report? I believe so.

Describe the bug
dmesg and the syslog are flooded with errors stating eth0: hw csum failure.

To reproduce
List the steps required to reproduce the issue.

Expected behaviour
The error doesn't occur.

Actual behaviour
The error occurs.

System
Copy and paste the results of the raspinfo command into this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions:
cat /etc/rpi-issue
vcgencmd version
uname -a

Logs
If applicable, add the relevant output from dmesg or similar.

Additional context
I appended genet.skip_umac_reset=n to my /boot/cmdline.txt as per #4184 and the issue kept occurring. The only way to stop the flood is sudo ethtool -K eth0 rx off. The issue also occurs in kernel versions 5.10.17-v7l+ and 5.10.11-v7l+. I haven't tested 5.10.11-v8+.
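Since the main symptom is log flooding, a small helper (hypothetical, not from this thread) can quantify it by counting "hw csum failure" lines per interface in kernel log text such as `dmesg` output:

```python
import re
from collections import Counter

# Count "hw csum failure" occurrences per interface in kernel log text,
# matching lines like:
#   bcmgenet fd580000.ethernet eth0: hw csum failure
FAILURE_RE = re.compile(r"(\S+): hw csum failure")

def count_csum_failures(log_text: str) -> Counter:
    return Counter(m.group(1) for m in FAILURE_RE.finditer(log_text))

sample = """\
[Thu Feb 16 02:44:30 2023] bcmgenet fd580000.ethernet eth0: hw csum failure
[Thu Feb 16 02:44:31 2023] bcmgenet fd580000.ethernet eth0: hw csum failure
"""
print(count_csum_failures(sample))  # Counter({'eth0': 2})
```

Pipe in `dmesg` output (or a saved syslog excerpt) to see how many failures each interface has accumulated since boot.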