xbianonpi / xbian

XBMC on Raspberry Pi, Bleeding Edge
https://xbian.org
GNU General Public License v3.0

K4.4: HW CSum Failure with Bridged Setup #814

Closed airend closed 8 years ago

airend commented 8 years ago

Running latest 4.4.3 kernel, and getting weird Ethernet errors every few seconds. No issues with the regular K4.1.

[ 1596.066232] eth0: hw csum failure
[ 1596.066267] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C      4.4.3+ #2
[ 1596.066280] Hardware name: BCM2709
[ 1596.066333] [<800186d8>] (unwind_backtrace) from [<80014018>] (show_stack+0x20/0x24)
[ 1596.066366] [<80014018>] (show_stack) from [<803e51f0>] (dump_stack+0xd4/0x118)
[ 1596.066400] [<803e51f0>] (dump_stack) from [<8058bfec>] (netdev_rx_csum_fault+0x44/0x48)
[ 1596.066428] [<8058bfec>] (netdev_rx_csum_fault) from [<8057fbe4>] (__skb_checksum_complete+0xb4/0xb8)
[ 1596.066456] [<8057fbe4>] (__skb_checksum_complete) from [<8067acdc>] (ipv6_mc_validate_checksum+0xa8/0x15c)
[ 1596.066486] [<8067acdc>] (ipv6_mc_validate_checksum) from [<8057c9d0>] (skb_checksum_trimmed+0x9c/0x190)
[ 1596.066511] [<8057c9d0>] (skb_checksum_trimmed) from [<8067ae94>] (ipv6_mc_check_mld+0x104/0x348)
[ 1596.066582] [<8067ae94>] (ipv6_mc_check_mld) from [<7f063600>] (br_multicast_rcv+0x68/0xb64 [bridge])
[ 1596.066636] [<7f063600>] (br_multicast_rcv [bridge]) from [<7f05a0f0>] (br_handle_frame_finish+0x1b4/0x5b0 [bridge])
[ 1596.066689] [<7f05a0f0>] (br_handle_frame_finish [bridge]) from [<7f05a668>] (br_handle_frame+0x17c/0x950 [bridge])
[ 1596.066729] [<7f05a668>] (br_handle_frame [bridge]) from [<80589a20>] (__netif_receive_skb_core+0x388/0xb5c)
[ 1596.066759] [<80589a20>] (__netif_receive_skb_core) from [<8058c0b0>] (__netif_receive_skb+0x20/0x7c)
[ 1596.066786] [<8058c0b0>] (__netif_receive_skb) from [<8058d054>] (process_backlog+0xb4/0x16c)
[ 1596.066809] [<8058d054>] (process_backlog) from [<8058c750>] (net_rx_action+0x28c/0x3f4)
[ 1596.066833] [<8058c750>] (net_rx_action) from [<800290ac>] (__do_softirq+0x198/0x3dc)
[ 1596.066857] [<800290ac>] (__do_softirq) from [<80029690>] (irq_exit+0xdc/0x140)
[ 1596.066883] [<80029690>] (irq_exit) from [<80071148>] (__handle_domain_irq+0x70/0xc4)
[ 1596.066911] [<80071148>] (__handle_domain_irq) from [<80010a10>] (handle_IRQ+0x28/0x2c)
[ 1596.066937] [<80010a10>] (handle_IRQ) from [<80009520>] (bcm2836_arm_irqchip_handle_irq+0xb8/0xbc)
[ 1596.066965] [<80009520>] (bcm2836_arm_irqchip_handle_irq) from [<806b5ec4>] (__irq_svc+0x44/0x5c)
[ 1596.066979] Exception stack(0x80999f08 to 0x80999f50)
[ 1596.066998] 9f00:                   00000000 aefab348 00000000 00000000 80998000 8099a5dc
[ 1596.067019] 9f20: 8099a500 8099a580 806bae5c 80a05df8 80991324 80999f64 8099b4f8 80999f58
[ 1596.067034] 9f40: 80010ad0 80010ad4 60000013 ffffffff
[ 1596.067061] [<806b5ec4>] (__irq_svc) from [<80010ad4>] (arch_cpu_idle+0x34/0x4c)
[ 1596.067088] [<80010ad4>] (arch_cpu_idle) from [<80063f80>] (default_idle_call+0x34/0x48)
[ 1596.067115] [<80063f80>] (default_idle_call) from [<800641ac>] (cpu_startup_entry+0x218/0x2b4)
[ 1596.067145] [<800641ac>] (cpu_startup_entry) from [<806b07dc>] (rest_init+0x88/0x8c)
[ 1596.067176] [<806b07dc>] (rest_init) from [<80918d34>] (start_kernel+0x3dc/0x3e8)

mkreisl commented 8 years ago

Which version of the kernel package are you using? Does Ethernet work or not? And are you using eth0 in a bridge?

airend commented 8 years ago

This was linux-image-bcm2836 (4.4.3+-1457058983), but every version of 4.4.x had these errors on RPi 2. Otherwise, ignoring the flood of kernel messages, network seems to work OK, although I didn't look at performance or lost packets.

P.S. With the latest 4.4.3, load averages are better than with earlier 4.4.x, pretty much in line with K4.1 now.

P.P.S. Yes, I've been using eth0 bridged to wlan0 (4addr) for more than a year (no issues on 3.18/4.1).

mkreisl commented 8 years ago

Again, are you using a bridge? It seems there are issues with kernels > 4.1. Btw, you can disable hw checksumming for rx and/or tx with ethtool.
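A minimal sketch of that ethtool workaround, assuming the wired interface is eth0 as in the trace. The function name and the DRY_RUN switch are hypothetical, added only so the commands can be previewed without root. Settings changed this way do not survive a reboot.

```shell
# Sketch: turn off hardware checksum offload on one interface.
# DRY_RUN=1 only prints the commands; leave it unset and run as root to apply them.
disable_csum_offload() {
    iface="${1:-eth0}"
    run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi
    $run ethtool -K "$iface" rx off tx off   # disable RX and TX checksum offload
    $run ethtool -k "$iface"                 # list the resulting offload settings
}
```

Calling `DRY_RUN=1 disable_csum_offload eth0` prints the two ethtool commands. Whether the driver lets both features be switched off is hardware dependent.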

airend commented 8 years ago

Thanks, I think I edited my post just after you edited yours ;-) I will try disabling hw checksums. Are bridging issues specific to RPi 2 on > 4.1? OpenWrt uses the same bridged/WDS/4addr setup on K3.3 without issues.

mkreisl commented 8 years ago

Don't know. All I found was one thread reporting the same issue on RPi 2.

airend commented 8 years ago

@mkreisl, looks like K4.5 includes a probable fix: https://github.com/torvalds/linux/commit/9b368814b336b0a1a479135eb2815edbc00efd3c. It uses the skb_postpush_rcsum() helper in https://github.com/torvalds/linux/commit/f8ffad69c9f8b8dfb0b633425d4ef4d2493ba61a (also just added in 4.5/stable). Do you mind cherry-picking these for staging?

mkreisl commented 8 years ago

@airend I applied these two commits, plus a required third one (all picked from the current 4.5 branch), and built a new kernel package for RPi 2. You can download/install the package from the devel repository.

Please test it and report whether it works (or not).

airend commented 8 years ago

Thank you! I installed kernel 1458666730 and the csum errors are gone (setup unchanged otherwise). May I ask what the third patch was?
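One way to confirm a result like this is to count the failure reports in the kernel log before and after the kernel change. A rough sketch, where the helper name is hypothetical and the match string is taken from the first line of the trace above:

```shell
# Sketch: count "hw csum failure" reports in kernel log text read from stdin.
count_csum_failures() {
    # grep -c prints the number of matching lines; "|| true" keeps a count of
    # zero from being treated as an error under "set -e".
    grep -c 'hw csum failure' || true
}

# Usage on a live system (reading the kernel log may require root):
#   dmesg | count_csum_failures
```

A count that stays at zero after some uptime on the bridged setup suggests the warnings are gone. It says nothing about throughput or packet loss.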

mkreisl commented 8 years ago

From fdc5432a7b44ab7de17141beec19d946b9344e91 Mon Sep 17 00:00:00 2001
From: Daniel Borkmann <daniel@iogearbox.net>
Date: Thu, 7 Jan 2016 15:50:22 +0100
Subject: [PATCH] net, sched: add skb_at_tc_ingress helper

Add a skb_at_tc_ingress() as this will be needed elsewhere as well and
can hide the ugly ifdef.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mkreisl commented 8 years ago

Funny thing is, I'm using kernel 4.4.6 on my Cubieboard2 with eth0 and wlan0 in bridge mode (hostapd) but have never seen those strange messages.

airend commented 8 years ago

I'm using something very similar: bridged wlan0 in 4addr mode, except managed by wpa_supplicant. I wonder if hostapd sets things up differently; I definitely had major issues configuring it as a bridged client before the 2.4 supplicant, whereas hostapd always worked properly.

mkreisl commented 8 years ago

I never had success with 4addr mode when I tried to build a wlan-eth bridge some years ago. I've had it running for years now using the legacy 8192cu driver :smile:

airend commented 8 years ago

It's definitely working with the bridging changes in 2.4, as long as addif runs only after 4addr is on (pre-up/post-up commands). I've come to depend on this 5 GHz Wi-Fi bridge, which works better than anything else I've tried, short of real wire that is ;-)

mkreisl commented 8 years ago

Could you please post your configuration here? I'm still very interested in getting it to work. Thanks.

airend commented 8 years ago

Here's what I've been using for the past year or so:

#auto eth0
allow-hotplug eth0
iface eth0 inet manual

#auto wlan0
allow-hotplug wlan0
iface wlan0 inet manual
    pre-up iw dev wlan0 set 4addr on
    post-up brctl addif wds wlan0
    pre-down brctl delif wds wlan0
    wpa-bridge wds
    wpa-driver nl80211
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

auto wds
iface wds inet static
    address 192.168.3.107
    netmask 255.255.255.0
    gateway 192.168.3.7
    dns-nameservers 192.168.3.7 192.168.3.3
    bridge_ports eth0
    bridge_pathcost eth0 100
    bridge_ageing 0
    bridge_maxwait 5
    bridge_fd 2
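
The ordering that matters in these stanzas is that 4addr is switched on before wlan0 is added to the bridge. A rough sketch of the same sequence done by hand follows. The function name and the DRY_RUN switch are hypothetical, `brctl addbr` is an assumption for the case where the bridge does not exist yet, and starting wpa_supplicant and assigning the address are left out.

```shell
# Sketch: build the bridge by hand in the same order as the stanzas above.
# DRY_RUN=1 prints the commands instead of running them (running needs root).
setup_wds_bridge() {
    br="${1:-wds}"
    eth="${2:-eth0}"
    wlan="${3:-wlan0}"
    run=""
    if [ -n "${DRY_RUN:-}" ]; then
        run="echo"
    fi
    $run brctl addbr "$br"             # assumption: the bridge does not exist yet
    $run brctl addif "$br" "$eth"      # wired port, as in "bridge_ports eth0"
    $run iw dev "$wlan" set 4addr on   # has to come before the addif below
    $run brctl addif "$br" "$wlan"     # same command as the post-up line
}
```

Running `DRY_RUN=1 setup_wds_bridge` prints the four commands in order, which is a cheap way to check the sequence before touching a live interface.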