raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.11k stars 4.97k forks source link

eth0 (bcmgenet): transmit queue timed out #4485

Open atsies opened 3 years ago

atsies commented 3 years ago

Describe the bug eth0 (bcmgenet) Ethernet stops working when plugged-in and hangs until a next reboot.

To reproduce Boot Raspi OS, plug in Ethernet cable, observe kernel output.

Expected behaviour eth0 (bcmgenet) Ethernet is expected to function.

Actual behaviour eth0 (bcmgenet) Ethernet stops working, TCP/IP networking on eth0 becomes non-functional.

System Copy and paste the results of the raspinfo command in to this section. Alternatively, copy and paste a pastebin link, or add answers to the following questions: PRETTY_NAME="Raspbian GNU/Linux 10 (buster)" NAME="Raspbian GNU/Linux" VERSION_ID="10" VERSION="10 (buster)"

Logs If applicable, add the relevant output from dmesg or similar. [ 0.000000] Booting Linux on physical CPU 0x0 [ 0.000000] Linux version 5.10.17-v7l+ (dom@buildbot) (arm-linux-gnueabihf-gcc-8 (Ubuntu/Linaro 8.4.0-3ubuntu1) 8.4.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #1414 SMP Fri Apr 30 13:20:47 BST 2021 [ 0.000000] CPU: ARMv7 Processor [410fd083] revision 3 (ARMv7), cr=30c5383d [ 0.000000] CPU: div instructions available: patching division code [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache [ 0.000000] OF: fdt: Machine model: Raspberry Pi 4 Model B Rev 1.2 [ 0.000000] Memory policy: Data cache writealloc [ 0.000000] Reserved memory: created CMA memory pool at 0x000000001ec00000, size 256 MiB [ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool [ 0.000000] Zone ranges: [ 0.000000] DMA [mem 0x0000000000000000-0x000000002fffffff] [ 0.000000] Normal empty [ 0.000000] HighMem [mem 0x0000000030000000-0x00000000fbffffff] [ 0.000000] Movable zone start for each node [ 0.000000] Early memory node ranges [ 0.000000] node 0: [mem 0x0000000000000000-0x000000003b3fffff] [ 0.000000] node 0: [mem 0x0000000040000000-0x00000000fbffffff] [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000fbffffff] [ 0.000000] On node 0 totalpages: 1012736 [ 0.000000] DMA zone: 2304 pages used for memmap [ 0.000000] DMA zone: 0 pages reserved [ 0.000000] DMA zone: 196608 pages, LIFO batch:63 [ 0.000000] HighMem zone: 816128 pages, LIFO batch:63 [ 0.000000] percpu: Embedded 20 pages/cpu s50700 r8192 d23028 u81920 [ 0.000000] pcpu-alloc: s50700 r8192 d23028 u81920 alloc=20*4096 [ 0.000000] pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 1010432 [ 0.000000] Kernel command line: coherent_pool=1M 8250.nr_uarts=1 snd_bcm2835.enable_compat_alsa=0 snd_bcm2835.enable_hdmi=1 smsc95xx.macaddr=m.m.m.m vc_mem.mem_base=0x3ec00000 vc_mem.mem_size=0x4000000t [ 0.000000] Kernel parameter elevator= does not have any effect anymore. Please use sysfs to set IO scheduler for individual devices. [ 0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear) [ 0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear) [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off [ 0.000000] software IO TLB: mapped [mem 0x0000000017cc0000-0x000000001bcc0000] (64MB) [ 0.000000] Memory: 3654940K/4050944K available (10240K kernel code, 1358K rwdata, 3160K rodata, 2048K init, 890K bss, 133860K reserved, 262144K cma-reserved, 3264512K highmem) [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=4, Nodes=1 [ 0.000000] ftrace: allocating 33942 entries in 67 pages [ 0.000000] ftrace: allocated 67 pages with 3 groups [ 0.000000] rcu: Hierarchical RCU implementation. [ 0.000000] Rude variant of Tasks RCU enabled. [ 0.000000] Tracing variant of Tasks RCU enabled. [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies. [ 0.000000] NR_IRQS: 16, nr_irqs: 16, preallocated irqs: 16 [ 0.000000] GIC: Using split EOI/Deactivate mode [ 0.000000] random: get_random_bytes called from start_kernel+0x3c8/0x59c with crng_init=0 [ 0.000008] sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 2147483647500ns [ 0.000034] clocksource: timer: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 1911260446275 ns [ 0.000096] bcm2835: system timer (irq = 25) [ 0.000747] arch_timer: cp15 timer(s) running at 54.00MHz (phys). [ 0.000769] clocksource: arch_sys_counter: mask: 0xffffffffffffff max_cycles: 0xc743ce346, max_idle_ns: 440795203123 ns [ 0.000791] sched_clock: 56 bits at 54MHz, resolution 18ns, wraps every 4398046511102ns [ 0.000810] Switching to timer-based delay loop, resolution 18ns [ 0.001059] Console: colour dummy device 80x30 [ 0.001797] printk: console [tty1] enabled [ 0.001865] Calibrating delay loop (skipped), value calculated using timer frequency.. 108.00 BogoMIPS (lpj=540000) [ 0.001919] pid_max: default: 32768 minimum: 301 [ 0.002098] LSM: Security Framework initializing [ 0.002291] Mount-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) [ 0.002337] Mountpoint-cache hash table entries: 2048 (order: 1, 8192 bytes, linear) [ 0.003908] Disabling memory control group subsystem [ 0.004044] CPU: Testing write buffer coherency: ok [ 0.004504] CPU0: thread -1, cpu 0, socket 0, mpidr 80000000 [ 0.005706] Setting up static identity map for 0x200000 - 0x20003c [ 0.005917] rcu: Hierarchical SRCU implementation. [ 0.006836] smp: Bringing up secondary CPUs ... [ 0.008019] CPU1: thread -1, cpu 1, socket 0, mpidr 80000001 [ 0.009350] CPU2: thread -1, cpu 2, socket 0, mpidr 80000002 [ 0.010614] CPU3: thread -1, cpu 3, socket 0, mpidr 80000003 [ 0.010887] smp: Brought up 1 node, 4 CPUs [ 0.010918] SMP: Total of 4 processors activated (432.00 BogoMIPS). [ 0.010948] CPU: All CPU(s) started in HYP mode. [ 0.010975] CPU: Virtualization extensions available. [ 0.011799] devtmpfs: initialized [ 0.025543] VFP support v0.3: implementor 41 architecture 3 part 40 variant 8 rev 0 [ 0.025797] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns [ 0.025849] futex hash table entries: 1024 (order: 4, 65536 bytes, linear) [ 0.032847] pinctrl core: initialized pinctrl subsystem [ 0.033910] NET: Registered protocol family 16 [ 0.037667] DMA: preallocated 1024 KiB pool for atomic coherent allocations [ 0.038428] audit: initializing netlink subsys (disabled) [ 0.038709] audit: type=2000 audit(0.030:1): state=initialized audit_enabled=0 res=1 [ 0.039254] thermal_sys: Registered thermal governor 'step_wise' [ 0.039930] hw-breakpoint: found 5 (+1 reserved) breakpoint and 4 watchpoint registers. [ 0.039984] hw-breakpoint: maximum watchpoint size is 8 bytes. [ 0.040372] Serial: AMBA PL011 UART driver [ 0.073040] bcm2835-mbox fe00b880.mailbox: mailbox enabled [ 0.090899] raspberrypi-firmware soc:firmware: Attached to firmware from 2021-04-30T13:45:52, variant start [ 0.100913] raspberrypi-firmware soc:firmware: Firmware hash is d7f29d96450abfc77cd6cf011af1faf1e03e5e56 [ 0.145705] bcm2835-dma fe007000.dma: DMA legacy API manager, dmachans=0x1 [ 0.149823] vgaarb: loaded [ 0.150281] SCSI subsystem initialized [ 0.150506] usbcore: registered new interface driver usbfs [ 0.150593] usbcore: registered new interface driver hub [ 0.150680] usbcore: registered new device driver usb [ 0.151389] usb_phy_generic phy: supply vcc not found, using dummy regulator [ 0.153248] clocksource: Switched to clocksource arch_sys_counter [ 1.153198] VFS: Disk quotas dquot_6.6.0 [ 1.153358] VFS: Dquot-cache hash table entries: 1024 (order 0, 4096 bytes) [ 1.153535] FS-Cache: Loaded [ 1.153738] CacheFiles: Loaded [ 1.163840] NET: Registered protocol family 2 [ 1.164679] tcp_listen_portaddr_hash hash table entries: 512 (order: 0, 6144 bytes, linear) [ 1.164838] TCP established hash table entries: 8192 (order: 3, 32768 bytes, linear) [ 1.164919] TCP bind hash table entries: 8192 (order: 4, 65536 bytes, linear) [ 1.165002] TCP: Hash tables configured (established 8192 bind 8192) [ 1.165162] UDP hash table entries: 512 (order: 2, 16384 bytes, linear) [ 1.165218] UDP-Lite hash table entries: 512 (order: 2, 16384 bytes, linear) [ 1.165494] NET: Registered protocol family 1 [ 1.166230] RPC: Registered named UNIX socket transport module. [ 1.166263] RPC: Registered udp transport module. [ 1.166291] RPC: Registered tcp transport module. [ 1.166318] RPC: Registered tcp NFSv4.1 backchannel transport module. [ 1.166355] PCI: CLS 0 bytes, default 64 [ 1.169241] Initialise system trusted keyrings [ 1.169487] workingset: timestamp_bits=14 max_order=20 bucket_order=6 [ 1.178063] zbud: loaded [ 1.179974] FS-Cache: Netfs 'nfs' registered for caching [ 1.180761] NFS: Registering the id_resolver key type [ 1.180817] Key type id_resolver registered [ 1.180846] Key type id_legacy registered [ 1.181013] nfs4filelayout_init: NFSv4 File Layout Driver Registering... [ 1.182026] Key type asymmetric registered [ 1.182058] Asymmetric key parser 'x509' registered [ 1.182263] bounce: pool size: 64 pages [ 1.182318] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 249) [ 1.182547] io scheduler mq-deadline registered [ 1.182578] io scheduler kyber registered [ 1.186870] brcm-pcie fd500000.pcie: host bridge /scb/pcie@7d500000 ranges: [ 1.186921] brcm-pcie fd500000.pcie: No bus range found for /scb/pcie@7d500000, using [bus 00-ff] [ 1.187022] brcm-pcie fd500000.pcie: MEM 0x0600000000..0x063fffffff -> 0x00c0000000 [ 1.187131] brcm-pcie fd500000.pcie: IB MEM 0x0000000000..0x00bfffffff -> 0x0400000000 [ 1.245351] brcm-pcie fd500000.pcie: link up, 5.0 GT/s PCIe x1 (SSC) [ 1.245744] brcm-pcie fd500000.pcie: PCI host bridge to bus 0000:00 [ 1.245781] pci_bus 0000:00: root bus resource [bus 00-ff] [ 1.245816] pci_bus 0000:00: root bus resource [mem 0x600000000-0x63fffffff] (bus address [0xc0000000-0xffffffff]) [ 1.245924] pci 0000:00:00.0: [14e4:2711] type 01 class 0x060400 [ 1.246179] pci 0000:00:00.0: PME# supported from D0 D3hot [ 1.249674] PCI: bus0: Fast back to back transfers disabled [ 1.250026] pci 0000:01:00.0: [1106:3483] type 00 class 0x0c0330 [ 1.250167] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00000fff 64bit] [ 1.250616] pci 0000:01:00.0: PME# supported from D0 D3cold [ 1.254138] PCI: bus1: Fast back to back transfers disabled [ 1.254251] pci 0000:00:00.0: BAR 8: assigned [mem 0x600000000-0x6000fffff] [ 1.254291] pci 0000:01:00.0: BAR 0: assigned [mem 0x600000000-0x600000fff 64bit] [ 1.254393] pci 0000:00:00.0: PCI bridge to [bus 01] [ 1.254433] pci 0000:00:00.0: bridge window [mem 0x600000000-0x6000fffff] [ 1.254847] pcieport 0000:00:00.0: enabling device (0140 -> 0142) [ 1.255086] pcieport 0000:00:00.0: PME: Signaling with IRQ 63 [ 1.260717] Serial: 8250/16550 driver, 1 ports, IRQ sharing enabled [ 1.261627] bcm2835-aux-uart fe215040.serial: there is not valid maps for state default [ 1.264073] iproc-rng200 fe104000.rng: hwrng registered [ 1.264396] vc-mem: phys_addr:0x00000000 mem_base=0x3ec00000 mem_size:0x40000000(1024 MiB) [ 1.265275] gpiomem-bcm2835 fe200000.gpiomem: Initialised: Registers at 0xfe200000 [ 1.277270] brd: module loaded [ 1.289357] loop: module loaded [ 1.291006] Loading iSCSI transport class v2.0-870. [ 1.293515] libphy: Fixed MDIO Bus: probed [ 1.295222] bcmgenet fd580000.ethernet: GENET 5.0 EPHY: 0x0000 [ 1.313326] libphy: bcmgenet MII bus: probed [ 1.393416] unimac-mdio unimac-mdio.-19: Broadcom UniMAC MDIO bus [ 1.394537] usbcore: registered new interface driver r8152 [ 1.394620] usbcore: registered new interface driver lan78xx [ 1.394697] usbcore: registered new interface driver smsc95xx [ 1.396299] xhci_hcd 0000:01:00.0: enabling device (0140 -> 0142) [ 1.396474] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 1.396524] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 1 [ 1.399880] xhci_hcd 0000:01:00.0: hcc params 0x002841eb hci version 0x100 quirks 0x0000030000000890 [ 1.401215] usb usb1: New USB device found, idVendor=1d6b, idProduct=0002, bcdDevice= 5.10 [ 1.401256] usb usb1: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 1.401292] usb usb1: Product: xHCI Host Controller [ 1.401322] usb usb1: Manufacturer: Linux 5.10.17-v7l+ xhci-hcd [ 1.401353] usb usb1: SerialNumber: 0000:01:00.0 [ 1.402098] hub 1-0:1.0: USB hub found [ 1.402224] hub 1-0:1.0: 1 port detected [ 1.402912] xhci_hcd 0000:01:00.0: xHCI Host Controller [ 1.402955] xhci_hcd 0000:01:00.0: new USB bus registered, assigned bus number 2 [ 1.403000] xhci_hcd 0000:01:00.0: Host supports USB 3.0 SuperSpeed [ 1.403592] usb usb2: New USB device found, idVendor=1d6b, idProduct=0003, bcdDevice= 5.10 [ 1.403631] usb usb2: New USB device strings: Mfr=3, Product=2, SerialNumber=1 [ 1.403667] usb usb2: Product: xHCI Host Controller [ 1.403697] usb usb2: Manufacturer: Linux 5.10.17-v7l+ xhci-hcd [ 1.403727] usb usb2: SerialNumber: 0000:01:00.0 [ 1.404462] hub 2-0:1.0: USB hub found [ 1.404559] hub 2-0:1.0: 4 ports detected [ 1.406224] dwc_otg: version 3.00a 10-AUG-2012 (platform bus) [ 1.406549] dwc_otg: FIQ enabled [ 1.406564] dwc_otg: NAK holdoff enabled [ 1.406579] dwc_otg: FIQ split-transaction FSM enabled [ 1.406596] Module dwc_common_port init [ 1.407033] usbcore: registered new interface driver uas [ 1.407160] usbcore: registered new interface driver usb-storage [ 1.407368] mousedev: PS/2 mouse device common for all mice [ 1.409036] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer [ 1.412448] sdhci: Secure Digital Host Controller Interface driver [ 1.412482] sdhci: Copyright(c) Pierre Ossman [ 1.413187] mmc-bcm2835 fe300000.mmcnr: could not get clk, deferring probe [ 1.413840] sdhci-pltfm: SDHCI platform and OF driver helper [ 1.417993] ledtrig-cpu: registered to indicate activity on CPUs [ 1.418352] hid: raw HID events driver (C) Jiri Kosina [ 1.418573] usbcore: registered new interface driver usbhid [ 1.418603] usbhid: USB HID core driver [ 1.423870] Initializing XFRM netlink socket [ 1.423932] NET: Registered protocol family 17 [ 1.424070] Key type dns_resolver registered [ 1.424438] Registering SWP/SWPB emulation handler [ 1.424635] registered taskstats version 1 [ 1.424678] Loading compiled-in X.509 certificates [ 1.425512] Key type ._fscrypt registered [ 1.425630] Key type .fscrypt registered [ 1.425657] Key type fscrypt-provisioning registered [ 1.436986] uart-pl011 fe201000.serial: there is not valid maps for state default [ 1.437310] uart-pl011 fe201000.serial: cts_event_workaround enabled [ 1.437399] fe201000.serial: ttyAMA0 at MMIO 0xfe201000 (irq = 36, base_baud = 0) is a PL011 rev2 [ 1.443890] bcm2835-aux-uart fe215040.serial: there is not valid maps for state default [ 1.444661] printk: console [ttyS0] disabled [ 1.444800] fe215040.serial: ttyS0 at MMIO 0xfe215040 (irq = 37, base_baud = 62500000) is a 16550 [ 2.714119] printk: console [ttyS0] enabled [ 2.720542] bcm2835-power bcm2835-power: Broadcom BCM2835 power domains driver [ 2.728991] mmc-bcm2835 fe300000.mmcnr: mmc_debug:0 mmc_debug2:0 [ 2.735142] mmc-bcm2835 fe300000.mmcnr: DMA channel allocated [ 2.770733] of_cfs_init [ 2.773394] of_cfs_init: OK [ 2.793002] mmc1: queuing unknown CIS tuple 0x80 (2 bytes) [ 2.800338] mmc1: queuing unknown CIS tuple 0x80 (3 bytes) [ 2.807652] mmc1: queuing unknown CIS tuple 0x80 (3 bytes) [ 2.813306] mmc0: SDHCI controller on fe340000.emmc2 [fe340000.emmc2] using ADMA [ 2.816275] mmc1: queuing unknown CIS tuple 0x80 (7 bytes) [ 2.821667] Waiting for root device PARTUUID=bb9841f2-02... [ 2.828130] mmc1: queuing unknown CIS tuple 0x80 (3 bytes) [ 2.878663] random: fast init done [ 2.906735] mmc1: new high speed SDIO card at address 0001 [ 2.945275] mmc0: new ultra high speed DDR50 SDHC card at address aaaa [ 2.952787] mmcblk0: mmc0:aaaa SD32G 29.7 GiB [ 2.959372] mmcblk0: p1 p2 [ 2.982793] EXT4-fs (mmcblk0p2): mounted filesystem with ordered data mode. Opts: (null) [ 2.991155] VFS: Mounted root (ext4 filesystem) readonly on device 179:2. [ 2.993291] usb 1-1: new high-speed USB device number 2 using xhci_hcd [ 3.009142] devtmpfs: mounted [ 3.020911] Freeing unused kernel memory: 2048K [ 3.063603] Run /sbin/init as init process [ 3.067777] with arguments: [ 3.067791] /sbin/init [ 3.067805] with environment: [ 3.067820] HOME=/ [ 3.067834] TERM=linux [ 3.205916] usb 1-1: New USB device found, idVendor=2109, idProduct=3431, bcdDevice= 4.21 [ 3.214300] usb 1-1: New USB device strings: Mfr=0, Product=1, SerialNumber=0 [ 3.221584] usb 1-1: Product: USB2.0 Hub [ 3.227744] hub 1-1:1.0: USB hub found [ 3.232100] hub 1-1:1.0: 4 ports detected [ 3.467571] systemd[1]: System time before build time, advancing clock. [ 3.573728] NET: Registered protocol family 10 [ 3.579554] Segment Routing with IPv6 [ 3.641495] systemd[1]: systemd 241 running in system mode. (+PAM +AUDIT +SELINUX +IMA +APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ +LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN2 ) [ 3.665106] systemd[1]: Detected architecture arm. [ 3.717537] systemd[1]: Set hostname to . [ 4.335341] random: systemd: uninitialized urandom read (16 bytes read) [ 4.351816] random: systemd: uninitialized urandom read (16 bytes read) [ 4.359038] systemd[1]: Listening on initctl Compatibility Named Pipe. [ 4.367002] random: systemd: uninitialized urandom read (16 bytes read) [ 4.374420] systemd[1]: Listening on Journal Audit Socket. [ 4.383752] systemd[1]: Created slice system-systemd\x2dfsck.slice. [ 4.391145] systemd[1]: Listening on fsck to fsckd communication Socket. [ 4.399659] systemd[1]: Created slice system-serial\x2dgetty.slice. [ 4.407045] systemd[1]: Listening on udev Kernel Socket. [ 4.413143] systemd[1]: Started Forward Password Requests to Wall Directory Watch. [ 5.058726] EXT4-fs (mmcblk0p2): re-mounted. Opts: (null) [ 5.145397] systemd-journald[117]: Received request to flush runtime journal from PID 1 [ 5.685624] mc: Linux media interface: v0.10 [ 5.713374] rpivid-mem feb00000.hevc-decoder: rpivid-hevcmem initialised: Registers at 0xfeb00000 length 0x00010000 [ 5.714018] rpivid-mem feb10000.rpivid-local-intc: rpivid-intcmem initialised: Registers at 0xfeb10000 length 0x00001000 [ 5.717055] rpivid-mem feb20000.h264-decoder: rpivid-h264mem initialised: Registers at 0xfeb20000 length 0x00010000 [ 5.717650] rpivid-mem feb30000.vp9-decoder: rpivid-vp9mem initialised: Registers at 0xfeb30000 length 0x00010000 [ 5.719761] vc_sm_cma: module is from the staging directory, the quality is unknown, you have been warned. [ 5.721922] bcm2835_vc_sm_cma_probe: Videocore shared memory driver

[ 5.735358] [vc_sm_connected_init]: installed successfully [ 5.737497] videodev: Linux video capture interface: v2.00 [ 5.771986] snd_bcm2835: module is from the staging directory, the quality is unknown, you have been warned. [ 5.806900] bcm2835_audio bcm2835_audio: card created with 8 channels [ 5.843703] bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned. [ 5.853568] bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned. [ 5.879180] bcm2835_v4l2: module is from the staging directory, the quality is unknown, you have been warned. [ 5.888155] bcm2835_isp: module is from the staging directory, the quality is unknown, you have been warned. [ 5.915968] bcm2835-isp bcm2835-isp: Device node output[0] registered as /dev/video13 [ 5.917216] bcm2835-isp bcm2835-isp: Device node capture[0] registered as /dev/video14 [ 5.917822] bcm2835_codec: module is from the staging directory, the quality is unknown, you have been warned. [ 5.933417] bcm2835-isp bcm2835-isp: Device node capture[1] registered as /dev/video15 [ 5.937662] bcm2835-isp bcm2835-isp: Device node stats[2] registered as /dev/video16 [ 5.937694] bcm2835-isp bcm2835-isp: Register output node 0 with media controller [ 5.937719] bcm2835-isp bcm2835-isp: Register capture node 1 with media controller [ 5.937743] bcm2835-isp bcm2835-isp: Register capture node 2 with media controller [ 5.937765] bcm2835-isp bcm2835-isp: Register capture node 3 with media controller [ 5.939520] bcm2835-isp bcm2835-isp: Loaded V4L2 bcm2835-isp [ 5.999456] bcm2835-codec bcm2835-codec: Device registered as /dev/video10 [ 5.999501] bcm2835-codec bcm2835-codec: Loaded V4L2 decode [ 6.068578] cfg80211: Loading compiled-in X.509 certificates for regulatory database [ 6.086165] bcm2835-codec bcm2835-codec: Device registered as /dev/video11 [ 6.086225] bcm2835-codec bcm2835-codec: Loaded V4L2 encode [ 6.108356] bcm2835-codec bcm2835-codec: Device registered as /dev/video12 [ 6.108414] bcm2835-codec bcm2835-codec: Loaded V4L2 isp [ 6.175908] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7' [ 6.227138] [drm] Initialized v3d 1.0.0 20180419 for fec00000.v3d on minor 0 [ 6.228172] brcmfmac: F1 signature read @0x18000000=0x15264345 [ 6.247012] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6 [ 6.248040] usbcore: registered new interface driver brcmfmac [ 6.279684] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43455-sdio.raspberrypi,4-model-b.txt failed with error -2 [ 6.300939] [drm] No displays found. Consider forcing hotplug if HDMI is attached [ 6.301065] vc4-drm gpu: bound fe600000.firmwarekms (ops vc4_fkms_ops [vc4]) [ 6.301491] [drm] Initialized vc4 0.0.0 20140616 for gpu on minor 1 [ 6.518076] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6 [ 6.527769] brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6 wl0: Jan 4 2021 19:56:29 version 7.45.229 (617f1f5 CY) FWID 01-2dbd9d2e [ 8.330066] random: crng init done [ 8.330085] random: 7 urandom warning(s) missed due to ratelimiting [ 8.380795] uart-pl011 fe201000.serial: no DMA platform data [ 8.453187] 8021q: 802.1Q VLAN Support v1.8 [ 8.623298] Adding 102396k swap on /var/swap. Priority:-2 extents:1 across:102396k SSFS [ 8.873708] bcmgenet fd580000.ethernet: configuring instance for external RGMII (RX delay) [ 8.873983] bcmgenet fd580000.ethernet eth0: Link is Down [ 13.033579] bcmgenet fd580000.ethernet eth0: Link is Up - 1Gbps/Full - flow control rx/tx [ 13.033635] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready [ 15.185707] Bluetooth: Core ver 2.22 [ 15.185792] NET: Registered protocol family 31 [ 15.185804] Bluetooth: HCI device and connection manager initialized [ 15.185827] Bluetooth: HCI socket layer initialized [ 15.185844] Bluetooth: L2CAP socket layer initialized [ 15.185884] Bluetooth: SCO socket layer initialized [ 15.194729] Bluetooth: HCI UART driver ver 2.3 [ 15.194746] Bluetooth: HCI UART protocol H4 registered [ 15.194823] Bluetooth: HCI UART protocol Three-wire (H5) registered [ 15.195062] Bluetooth: HCI UART protocol Broadcom registered [ 15.373726] Bluetooth: BNEP (Ethernet Emulation) ver 1.3 [ 15.373735] Bluetooth: BNEP filters: protocol multicast [ 15.373747] Bluetooth: BNEP socket layer initialized [ 18.953254] ------------[ cut here ]------------ [ 18.953305] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x308/0x30c [ 18.953329] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 3 timed out [ 18.953351] Modules linked in: cmac bnep hci_uart btbcm bluetooth ecdh_generic ecc 8021q garp stp llc vc4 brcmfmac brcmutil cec v3d drm_kms_helper gpu_sched sha256_generic cfg80211 rfkill drm bcm2835_cod6 [ 18.954653] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G C 5.10.17-v7l+ #1414 [ 18.954666] Hardware name: BCM2711 [ 18.954679] Backtrace: [ 18.954712] [] (dump_backtrace) from [] (show_stack+0x20/0x24) [ 18.954727] r7:ffffffff r6:00000000 r5:60000113 r4:c12e69fc [ 18.954747] [] (show_stack) from [] (dump_stack+0xcc/0xf8) [ 18.954767] [] (dump_stack) from [] (warn+0xfc/0x114) [ 18.954784] r10:c133c8b8 r9:00000009 r8:c0a46634 r7:000001bb r6:00000009 r5:c0a46634 [ 18.954798] r4:c0ea7698 r3:c1205094 [ 18.954817] [] (warn) from [] (warn_slowpath_fmt+0xa4/0xd8) [ 18.954830] r7:000001bb r6:c0ea7698 r5:c1205048 r4:c0ea765c [ 18.954849] [] (warn_slowpath_fmt) from [] (dev_watchdog+0x308/0x30c) [ 18.954862] r9:eff47540 r8:c2280000 r7:c1203d00 r6:c21dab00 r5:c22802a8 r4:00000003 [ 18.954883] [] (dev_watchdog) from [] (call_timer_fn+0x40/0x1bc) [ 18.954899] r8:c193be04 r7:ffff9238 r6:c0a4632c r5:00000100 r4:c22802a8 [ 18.954919] [] (call_timer_fn) from [] (run_timer_softirq+0x5b0/0x698) [ 18.954934] r8:c193be04 r7:00000000 r6:c22802a8 r5:ffff9238 r4:00000000 [ 18.954954] [] (run_timer_softirq) from [] (do_softirq+0x198/0x49c) [ 18.954970] r10:00000282 r9:ffffe000 r8:c1810800 r7:00000100 r6:00000001 r5:00000002 [ 18.954982] r4:c1203084 [ 18.955002] [] (__do_softirq) from [] (irq_exit+0xd0/0xf8) [ 18.955017] r10:c0e1edfc r9:c193a000 r8:c1810800 r7:00000001 r6:00000000 r5:00000000 [ 18.955026] r4:ffffe000 [ 18.955045] [] (irq_exit) from [] (handle_domain_irq+0x70/0xc4) [ 18.955056] r5:00000000 r4:c1094d10 [ 18.955072] [] (handle_domain_irq) from [] (gic_handle_irq+0x90/0xa4) [ 18.955086] r9:c193a000 r8:c1094d1c r7:c193bf30 r6:f081400c r5:f0814000 r4:c1205b3c [ 18.955101] [] (gic_handle_irq) from [] (irq_svc+0x5c/0x7c) [ 18.955115] Exception stack(0xc193bf30 to 0xc193bf78) [ 18.955132] bf20: 00000000 0000c90c eff4f2c4 c021ac00 [ 18.955147] bf40: ffffe000 c120509c c12050e4 00000008 00000001 c133ba3f c0e1edfc c193bf8c [ 18.955164] bf60: c193bf90 c193bf80 c02088c0 c02088c4 60000013 ffffffff [ 18.955181] r9:c193a000 r8:00000001 r7:c193bf64 r6:ffffffff r5:60000013 r4:c02088c4 [ 18.955202] [] (arch_cpu_idle) from [] (default_idle_call+0x4c/0x118) [ 18.955220] [] (default_idle_call) from [] (do_idle+0x118/0x168) [ 18.955237] [] (do_idle) from [] (cpu_startup_entry+0x28/0x30) [ 18.955253] r10:00000000 r9:410fd083 r8:00007000 r7:c13543a0 r6:30c0387d r5:00000003 [ 18.955265] r4:00000091 r3:60000093 [ 18.955284] [] (cpu_startup_entry) from [] (secondary_start_kernel+0x16c/0x178) [ 18.955305] [] (secondary_start_kernel) from [<002018ac>] (0x2018ac) [ 18.955317] r5:00000000 r4:018ceac0 [ 18.955338] ---[ end trace b47be6fb4ffe5036 ]---

Additional context Add any other relevant context for the problem. kern.log attached. kern.log

mdevaev commented 3 years ago

Perhaps related: #3401

mdevaev commented 3 years ago

@atsies please show ifconfig also. Maybe you have some unusual network configuration.

2all: the user informed me that the bug was reproduced on two different boards.

atsies commented 3 years ago

eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500 inet 169.254.166.156 netmask 255.255.0.0 broadcast 169.254.255.255 inet6 fe80::68d7:f1ed:107b:9c79 prefixlen 64 scopeid 0x20 ether dc:a6:32:a0:ab:a2 txqueuelen 1000 (Ethernet) RX packets 3608 bytes 406771 (397.2 KiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 3 bytes 580 (580.0 B) TX errors 144 dropped 0 overruns 0 carrier 0 collisions 0

Interface is expected to obtain a DHCP lease from the server, but fails before that. Thus the obtained address is link-local.

atsies commented 3 years ago

Transmission queue(s) remain locked, the number of TX-ERR increments: Iface MTU RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg eth0 1500 8118 0 0 0 3 321 0 0 BMRU

atsies commented 3 years ago

ethtool -S eth0 NIC statistics: rx_packets: 13652 tx_packets: 3 rx_bytes: 1514002 tx_bytes: 580 rx_errors: 0 tx_errors: 551 rx_dropped: 0 tx_dropped: 0 multicast: 7922 rx_64_octets: 67934 rx_65_127_oct: 10143 rx_128_255_oct: 1965 rx_256_511_oct: 468 rx_512_1023_oct: 0 rx_1024_1518_oct: 0 rx_vlan_1519_1522_oct: 0 rx_1522_2047_oct: 0 rx_2048_4095_oct: 0 rx_4096_9216_oct: 0 rx_pkts: 80510 rx_bytes: 5860512 rx_multicast: 74761 rx_broadcast: 5730 rx_fcs: 0 rx_control: 65929 rx_pause: 65929 rx_unknown: 0 rx_align: 0 rx_outrange: 0 rx_code: 0 rx_carrier: 0 rx_oversize: 0 rx_jabber: 0 rx_mtu_err: 0 rx_good_pkts: 80510 rx_unicast: 19 rx_ppp: 0 rx_crc: 0 tx_64_octets: 0 tx_65_127_oct: 0 tx_128_255_oct: 0 tx_256_511_oct: 0 tx_512_1023_oct: 0 tx_1024_1518_oct: 0 tx_vlan_1519_1522_oct: 0 tx_1522_2047_oct: 0 tx_2048_4095_oct: 0 tx_4096_9216_oct: 0 tx_pkts: 0 tx_multicast: 0 tx_broadcast: 0 tx_pause: 0 tx_control: 0 tx_fcs_err: 0 tx_oversize: 0 tx_defer: 0 tx_excess_defer: 0 tx_single_col: 0 tx_multi_col: 0 tx_late_col: 0 tx_excess_col: 0 tx_frags: 0 tx_total_col: 0 tx_jabber: 0 tx_bytes: 0 tx_good_pkts: 0 tx_unicast: 0 rx_runt_pkts: 0 rx_runt_valid_fcs: 0 rx_runt_inval_fcs_align: 0 rx_runt_bytes: 0 rbuf_ovflow_cnt: 0 rbuf_err_cnt: 0 mdf_err_cnt: 4743 alloc_rx_buff_failed: 0 rx_dma_failed: 0 tx_dma_failed: 0 tx_realloc_tsb: 0 tx_realloc_tsb_failed: 0 txq0_packets: 0 txq0_bytes: 0 rxq0_bytes: 0 rxq0_packets: 0 rxq0_errors: 0 rxq0_dropped: 0 txq1_packets: 0 txq1_bytes: 0 rxq1_bytes: 0 rxq1_packets: 0 rxq1_errors: 0 rxq1_dropped: 0 txq2_packets: 1 txq2_bytes: 384 rxq2_bytes: 0 rxq2_packets: 0 rxq2_errors: 0 rxq2_dropped: 0 txq3_packets: 1 txq3_bytes: 86 rxq3_bytes: 0 rxq3_packets: 0 rxq3_errors: 0 rxq3_dropped: 0 txq16_packets: 1 txq16_bytes: 110 rxq16_bytes: 1514002 rxq16_packets: 13652 rxq16_errors: 0 rxq16_dropped: 0

atsies commented 3 years ago

Ordered a 3'rd RPi4 B rev. 1.2 . The same issue. It looks like the network traffic is the cause of the problem. After switching to another network the RPi ethernet stopped hanging. On switching back to the 1st network, it hangs again regularly. Unfortunately, I am not yet able to record the traffic on Ethernet port to see any anomalies.

graysky2 commented 2 years ago

I am experiencing this on RPi4B running OpenWRT which does not use the RPi Foundation kernel, it is mainline. Internal NIC is attached to an access point and a TPLink UE300 is attached to one of the USB3.0 ports which is attached to the WAN.

So far, I have only seen this on the 5.10.84 kernel. I downgraded to 5.10.83 and am waiting to see if it pops up.

EDIT on 25-Jan-2022: also experienced on later kernels/should also mention this board is overclocked yet stable by ever test I have thrown at it. I have since removed the overclock and have yet to see this bug triggered. Will update with a fresh comment if I see it. The bug manifested without overclocking so the striked out edit was a red herring .

I did find some open issues but no solutions. Keyword searching "PID: 0 at net/sched/sch_generic.c:467 dev_watchdog" Links:

This one claims to have been fixed:

[ 2688.031694] ------------[ cut here ]------------
[ 2688.036355] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 0 timed out
[ 2688.043122] WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x2c8/0x2d0
[ 2688.051388] Modules linked in: xt_connlimit pppoe ppp_async nf_conncount iptable_nat brcmfmac xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT wireguard pppox ppp_generic nf_nat nf_flow_table nf_conntrack libchacha20poly1305 libblake2s ipt_REJECT chacha_neon cfg80211 ax88179_178a xt_time xt_tcpudp xt_tcpmss xt_statistic xt_recent xt_quota xt_pkttype xt_physdev xt_owner xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_cgroup xt_addrtype xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY usbnet usbhid slhc sch_cake rtl8150 r8152 poly1305_neon nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 macvlan libcurve25519_generic libchacha libblake2s_generic iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables hid_generic crc_ccitt compat brcmutil br_netfilter sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred
[ 2688.051700]  act_gact snd_bcm2835(C) hid evdev nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb ip6_udp_tunnel udp_tunnel veth snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_hwdep snd_compress snd_pcm snd_timer snd soundcore nls_utf8 vfat fat nls_iso8859_1 nls_cp437
[ 2688.168412] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G         C        5.10.84 #0
[ 2688.175807] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[ 2688.181642] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[ 2688.187651] pc : dev_watchdog+0x2c8/0x2d0
[ 2688.191659] lr : dev_watchdog+0x2c8/0x2d0
[ 2688.195664] sp : ffffffc010b43d70
[ 2688.198975] x29: ffffffc010b43d70 x28: 0000000000000140 
[ 2688.204291] x27: 00000000ffffffff x26: 0000000000000000 
[ 2688.209606] x25: 0000000000000003 x24: 0000000000000000 
[ 2688.214921] x23: ffffff8040c40480 x22: 0000000000000001 
[ 2688.220235] x21: ffffffc0109b6000 x20: ffffff8040c40000 
[ 2688.225549] x19: 0000000000000000 x18: 0000000000000116 
[ 2688.230863] x17: 0001de7b88ffd018 x16: 0000000000000009 
[ 2688.236177] x15: ffffffc0109c7790 x14: 0000000000000342 
[ 2688.241492] x13: 0000000000000116 x12: ffffffc010b43a48 
[ 2688.246806] x11: ffffffc010a1f790 x10: 00000000fffff000 
[ 2688.252120] x9 : ffffffc010a1f790 x8 : 0000000000000000 
[ 2688.257434] x7 : ffffffc0109c7790 x6 : 0000000000000008 
[ 2688.262748] x5 : 0000000000000003 x4 : 0000000000000000 
[ 2688.268061] x3 : ffffffc0109b8f8c x2 : 0000000000000004 
[ 2688.273375] x1 : 0000000000000005 x0 : 000000000000003c 
[ 2688.278690] Call trace:
[ 2688.281136]  dev_watchdog+0x2c8/0x2d0
[ 2688.284801]  call_timer_fn.constprop.0+0x24/0x80
[ 2688.289418]  __run_timers.part.0+0x240/0x280
[ 2688.293688]  run_timer_softirq+0x3c/0x74
[ 2688.297609]  _stext+0x124/0x290
[ 2688.300751]  irq_exit+0xa0/0xf0
[ 2688.303893]  __handle_domain_irq+0x80/0xe0
[ 2688.307992]  gic_handle_irq+0x78/0xa0
[ 2688.311652]  el1_irq+0xcc/0x180
[ 2688.314793]  arch_cpu_idle+0x18/0x30
[ 2688.318369]  default_idle_call+0x20/0x70
[ 2688.322292]  do_idle+0x21c/0x280
[ 2688.325518]  cpu_startup_entry+0x28/0x60
[ 2688.329440]  secondary_start_kernel+0x128/0x150
[ 2688.333968] ---[ end trace ddbc004725827c20 ]---
johnp789 commented 2 years ago

This happened last night on my 2GB Pi 4 Model B running an Arch Linux ARM aarch64 kernel. This Pi 4B serves as a router, with the eth0 interface serving as the "LAN" port and connected to an unmanaged gigabit switch along with a MacBook Pro and a Raspberry Pi 3 Model B. The "WAN" port is a USB 3.0 GbE interface.

$ uname -a
Linux pi4-1 5.11.4-1-ARCH #1 SMP Sun Mar 7 23:46:10 UTC 2021 aarch64 GNU/Linux

$ uptime
 01:15:59 up 47 days, 10:56,  1 user,  load average: 0.00, 0.01, 0.00

$ ifconfig eth0
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.16.1  netmask 255.255.252.0  broadcast 192.168.19.255
        inet6 fe80::dea6:aaaa:aaaa:aaaa  prefixlen 64  scopeid 0x20<link>
        ether dc:a6:aa:aa:aa:aa  txqueuelen 1000  (Ethernet)
        RX packets 36926543  bytes 12922639839 (12.0 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 58183599  bytes 52461760482 (48.8 GiB)
        TX errors 6097  dropped 0 overruns 0  carrier 0  collisions 0

[4043938.220080] pi4-1 kernel: ------------[ cut here ]------------
[4043938.224979] pi4-1 kernel: NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 1 timed out
[4043938.232009] pi4-1 kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:442 dev_watchdog+0x310/0x320
[4043938.240584] pi4-1 kernel: Modules linked in: nft_fib_ipv6 nft_fib_ipv4 nft_fib overlay nft_masq nft_nat nft_chain_nat nf_nat nf_log_ipv6 nf_log_ipv4 nf_log_common nft_log nft_reject_inet nf_reject_ip>[4043938.306760] pi4-1 kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        5.11.4-1-ARCH #1
[4043938.314967] pi4-1 kernel: Hardware name: Raspberry Pi 4 Model B (DT)
[4043938.320349] pi4-1 kernel: pstate: 20000005 (nzCv daif -PAN -UAO -TCO BTYPE=--)
[4043938.326615] pi4-1 kernel: pc : dev_watchdog+0x310/0x320
[4043938.330854] pi4-1 kernel: lr : dev_watchdog+0x310/0x320
[4043938.335092] pi4-1 kernel: sp : ffff800010003d40
[4043938.338623] pi4-1 kernel: x29: ffff800010003d40 x28: 0000000000000004 
[4043938.344186] pi4-1 kernel: x27: 0000000000000000 x26: 0000000000000001 
[4043938.349748] pi4-1 kernel: x25: 0000000000000140 x24: 00000000ffffffff 
[4043938.355309] pi4-1 kernel: x23: 0000000000000000 x22: ffff000042698480 
[4043938.360871] pi4-1 kernel: x21: ffff800011d37000 x20: ffff000042698000 
[4043938.366433] pi4-1 kernel: x19: 0000000000000001 x18: 0000000000000030 
[4043938.371994] pi4-1 kernel: x17: 0000000000000000 x16: 0000000000000000 
[4043938.377556] pi4-1 kernel: x15: ffff800011d45ff0 x14: ffff800011d59670 
[4043938.383118] pi4-1 kernel: x13: 0000000000000000 x12: 0000000000000942 
[4043938.388679] pi4-1 kernel: x11: 0000000000000316 x10: ffff800011e09670 
[4043938.394241] pi4-1 kernel: x9 : 00000000ffffe000 x8 : 80000000ffffe000 
[4043938.399803] pi4-1 kernel: x7 : ffff800011d59670 x6 : 0000000000000001 
[4043938.405365] pi4-1 kernel: x5 : 0000000000000000 x4 : 0000000000000000 
[4043938.410926] pi4-1 kernel: x3 : ffff800011d3f310 x2 : 0000000000000004 
[4043938.416488] pi4-1 kernel: x1 : 56de5a16e8fc0800 x0 : 0000000000000000 
[4043938.422050] pi4-1 kernel: Call trace:
[4043938.424700] pi4-1 kernel:  dev_watchdog+0x310/0x320
[4043938.428588] pi4-1 kernel:  call_timer_fn.constprop.0+0x24/0x80
[4043938.433445] pi4-1 kernel:  expire_timers+0xe4/0x10c
[4043938.437331] pi4-1 kernel:  __run_timers.part.0+0x1ac/0x1f0
[4043938.441833] pi4-1 kernel:  run_timer_softirq+0x3c/0x74
[4043938.445983] pi4-1 kernel:  _stext+0x124/0x2ac
[4043938.449340] pi4-1 kernel:  irq_exit+0xdc/0xfc
[4043938.452699] pi4-1 kernel:  __handle_domain_irq+0x80/0xe0
[4043938.457026] pi4-1 kernel:  gic_handle_irq+0x6c/0xa0
[4043938.460914] pi4-1 kernel:  el1_irq+0xb4/0x140
[4043938.464270] pi4-1 kernel:  arch_cpu_idle+0x18/0x30
[4043938.468069] pi4-1 kernel:  default_idle_call+0x24/0x6c
[4043938.472218] pi4-1 kernel:  cpuidle_idle_call+0x158/0x1ac
[4043938.476547] pi4-1 kernel:  do_idle+0xa8/0x100
[4043938.479905] pi4-1 kernel:  cpu_startup_entry+0x24/0x90
[4043938.484056] pi4-1 kernel:  rest_init+0xbc/0xcc
[4043938.487501] pi4-1 kernel:  arch_call_rest_init+0x10/0x1c
[4043938.491831] pi4-1 kernel:  start_kernel+0x3f0/0x424
[4043938.495718] pi4-1 kernel: ---[ end trace 753ccc04bb8fd67a ]---

This was followed by many lines about the tx ring 0 full.

[4044005.261046] pi4-1 kernel: bcmgenet fd580000.ethernet eth0: bcmgenet_xmit: tx ring 0 full when queue 1 awake
[4044007.261080] pi4-1 kernel: bcmgenet fd580000.ethernet eth0: bcmgenet_xmit: tx ring 0 full when queue 1 awake
[4044009.261145] pi4-1 kernel: bcmgenet fd580000.ethernet eth0: bcmgenet_xmit: tx ring 0 full when queue 1 awake
[4044011.261144] pi4-1 kernel: bcmgenet fd580000.ethernet eth0: bcmgenet_xmit: tx ring 0 full when queue 1 awake
[4044013.261174] pi4-1 kernel: bcmgenet fd580000.ethernet eth0: bcmgenet_xmit: tx ring 0 full when queue 1 awake

Much later, the last of these messages around the time I disconnected and reconnected the MacBook Pro. After doing this, without any intervention on the Pi 4B, traffic was able to flow again.

[4056163.431011] pi4-1 kernel: bcmgenet fd580000.ethernet eth0: bcmgenet_xmit: tx ring 0 full when queue 1 awake
graysky2 commented 2 years ago

@johnp789 - overclocking?

johnp789 commented 2 years ago

@johnp789 - overclocking?

I'm not overclocking. My config.txt has only enable_uart=1 in it.

graysky2 commented 2 years ago

I'm not overclocking. My config.txt has only enable_uart=1 in it.

Well, that was timely. I sought to test out whether or not my overclock could be a reason this bug is manifesting. After running without it (stock) for a while, the bug got triggered this morning for me. The overclock was a red herring.

[14105.382014] ------------[ cut here ]------------
[14105.386662] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 0 timed out
[14105.393425] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x2c8/0x2d0
[14105.401688] Modules linked in: xt_connlimit pppoe ppp_async nf_conncount iptable_nat brcmfmac xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT wireguard pppox ppp_generic nf_nat nf_flow_table nf_conntrack libchacha20poly1305 libblake2s ipt_REJECT chacha_neon cfg80211 cdc_ncm cdc_ether ax88179_178a xt_time xt_tcpudp xt_tcpmss xt_statistic xt_recent xt_quota xt_pkttype xt_physdev xt_owner xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_cgroup xt_addrtype xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY usbnet usbhid slhc sch_cake rtl8150 r8152 poly1305_neon nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 macvlan libcurve25519_generic libchacha libblake2s_generic iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables hid_generic crc_ccitt compat brcmutil br_netfilter sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit
[14105.401953]  act_mirred act_gact snd_bcm2835(C) hid evdev nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb ip6_udp_tunnel udp_tunnel veth snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_hwdep snd_compress snd_pcm snd_timer snd soundcore nls_utf8 vfat fat nls_iso8859_1 nls_cp437
[14105.520204] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        5.10.93 #0
[14105.527597] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[14105.533429] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[14105.539433] pc : dev_watchdog+0x2c8/0x2d0
[14105.543439] lr : dev_watchdog+0x2c8/0x2d0
[14105.547443] sp : ffffffc010b2bd70
[14105.550753] x29: ffffffc010b2bd70 x28: 0000000000000140 
[14105.556066] x27: 00000000ffffffff x26: 0000000000000000 
[14105.561378] x25: 0000000000000000 x24: 0000000000000000 
[14105.566689] x23: ffffff8040c28480 x22: 0000000000000001 
[14105.572001] x21: ffffffc0109b6000 x20: ffffff8040c28000 
[14105.577312] x19: 0000000000000000 x18: 0000000000000132 
[14105.582623] x17: 00004fb9fe4107dc x16: 00004a2c97c7a9b6 
[14105.587935] x15: ffffffc0109c7890 x14: 0000000000000396 
[14105.593246] x13: 0000000000000132 x12: ffffffc010b2ba48 
[14105.598557] x11: ffffffc010a1f890 x10: 00000000fffff000 
[14105.603869] x9 : ffffffc010a1f890 x8 : 0000000000000000 
[14105.609180] x7 : ffffffc0109c7890 x6 : 0000000000000001 
[14105.614491] x5 : 0000000000000000 x4 : 0000000000000000 
[14105.619802] x3 : ffffffc0109b8f8c x2 : 0000000000000004 
[14105.625113] x1 : 0000000000000005 x0 : 000000000000003c 
[14105.630425] Call trace:
[14105.632869]  dev_watchdog+0x2c8/0x2d0
[14105.636531]  call_timer_fn.constprop.0+0x24/0x80
[14105.641145]  __run_timers.part.0+0x240/0x280
[14105.645411]  run_timer_softirq+0x3c/0x74
[14105.649330]  _stext+0x124/0x290
[14105.652470]  irq_exit+0xa0/0xf0
[14105.655608]  __handle_domain_irq+0x80/0xe0
[14105.659705]  gic_handle_irq+0x78/0xa0
[14105.663363]  el1_irq+0xcc/0x180
[14105.666502]  arch_cpu_idle+0x18/0x30
[14105.670076]  default_idle_call+0x20/0x70
[14105.673996]  do_idle+0x21c/0x280
[14105.677221]  cpu_startup_entry+0x24/0x60
[14105.681140]  rest_init+0xbc/0xcc
[14105.684363]  arch_call_rest_init+0x10/0x1c
[14105.688455]  start_kernel+0x438/0x454
[14105.692112] ---[ end trace 45fcfc017fd545ae ]---

Again, this is running at stock clock/voltage.

johnp789 commented 2 years ago

I caught only one such error at 47 days of uptime, so troubleshooting may be a very slow process!

graysky2 commented 2 years ago

I caught only one such error at 47 days of uptime, so troubleshooting may be a very slow process!

I don't think so... you can trigger it using an iperf3 test pounding the RPi4:

For me it only took 30 gigs or so to trigger the bug:

# dmesg
...
[   11.787446] r8152 2-1:1.0 eth1: carrier on
[  520.956114] ------------[ cut here ]------------
[  520.960747] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 0 timed out
[  520.967553] WARNING: CPU: 3 PID: 26 at net/sched/sch_generic.c:467 dev_watchdog+0x2c8/0x2d0
[  520.975898] Modules linked in: xt_connlimit pppoe ppp_async nf_conncount iptable_nat brcmfmac xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT wireguard pppox ppp_generic nf_nat nf_flow_table nf_conntrack libchacha20poly1305 libblake2s ipt_REJECT chacha_neon cfg80211 cdc_ncm cdc_ether ax88179_178a xt_time xt_tcpudp xt_tcpmss xt_statistic xt_recent xt_quota xt_pkttype xt_physdev xt_owner xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_cgroup xt_addrtype xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY usbnet usbhid slhc sch_cake rtl8150 r8152 poly1305_neon nf_reject_ipv4 nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 macvlan libcurve25519_generic libchacha libblake2s_generic iptable_raw iptable_mangle iptable_filter ipt_ECN ip_tables hid_generic crc_ccitt compat brcmutil br_netfilter sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit
[  520.976044]  act_mirred act_gact snd_bcm2835(C) hid evdev nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables ip6t_REJECT x_tables nf_reject_ipv6 ifb ip6_udp_tunnel udp_tunnel veth snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_hwdep snd_compress snd_pcm snd_timer snd soundcore nls_utf8 vfat fat nls_iso8859_1 nls_cp437
[  521.094218] CPU: 3 PID: 26 Comm: ksoftirqd/3 Tainted: G         C        5.10.93 #0
[  521.101866] Hardware name: Raspberry Pi 4 Model B Rev 1.1 (DT)
[  521.107692] pstate: 40400005 (nZcv daif +PAN -UAO -TCO BTYPE=--)
[  521.113691] pc : dev_watchdog+0x2c8/0x2d0
[  521.117692] lr : dev_watchdog+0x2c8/0x2d0
[  521.121691] sp : ffffffc010c2bc00
[  521.124996] x29: ffffffc010c2bc00 x28: 0000000000000140 
[  521.130301] x27: 00000000ffffffff x26: 0000000000000000 
[  521.135606] x25: 0000000000000003 x24: 0000000000000000 
[  521.140911] x23: ffffff8040c28480 x22: 0000000000000001 
[  521.146215] x21: ffffffc0109b6000 x20: ffffff8040c28000 
[  521.151519] x19: 0000000000000000 x18: 0000000000000126 
[  521.156824] x17: 0000c3aa115ea6c6 x16: 0000c3aa115ea6c6 
[  521.162128] x15: ffffffc0109c7890 x14: 0000000000000372 
[  521.167432] x13: 0000000000000126 x12: ffffffc010c2b8d8 
[  521.172737] x11: ffffffc010a1f890 x10: 00000000fffff000 
[  521.178041] x9 : ffffffc010a1f890 x8 : 0000000000000000 
[  521.183346] x7 : ffffffc0109c7890 x6 : 0000000000000001 
[  521.188650] x5 : 0000000000000000 x4 : 0000000000000000 
[  521.193954] x3 : 0000000000000000 x2 : ffffff807fbd4068 
[  521.199258] x1 : ffffffc06f22d000 x0 : 000000000000003c 
[  521.204564] Call trace:
[  521.207002]  dev_watchdog+0x2c8/0x2d0
[  521.210659]  call_timer_fn.constprop.0+0x24/0x80
[  521.215267]  __run_timers.part.0+0x240/0x280
[  521.219528]  run_timer_softirq+0x3c/0x74
[  521.223443]  _stext+0x124/0x290
[  521.226577]  run_ksoftirqd+0x54/0x70
[  521.230146]  smpboot_thread_fn+0x154/0x1a0
[  521.234235]  kthread+0x124/0x12c
[  521.237455]  ret_from_fork+0x10/0x34
[  521.241021] ---[ end trace 3907dc14c3eab63e ]---
graysky2 commented 2 years ago

@johnp789 @atsies - Can you try the iperf3 test and see if you get the same?

graysky2 commented 2 years ago

I confirmed this on a 2nd, newer RPi4B (hardware revision 1.4).

[  265.029040] ------------[ cut here ]------------
[  265.033684] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 0 timed out
[  265.040451] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x2c8/0x2d0
[  265.048706] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet brcmfmac wireguard pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_compat nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack libchacha20poly1305 libblake2s iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables chacha_neon cfg80211 cdc_ncm cdc_ether ax88179_178a xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY x_tables usbnet usbhid slhc sch_cake rtl8150 r8152 poly1305_neon nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_ipv4 nf_log_common nf_defrag_ipv6 nf_defrag_ipv4 macvlan libcurve25519_generic libcrc32c libchacha libblake2s_generic hid_generic crc_ccitt compat
[  265.048835]  brcmutil sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact snd_bcm2835(C) hid evdev ifb ip6_udp_tunnel udp_tunnel veth snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_hwdep snd_compress snd_pcm snd_timer snd soundcore nls_utf8 vfat fat nls_iso8859_1 nls_cp437
[  265.169349] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G         C        5.10.101 #0
[  265.176821] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[  265.182646] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[  265.188643] pc : dev_watchdog+0x2c8/0x2d0
[  265.192643] lr : dev_watchdog+0x2c8/0x2d0
[  265.196642] sp : ffffffc010b33d70
[  265.199946] x29: ffffffc010b33d70 x28: 0000000000000140 
[  265.205250] x27: 00000000ffffffff x26: 0000000000000000 
[  265.210554] x25: 0000000000000001 x24: 0000000000000000 
[  265.215858] x23: ffffff8040c40480 x22: 0000000000000001 
[  265.221161] x21: ffffffc0109b6000 x20: ffffff8040c40000 
[  265.226464] x19: 0000000000000000 x18: 000000000000011f 
[  265.231767] x17: 0000000000000000 x16: 000000000000000c 
[  265.237070] x15: ffffffc0109c77d0 x14: 000000000000035d 
[  265.242376] x13: 000000000000011f x12: ffffffc010b33a48 
[  265.247687] x11: ffffffc010a1f7d0 x10: 00000000fffff000 
[  265.252999] x9 : ffffffc010a1f7d0 x8 : 0000000000000000 
[  265.258310] x7 : ffffffc0109c77d0 x6 : 0000000000000002 
[  265.263620] x5 : 0000000000000000 x4 : 0000000000000000 
[  265.268931] x3 : ffffffc0109b8f8c x2 : 0000000000000004 
[  265.274242] x1 : 0000000000000005 x0 : 000000000000003c 
[  265.279554] Call trace:
[  265.281998]  dev_watchdog+0x2c8/0x2d0
[  265.285662]  call_timer_fn.constprop.0+0x24/0x80
[  265.290276]  __run_timers.part.0+0x240/0x280
[  265.294543]  run_timer_softirq+0x3c/0x74
[  265.298462]  _stext+0x124/0x290
[  265.301602]  irq_exit+0xa0/0xf0
[  265.304742]  __handle_domain_irq+0x80/0xe0
[  265.308840]  gic_handle_irq+0x78/0xa0
[  265.312498]  el1_irq+0xcc/0x180
[  265.315637]  arch_cpu_idle+0x18/0x30
[  265.319210]  default_idle_call+0x20/0x70
[  265.323132]  do_idle+0x21c/0x280
[  265.326356]  cpu_startup_entry+0x28/0x60
[  265.330277]  secondary_start_kernel+0x128/0x150
[  265.334802] ---[ end trace 6a3a60879d0d2b7c ]---
johnp789 commented 2 years ago

I've done an update in the meantime.

$ uname -a
Linux pi4-e8d2e7ed 5.16.3-1-aarch64-ARCH #1 SMP Thu Jan 27 16:41:47 UTC 2022 aarch64 GNU/Linux

I ran iperf3 -n 100G -c $PI4_IP twice, back to back, and there were no NETDEV WATCHDOG errors logged.

wtschueller commented 2 years ago

Maybe related:

I can trigger the "NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue x timed out" using MTU sweep like this:

for ((MTU=2000; MTU<=2048; ((MTU=MTU+4)) )) do sudo ifconfig eth0 mtu ${MTU} ; ifconfig eth0 ; iperf3 -V -c 169.254.1.1 -t 10 -b 1000M; done

It always triggers at MTU=2036. Even if sweeping starts at 512.

It will not trigger if bandwith ist limited to 100MBit/s.

Kernel is 5.4.51-v7l+ with jumbo frames enabled.

graysky2 commented 2 years ago

@wtschueller - See the work-around @wulfy23 proposed here. Does it work for you?

@wulfy23 - Seems as though this work-around does not prevernt the problem on my machine. Using iperf3 from a PC on the LAN to the RPi4B router triggers it even having checksumming disabled.

[19493.462172] ------------[ cut here ]------------
[19493.466813] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 0 timed out
[19493.473570] WARNING: CPU: 2 PID: 0 at net/sched/sch_generic.c:467 dev_watchdog+0x2c8/0x2d0
[19493.481825] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_ipv6 nf_flow_table_ipv4 nf_flow_table_inet brcmfmac wireguard pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_objref nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_counter nft_compat nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack libchacha20poly1305 libblake2s iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables chacha_neon cfg80211 cdc_ncm cdc_ether ax88179_178a xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY x_tables usbnet usbhid slhc sch_cake rtl8150 r8152 poly1305_neon nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_ipv4 nf_log_common nf_defrag_ipv6 nf_defrag_ipv4 macvlan libcurve25519_generic libcrc32c libchacha libblake2s_generic hid_generic crc_ccitt compat
[19493.481949]  brcmutil sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_tcindex cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact snd_bcm2835(C) hid evdev ifb ip6_udp_tunnel udp_tunnel veth snd_rawmidi snd_seq_device snd_pcm_oss snd_mixer_oss snd_hwdep snd_compress snd_pcm snd_timer snd soundcore nls_utf8 vfat fat nls_iso8859_1 nls_cp437
[19493.602463] CPU: 2 PID: 0 Comm: swapper/2 Tainted: G         C        5.10.102 #0
[19493.609937] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[19493.615761] pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
[19493.621759] pc : dev_watchdog+0x2c8/0x2d0
[19493.625759] lr : dev_watchdog+0x2c8/0x2d0
[19493.629757] sp : ffffffc010b3bd70
[19493.633062] x29: ffffffc010b3bd70 x28: 0000000000000140 
[19493.638366] x27: 00000000ffffffff x26: 0000000000000000 
[19493.643669] x25: 0000000000000002 x24: 0000000000000000 
[19493.648973] x23: ffffff8040c40480 x22: 0000000000000001 
[19493.654276] x21: ffffffc0109b6000 x20: ffffff8040c40000 
[19493.659580] x19: 0000000000000000 x18: 000000000000011e 
[19493.664883] x17: 0000000000000000 x16: 000000000000001e 
[19493.670186] x15: ffffffc0109c77d0 x14: 000000000000035a 
[19493.675489] x13: 000000000000011e x12: ffffffc010b3ba48 
[19493.680793] x11: ffffffc010a1f7d0 x10: 00000000fffff000 
[19493.686096] x9 : ffffffc010a1f7d0 x8 : 0000000000000000 
[19493.691399] x7 : ffffffc0109c77d0 x6 : 0000000000000004 
[19493.696702] x5 : 0000000000000000 x4 : 0000000000000000 
[19493.702006] x3 : ffffffc0109b8f8c x2 : 0000000000000004 
[19493.707309] x1 : 0000000000000005 x0 : 000000000000003c 
[19493.712613] Call trace:
[19493.715051]  dev_watchdog+0x2c8/0x2d0
[19493.718705]  call_timer_fn.constprop.0+0x24/0x80
[19493.723313]  __run_timers.part.0+0x240/0x280
[19493.727573]  run_timer_softirq+0x3c/0x74
[19493.731486]  _stext+0x124/0x290
[19493.734619]  irq_exit+0xa0/0xf0
[19493.737756]  __handle_domain_irq+0x80/0xe0
[19493.741845]  gic_handle_irq+0x78/0xa0
[19493.745497]  el1_irq+0xcc/0x180
[19493.748632]  arch_cpu_idle+0x18/0x30
[19493.752199]  default_idle_call+0x20/0x70
[19493.756116]  do_idle+0x21c/0x280
[19493.759334]  cpu_startup_entry+0x28/0x60
[19493.763248]  secondary_start_kernel+0x128/0x150
[19493.767768] ---[ end trace 11aa64c6c6f35b63 ]---
wtschueller commented 2 years ago

@graysky2 I do not see checksum errors in dmesg. See below.

Here is what I have tried already:

kernel 5.10.17-v8 aarch64:

Basically same behaviour. Wireshark shows some transfers from RPi4 to PC, PC sends ack and that's it. PC repeats the ack several times an then give up. RPi4 does not even respond to arp requests anymore.

However, I need to ethtool -K eth0 tx off because hardware checksumming seems to corrupt outgoing jumbo frames. It must be switched on by default somewhere between 5.4 and 5.10, could be this commit.

Then, I switch off everything ethtool -k advertises (including rx crc). No change.

Reducing txqueuelen with ifconfig down to 1 lets iperf3 survive a few seconds at MTU=2036. Could be because throughput also gets reduced.

Disabling IPv6 in cmdline.txt, no change.

Changing number of txqueues:

bcmgenet_hw_params[] = {
...
[GENET_V5] = {
    .tx_queues = 0,
    .tx_bds_per_q = 0,

no change.

Benchmarking with udp traffic is less clear. Especially 8k packets seem to work for longer time. Less cpu load could be one reason.

kernel 5.16.10-v8 aarch64:

Done the sweep test only. Could not see dmesg updates because I compiled the kernel image only, no modules. Therefore, no wlan0 and eth0 got stuck. I would assume same behaviour.

wulfy23 commented 2 years ago

cheers... afaik rx checksumming off is effective for the 'tailroom' failure mode... (sparse connectionless rcu hang or whatever is going on there)

this iperf stress testing must be triggering another codepath or whatnot... but afaik... it looks like a better place to troubleshoot from...

could be looking at two bugs... but the gut says they are related...

wtschueller commented 2 years ago

I found an even faster way to trigger the transmit queue time out: UDP packet sweeping like this

for ((MTU=2024; MTU>8192; ((MTU=MTU-4)) )) do sudo ifconfig eth0 mtu ${MTU} ; ifconfig eth0 ; dd if=/dev/zero bs=65536 count=10 |netcat -u 169.254.1.1 65000 -q 1; done

Data is sent in chunks of 8k and fragmented according to MTU. I added logging to see what will go on.

[  +0,000391] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0007c000, masked=0x00010000, dma_ctrl=0x00020001
[  +0,000012] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1234 c_index=1235 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[210].dma_len=2106, dma_addr=0x0000000110b86002
[  +0,000145] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=255->256
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1235 c_index=1236 txbds_ready=1
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[211].dma_len=2106, dma_addr=0x0000000110b87002
[  +0,000140] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=254->255
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1236 c_index=1239 txbds_ready=3
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[212].dma_len=2106, dma_addr=0x0000000110b85002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[213].dma_len=2106, dma_addr=0x0000000110b84002
[  +0,000140] bcmgenet fd580000.ethernet: ring.cbs[214].dma_len= 266, dma_addr=0x0000000103a15802
[  +0,000007] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   3, free_bds=253->256
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1239 c_index=1239 txbds_ready=0
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   0, free_bds=256->256
[  +0,188908] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c000, masked=0x00800000, dma_ctrl=0x00020001
[  +0,000099] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c000, masked=0x00800000, dma_ctrl=0x00020001
[  +0,835193] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0007c000, masked=0x00010000, dma_ctrl=0x00020001
[  +0,000022] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1239 c_index=1243 txbds_ready=4
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[215].dma_len=2114, dma_addr=0x0000000110b84002
[  +0,000013] bcmgenet fd580000.ethernet: ring.cbs[216].dma_len=2114, dma_addr=0x0000000110b85002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[217].dma_len=2114, dma_addr=0x0000000110b87002
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[218].dma_len=2114, dma_addr=0x0000000110b86002
[  +0,000007] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   4, free_bds=251->255
[  +0,000010] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1243 c_index=1245 txbds_ready=2
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[219].dma_len= 234, dma_addr=0x0000000103a15002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[220].dma_len=2114, dma_addr=0x0000000112fb1002
[  +0,000008] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=253->255
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1245 c_index=1247 txbds_ready=2
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[221].dma_len=2114, dma_addr=0x000000011cee6002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[222].dma_len=2114, dma_addr=0x000000011cee4002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=253->255
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1247 c_index=1249 txbds_ready=2
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[223].dma_len=2114, dma_addr=0x000000011cee5002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[224].dma_len= 234, dma_addr=0x0000000103a14802
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=254->256
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1249 c_index=1249 txbds_ready=0
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   0, free_bds=256->256
[  +0,000008] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0007c000, masked=0x00010000, dma_ctrl=0x00020001
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1249 c_index=1251 txbds_ready=2
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[225].dma_len=2114, dma_addr=0x000000011cee7002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[226].dma_len=2114, dma_addr=0x000000011cee5002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=253->255
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1251 c_index=1253 txbds_ready=2
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[227].dma_len=2114, dma_addr=0x000000011cee4002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[228].dma_len=2114, dma_addr=0x000000011cee6002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=253->255
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1253 c_index=1254 txbds_ready=1
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[229].dma_len= 234, dma_addr=0x0000000103a14402
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=255->256
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1254 c_index=1255 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[230].dma_len=2114, dma_addr=0x000000011cee6002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=254->255
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1255 c_index=1257 txbds_ready=2
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[231].dma_len=2114, dma_addr=0x000000011cee4002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[232].dma_len=2114, dma_addr=0x000000011cee5002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=253->255
[  +0,000024] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1257 c_index=1259 txbds_ready=2
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[233].dma_len=2114, dma_addr=0x000000011cee7002
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[234].dma_len= 234, dma_addr=0x0000000103a14802
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=254->256
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1259 c_index=1260 txbds_ready=1
[  +0,000004] bcmgenet fd580000.ethernet: ring.cbs[235].dma_len=2114, dma_addr=0x000000011cee5002
[  +0,000007] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=254->255
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1260 c_index=1262 txbds_ready=2
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[236].dma_len=2114, dma_addr=0x000000011cee4002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[237].dma_len=2114, dma_addr=0x000000011cee6002
[  +0,000005] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=253->255
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1262 c_index=1263 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[238].dma_len=2114, dma_addr=0x00000001014fe002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=254->255
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1263 c_index=1264 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[239].dma_len= 234, dma_addr=0x0000000103a15002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=255->256
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1264 c_index=1265 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[240].dma_len=2114, dma_addr=0x00000001014fe002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=255->256
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1265 c_index=1266 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[241].dma_len=2114, dma_addr=0x000000011dd78002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=254->255
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1266 c_index=1268 txbds_ready=2
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[242].dma_len=2114, dma_addr=0x000000011dd7b002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[243].dma_len=2114, dma_addr=0x000000011dd79002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=253->255
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1268 c_index=1269 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[244].dma_len= 234, dma_addr=0x0000000103a14c02
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=255->256
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1269 c_index=1270 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[245].dma_len=2114, dma_addr=0x000000011dd7a002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=254->255
[  +0,000010] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1270 c_index=1272 txbds_ready=2
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[246].dma_len=2114, dma_addr=0x000000011dd79002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[247].dma_len=2114, dma_addr=0x000000011dd7b002
[  +0,000005] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=253->255
[  +0,000009] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1272 c_index=1273 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[248].dma_len=2114, dma_addr=0x000000011dd78002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=254->255
[  +0,000008] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1273 c_index=1274 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[249].dma_len= 234, dma_addr=0x0000000103a15402
[  +0,000036] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=255->256
[  +0,000010] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1274 c_index=1275 txbds_ready=1
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[250].dma_len=2114, dma_addr=0x000000011dd78002
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=255->256
[  +0,000401] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1275 c_index=1278 txbds_ready=3
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[251].dma_len=2114, dma_addr=0x000000011dd7b002
[  +0,000009] bcmgenet fd580000.ethernet: ring.cbs[252].dma_len=2114, dma_addr=0x000000011dd79002
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[253].dma_len=2114, dma_addr=0x000000011dd7a002
[  +0,000008] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   3, free_bds=252->255
[  +0,000020] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1278 c_index=1281 txbds_ready=3
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[254].dma_len= 234, dma_addr=0x0000000103a15802
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[255].dma_len=2114, dma_addr=0x000000011dd78002
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[  0].dma_len=2114, dma_addr=0x0000000108a96002
[  +0,000009] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   3, free_bds=251->254
[  +0,000165] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1281 c_index=1284 txbds_ready=3
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[  1].dma_len=2114, dma_addr=0x000000010a4ea002
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[  2].dma_len=2114, dma_addr=0x000000010827a002
[  +0,000152] bcmgenet fd580000.ethernet: ring.cbs[  3].dma_len= 234, dma_addr=0x0000000103a15c02
[  +0,000007] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   3, free_bds=253->256
[  +0,000010] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1284 c_index=1284 txbds_ready=0
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   0, free_bds=256->256
[  +0,000751] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0007c000, masked=0x00010000, dma_ctrl=0x00020001
[  +0,000015] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1284 c_index=1290 txbds_ready=6
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[  4].dma_len=2114, dma_addr=0x000000010827a002
[  +0,000155] bcmgenet fd580000.ethernet: ring.cbs[  5].dma_len=2114, dma_addr=0x000000011de91002
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[  6].dma_len=2114, dma_addr=0x000000011de93002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[  7].dma_len=2114, dma_addr=0x000000011de92002
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[  8].dma_len= 234, dma_addr=0x0000000103a16402
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[  9].dma_len=2114, dma_addr=0x000000011de90002
[  +0,000152] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   6, free_bds=236->242
[  +0,000010] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=1290 c_index=1290 txbds_ready=0
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   0, free_bds=241->241
[  +0,186033] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c000, masked=0x00800000, dma_ctrl=0x00020001
[  +0,000028] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c000, masked=0x00800000, dma_ctrl=0x00020001
[  +1,023983] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c000, masked=0x00800000, dma_ctrl=0x00020001
[  +0,000050] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c000, masked=0x00800000, dma_ctrl=0x00020001
[  +0,895920] ------------[ cut here ]------------
[  +0,000032] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 0 timed out
[  +0,000731] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x3a0/0x3a8
[  +0,000709] Modules linked in: rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc fuse 8021q garp stp llc hid_logitech_hidpp joydev brcmfmac brcmutil hid_logitech_dj cfg80211 rfkill raspberrypi_hwmon v3d gpu_sched vc4 cec i2c_brcmstb drm_kms_helper bcm2835_codec(C) snd_bcm2835(C) v4l2_mem2mem bcm2835_v4l2(C) bcm2835_isp(C) drm bcm2835_mmal_vchiq(C) videobuf2_dma_contig videobuf2_vmalloc vc_sm_cma(C) videobuf2_memops drm_panel_orientation_quirks videobuf2_v4l2 videobuf2_common snd_soc_core snd_compress snd_pcm_dmaengine videodev mc snd_pcm snd_timer snd syscopyarea rpivid_mem sysfillrect sysimgblt fb_sys_fops backlight uio_pdrv_genirq nvmem_rmem uio i2c_dev aes_neon_bs sha256_generic aes_neon_blk crypto_simd cryptd ip_tables x_tables ipv6
[  +0,001137] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G         C        5.10.17-v8+jf+ #48
[  +0,000005] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[  +0,000008] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[  +0,000008] pc : dev_watchdog+0x3a0/0x3a8
[  +0,000007] lr : dev_watchdog+0x3a0/0x3a8
[  +0,000004] sp : ffffffc011553d10

I can see it sending fragments of size 2050+64=2114 (and the smaller last fragment) for a while and then it gets stuck. No more UMAC_IRQ_TXDMA_MBDONEinterrupts, no more outgoing packets. Wireshark does not receive the very last packet (ring.cbs[ 9].dma_len=2114) that is marked as sent.

I can also do the MTU sweep down:

for ((MTU=8192; MTU>2000; ((MTU=MTU-4)) )) do sudo ifconfig eth0 mtu ${MTU} ; ifconfig eth0 ; dd if=/dev/zero bs=65536 count=10 |netcat -u 169.254.1.1 65000 -q 1; done
[  +0,000492] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0007c100, masked=0x00010000
[  +0,000377] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=17511 c_index=17517 txbds_ready=6
[  +0,000186] bcmgenet fd580000.ethernet: ring.cbs[103].dma_len=6290, dma_addr=0x0112f42002
[  +0,000015] bcmgenet fd580000.ethernet: ring.cbs[104].dma_len=2106, dma_addr=0x011e252002
[  +0,000009] bcmgenet fd580000.ethernet: ring.cbs[105].dma_len=6290, dma_addr=0x0112f46002
[  +0,000009] bcmgenet fd580000.ethernet: ring.cbs[106].dma_len=2106, dma_addr=0x011e256002
[  +0,000008] bcmgenet fd580000.ethernet: ring.cbs[107].dma_len=6290, dma_addr=0x0112f44002
[  +0,000009] bcmgenet fd580000.ethernet: ring.cbs[108].dma_len=2106, dma_addr=0x011e255002
[  +0,000008] bcmgenet fd580000.ethernet: ring.cbs[109].dma_len=6290, dma_addr=0x0000000000 (next tx)
[  +0,000009] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   6, free_bds=250->256
[  +0,000010] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=17517 c_index=17517 txbds_ready=0
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[109].dma_len=6290, dma_addr=0x0000000000 (next tx)
[  +0,000007] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   0, free_bds=256->256
[  +0,144407] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0007c100, masked=0x00010000
[  +0,000033] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=17517 c_index=17519 txbds_ready=2
[  +0,000009] bcmgenet fd580000.ethernet: ring.cbs[109].dma_len= 118, dma_addr=0x012654b4ca
[  +0,000013] bcmgenet fd580000.ethernet: ring.cbs[110].dma_len=  72, dma_addr=0x0112e18000
[  +0,000010] bcmgenet fd580000.ethernet: ring.cbs[111].dma_len=6290, dma_addr=0x0000000000 (next tx)
[  +0,000009] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=254->256
[  +0,000026] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=17519 c_index=17519 txbds_ready=0
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[111].dma_len=6290, dma_addr=0x0000000000 (next tx)
[  +0,000278] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   0, free_bds=256->256
[  +0,000504] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0006e100, masked=0x00002000
[  +0,000130] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0007e100, masked=0x00010000
[  +0,000010] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0006e100, masked=0x00002000
[  +0,000010] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=17519 c_index=17520 txbds_ready=1
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[111].dma_len= 118, dma_addr=0x01212950ca
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[112].dma_len=2106, dma_addr=0x0000000000 (next tx)
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   1, free_bds=255->256
[  +0,000010] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=17520 c_index=17520 txbds_ready=0
[  +0,000005] bcmgenet fd580000.ethernet: ring.cbs[112].dma_len=2106, dma_addr=0x0000000000 (next tx)
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   0, free_bds=256->256
[  +0,734805] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c100, masked=0x00800000
[  +0,000027] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c100, masked=0x00800000
[  +0,142744] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0007c100, masked=0x00010000
[  +0,000020] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=17520 c_index=17522 txbds_ready=2
[  +0,000007] bcmgenet fd580000.ethernet: ring.cbs[112].dma_len=6282, dma_addr=0x0126420002
[  +0,000011] bcmgenet fd580000.ethernet: ring.cbs[113].dma_len=2114, dma_addr=0x0124d59002
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[114].dma_len=6282, dma_addr=0x0126424002 (next tx)
[  +0,000007] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   2, free_bds=252->254
[  +0,000011] bcmgenet fd580000.ethernet: __bcmgenet_tx_reclaim ring=16 old_c_index=17522 c_index=17522 txbds_ready=0
[  +0,000006] bcmgenet fd580000.ethernet: ring.cbs[114].dma_len=6282, dma_addr=0x0126424002 (next tx)
[  +0,000006] bcmgenet fd580000.ethernet: bcmgenet_tx_poll work_done=   0, free_bds=254->254
[  +0,881175] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c100, masked=0x00800000
[  +0,000055] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c100, masked=0x00800000
[  +1,023958] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c100, masked=0x00800000
[  +0,000065] bcmgenet fd580000.ethernet: bcmgenet_isr0: full_status=0x0086c100, masked=0x00800000
[  +0,767930] ------------[ cut here ]------------
[  +0,000047] NETDEV WATCHDOG: eth0 (bcmgenet): transmit queue 0 timed out
[  +0,000815] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:443 dev_watchdog+0x3a0/0x3a8
[  +0,000008] Modules linked in: md4 md5 nls_utf8 cifs libarc4 rfcomm bnep hci_uart btbcm bluetooth ecdh_generic ecc fuse 8021q garp stp llc hid_logitech_hidpp joydev hid_logitech_dj vc4 brcmfmac bcm2835_codec(C) v3d cec gpu_sched brcmutil raspberrypi_hwmon drm_kms_helper bcm2835_isp(C) cfg80211 v4l2_mem2mem bcm2835_v4l2(C) videobuf2_dma_contig drm bcm2835_mmal_vchiq(C) videobuf2_vmalloc drm_panel_orientation_quirks videobuf2_memops videobuf2_v4l2 rfkill i2c_brcmstb videobuf2_common snd_soc_core vc_sm_cma(C) videodev snd_compress mc snd_bcm2835(C) snd_pcm_dmaengine snd_pcm snd_timer snd rpivid_mem syscopyarea sysfillrect sysimgblt fb_sys_fops backlight uio_pdrv_genirq uio nvmem_rmem i2c_dev aes_neon_bs sha256_generic aes_neon_blk crypto_simd cryptd ip_tables x_tables ipv6
[  +0,001256] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G         C        5.10.17-v8+jf+ #55
[  +0,000007] Hardware name: Raspberry Pi 4 Model B Rev 1.4 (DT)
[  +0,000011] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO BTYPE=--)
[  +0,000012] pc : dev_watchdog+0x3a0/0x3a8
[  +0,000020] lr : dev_watchdog+0x3a0/0x3a8
[  +0,000012] sp : ffffffc011563d10
[  +0,000011] x29: ffffffc011563d10 x28: ffffff8101413580 

Surprisingly, the time out occurs right at the very first packet of size 2050+64=2114 (ring.cbs[113].dma_len=2114). This time it was received by wireshark.

Reverting parts of net: bcmgenet: Implement TX coalescing control knobs and bcmgenet: Better coalescing parameter defaults does not help (nor changing the coalesce setting using ethtool.

wtschueller commented 2 years ago

No real progress in the mean time. A can trigger the time out only by sending packets of size 2049 ... 2093 bytes.

No single time out if MTU is set to default 1500. So I think the original topic must be something else.

Asked the forum for help Increasing MTU on Pi 4 again.

wtschueller commented 2 years ago

@pelwell - Could you have a look please? I saw your commit phy: broadcom: Add bcm54213pe configuration. There is a function bcm_phy_enable_jumbo(), so I guess there could be additional settings for stable use of jumbo frames on RPi4.

pelwell commented 2 years ago

Adding a call to existing function for a different but compatible PHY requires very little knowledge (provided it works, which it did).

wtschueller commented 2 years ago

@pelwell Thank you. Ok, I blindly assumed that you have access to the documentation. Maybe you could point me in the right direction.

I hope that if we know what goes wrong than we can develop a better time out handler that brings eth0 back to life. As a side effect, I get jumbo frames stable.

wtschueller commented 2 years ago

Still no real progress. 06d836801cd82ded282aaf9e888ff9e7e4a88b91 does not help in my case.

segabor commented 2 years ago

Hi there, My RPi 4 with the latest kernel 5.15.0-1013-raspi became totally unusable. Logs are full of bcmgenet_txmit: tx ring 0 full when queue 1 awake I have to turn the network switch off and on to make it work for a couple of minutes. I'm going to revert to an older kernel hoping it brings back stability.

I assume the bug came out when I upgraded to Ubuntu Jammy. Since then I keep getting tx ring full errors.

mjasny commented 2 years ago

Try to disable the eee setting on eth0 (“Energy Efficient Ethernet”). After I disabled I'm not experiencing these tx ring errors.

sudo ethtool --set-eee eth0 eee off

sudo ethtool --show-eee eth0

segabor commented 2 years ago

Thanks @mjasny, I'll give it a try

zenlord commented 1 year ago

There have been reports that the workaround using ethtool is reverted automatically and thus disables the workaround.

The devs have created a new config.txt parameter to switch off EEE in the driver. More info: https://github.com/raspberrypi/linux/issues/4289#issuecomment-1351697589