raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
11.17k stars 5.01k forks source link

Wifi (brcmfmac) regression between linux-firmware 20201218 and 20210208 on Raspberry Pi 4 #4145

Open ruuda opened 3 years ago

ruuda commented 3 years ago

Is this the right place for my bug report?

The problem manifests in the linux-firmware package. I first filed a bug upstream and on the kernel mailing list. There I was told to instead open an issue here.

Describe the bug

After upgrading the linux-firmware package (https://archlinuxarm.org/packages/armv8/linux-firmware) from 20201218.646f159 to 20210208.b79d239, my wlan0 interface no longer showed up in networkctl, and dmesg shows errors including

brcmf_sdio_htclk: HT Avail timeout (1000000): clkctl 0x50

After downgrading to 20201218, everything worked fine again. This is on a Raspberry Pi model 4. I can reproduce both with kernel 5.4.83 and with kernel 5.10.14 from the Arch Linux ARM package (https://archlinuxarm.org/packages/aarch64/linux-raspberrypi4).

To reproduce

For me, just installing linux-firmware 20210208 and rebooting is enough to reproduce, but it might be there is something in my particular configuration that triggers this, let me know if there is anything I can do to help diagnose this.

Expected behaviour

After boot, networkctl list lists lo, eth0, and wlan0.

Actual behaviour

After boot, networkctl list lists lo, eth0, but not wlan0.

System

Logs

Kernel log excerpt for the broken version:

$ sudo journalctl --no-hostname -t kernel --boot -1 | grep brcmf --context 10 | grep -v 'audit: type'
Feb 03 22:51:28 kernel: videodev: Linux video capture interface: v2.00
Feb 03 22:51:28 kernel: vc4-drm gpu: bound fec12000.pixelvalve (ops vc4_crtc_ops [vc4])
Feb 03 22:51:28 kernel: [drm] Initialized vc4 0.0.0 20140616 for gpu on minor 1
Feb 03 22:51:28 kernel: cfg80211: Loading compiled-in X.509 certificates for regulatory database
Feb 03 22:51:28 kernel: cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
Feb 03 22:51:28 kernel: bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned.
Feb 03 22:51:28 kernel: platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
Feb 03 22:51:28 kernel: bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned.
Feb 03 22:51:28 kernel: cfg80211: failed to load regulatory.db
Feb 03 22:51:28 kernel: bcm2835_mmal_vchiq: module is from the staging directory, the quality is unknown, you have been warned.
Feb 03 22:51:28 kernel: brcmfmac: F1 signature read @0x18000000=0x15264345
Feb 03 22:51:28 kernel: bcm2835_isp: module is from the staging directory, the quality is unknown, you have been warned.
Feb 03 22:51:28 kernel: bcm2835_v4l2: module is from the staging directory, the quality is unknown, you have been warned.
Feb 03 22:51:28 kernel: bcm2835_codec: module is from the staging directory, the quality is unknown, you have been warned.
Feb 03 22:51:28 kernel: brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6
Feb 03 22:51:28 kernel: usbcore: registered new interface driver brcmfmac
Feb 03 22:51:28 kernel: bcm2835-codec bcm2835-codec: Device registered as /dev/video10
Feb 03 22:51:28 kernel: bcm2835-codec bcm2835-codec: Loaded V4L2 decode
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Device node output[0] registered as /dev/video13
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Device node capture[0] registered as /dev/video14
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Device node capture[1] registered as /dev/video15
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Device node stats[2] registered as /dev/video16
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Register output node 0 with media controller
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Register capture node 1 with media controller
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Register capture node 2 with media controller
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Register capture node 3 with media controller
--
Feb 03 22:51:28 kernel: Bluetooth: SCO socket layer initialized
Feb 03 22:51:28 kernel: vc4-drm gpu: [drm] fb0: vc4drmfb frame buffer device
Feb 03 22:51:28 kernel: FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Feb 13 12:51:41 kernel: brcmfmac: brcmf_sdio_htclk: HT Avail timeout (1000000): clkctl 0x50
Feb 13 12:51:42 kernel: random: crng init done
Feb 13 12:51:42 kernel: random: 7 urandom warning(s) missed due to ratelimiting
Feb 13 12:51:42 kernel: brcmfmac: brcmf_sdio_htclk: HT Avail timeout (1000000): clkctl 0x50
Feb 13 12:53:08 kernel: kauditd_printk_skb: 4 callbacks suppressed

With the good version, there is no timeout in brcmf:

sudo journalctl --no-hostname -t kernel --boot -0 | grep brcmf --context 10 | grep -v 'audit: type'
Feb 03 22:51:28 kernel: bcm2835-codec bcm2835-codec: Device registered as /dev/video12
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Device node stats[2] registered as /dev/video16
Feb 03 22:51:28 kernel: bcm2835-codec bcm2835-codec: Loaded V4L2 isp
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Register output node 0 with media controller
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Register capture node 1 with media controller
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Register capture node 2 with media controller
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Register capture node 3 with media controller
Feb 03 22:51:28 kernel: bcm2835-isp bcm2835-isp: Loaded V4L2 bcm2835-isp
Feb 03 22:51:28 kernel: platform regulatory.0: Direct firmware load for regulatory.db failed with error -2
Feb 03 22:51:28 kernel: cfg80211: failed to load regulatory.db
Feb 03 22:51:28 kernel: brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6
Feb 03 22:51:28 kernel: usbcore: registered new interface driver brcmfmac
Feb 03 22:51:28 kernel: Bluetooth: Core ver 2.22
Feb 03 22:51:28 kernel: NET: Registered protocol family 31
Feb 03 22:51:28 kernel: Bluetooth: HCI device and connection manager initialized
Feb 03 22:51:28 kernel: Bluetooth: HCI socket layer initialized
Feb 03 22:51:28 kernel: Bluetooth: L2CAP socket layer initialized
Feb 03 22:51:28 kernel: Bluetooth: SCO socket layer initialized
Feb 03 22:51:28 kernel: FAT-fs (mmcblk0p1): Volume was not properly unmounted. Some data may be corrupt. Please run fsck.
Feb 03 22:51:28 kernel: brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43455-sdio for chip BCM4345/6
Feb 03 22:51:28 kernel: brcmfmac: brcmf_c_preinit_dcmds: Firmware: BCM4345/6 wl0: Feb 27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
Feb 03 22:51:28 kernel: brcmfmac: brcmf_cfg80211_set_power_mgmt: power save enabled
Feb 03 22:51:28 kernel: proc: Bad value for 'hidepid'
Feb 03 22:51:28 kernel: proc: Bad value for 'hidepid'
Feb 13 13:16:25 kernel: proc: Bad value for 'hidepid'
Feb 13 13:16:25 kernel: .
Feb 13 13:16:25 kernel: random: crng init done
cbxbiker61 commented 3 years ago

If you remove brcmfmac43455-sdio.raspberrypi,4-model-b.txt (probably also brcmfmac43455-sdio.raspberrypi,3-model-b-plus.txt for good measure). It will probably work. I had to sort this out on my systems just this last week. It seems to be an incompatibility between the .txt files and the .bin file. You'll see an error in the system log when it tries to load the .txt files but it doesn't seem to cause any real issues. Performance seems to be the same.

cpegel commented 3 years ago

Raspberry Pi Zero seems to be affected by this as well. Wifi doesn't come up with ArchLinuxARM using linux-firmware 20210208.b79d239-1 on a Raspberry Pi Zero. Downgrading fixes the issue.

JeanMax commented 3 years ago

Same problem here on a Pi Zero W. The workaround from @cbxbiker61 didn't work.

ruuda commented 3 years ago

A comment on the kernel Bugzilla bug I filed suggests that https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/brcm?id=870b805dc77e3f7acb7e6c36944936e310be2478, a change in brcmfmac43455-sdio.raspberrypi,4-model-b.txt is to blame. That might explain why deleting the file helps. (I haven’t yet tried that myself.) The commenter suggests changing the boardflags3 line back to boardflags3=0x44200100.

pelwell commented 3 years ago

That upstream change to board_flags3 is wrong. As I understand it, bits 26 and 27 tell the 43455 whether to use an internal or external low-power oscillator (LPO). Bit 26 being set (0x44...) indicates that the internal LPO should be use, while bit 27 (0x48...) means there is an external LPO. Pi 4 (4B, 400 and CM4) does not have an external LPO - it saves a GPIO - so it needs the 0x44 version. Although from what I can see all versions of 3B+ do have an LPO clock (supplied from GPIO43), the downstream common nvram file isn't enabling it - with no apparent side effects.

umgfoin commented 3 years ago

Hello all, on PiZeroW I'm getting ieee80211 phy0: brcmf_fw_crashed: Firmware has halted or crashed:

[   32.884942] ieee80211 phy0: brcmf_fw_crashed: Firmware has halted or crashed
[   32.893713] ------------[ cut here ]------------
[   32.899823] WARNING: CPU: 0 PID: 7 at kernel/workqueue.c:1477 __queue_work+0x4ac/0x628
[   32.910608] Modules linked in: brcmfmac brcmutil cfg80211 bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) raspberrypi_hwmon bcm2835_mmal_vchiq(C) i2c_bcm2835 v4
                                                                                                                                                                   [   32.984486] Workqueue: brcmf_wq/mmc1:0001:1 brcmf_sdio_dataworker [brcmfmac]
[   32.993102] [<8010ef8c>] (unwind_backtrace) from [<8010ae04>] (show_stack+0x10/0x14)
[   33.003759] [<8010ae04>] (show_stack) from [<80d03c88>] (dump_stack+0xd0/0xf8)
[   33.012556] [<80d03c88>] (dump_stack) from [<8012bd1c>] (__warn+0xd4/0x140)
[   33.021048] [<8012bd1c>] (__warn) from [<80cfb694>] (warn_slowpath_fmt+0x74/0xa8)
[   33.031464] [<80cfb694>] (warn_slowpath_fmt) from [<80147154>] (__queue_work+0x4ac/0x628)
[   33.042596] [<80147154>] (__queue_work) from [<80147328>] (queue_work_on+0x58/0x60)
[   33.053377] [<80147328>] (queue_work_on) from [<7f2e21b0>] (brcmf_sdio_dataworker+0x630/0x2148 [brcmfmac])
[   33.066262] [<7f2e21b0>] (brcmf_sdio_dataworker [brcmfmac]) from [<80147a70>] (process_one_work+0x204/0x53c)
[   33.079128] [<80147a70>] (process_one_work) from [<80147df0>] (worker_thread+0x48/0x594)
[   33.090183] [<80147df0>] (worker_thread) from [<8014dca8>] (kthread+0x130/0x144)
[   33.100583] [<8014dca8>] (kthread) from [<80100138>] (ret_from_fork+0x14/0x3c)
[   33.109391] Exception stack(0x818e3fb0 to 0x818e3ff8)
[   33.116005] 3fa0:                                     00000000 00000000 00000000 00000000
[   33.127222] 3fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   33.138419] 3fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   33.146616] ---[ end trace b9e78806dda15565 ]---

bg umgfoin

pelwell commented 3 years ago

If you are hoping to get some help with a kernel problem, don't just append a random stack trace to any other open issue that is about roughly the same subject. Notice that nobody else has mentioned that the firmware crashes.

Without some version numbers for context I'm just going to assume that you are running old firmware and the problem has already been fixed.

mdevaev commented 3 years ago

@pelwell I can confirm that. The linux-firmware update that caused the described problem on Pi4 causes the driver to crash on ZeroW with the same kernel version and other things. Downgrade of the linux-firmware package fixes this on both boards.

gearhead commented 3 years ago

On a PiB3, I get the same thing after a system upgrade this AM. no wifi interface shows up. Linux version 5.10.14-3-ARCH

[   13.361754] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43430-sdio for chip BCM43430/1
[   13.466210] ieee80211 phy0: brcmf_fw_crashed: Firmware has halted or crashed
[   13.474051] ------------[ cut here ]------------
[   13.479366] WARNING: CPU: 2 PID: 7 at kernel/workqueue.c:1477 __queue_work+0x4ac/0x628
[   13.487977] Modules linked in: brcmfmac brcmutil vc4 cfg80211 cec drm_kms_helper snd_soc_core snd_compress snd_pcm_dmaengine syscopyarea sysfillrect sysimgblt fb_sys_fops bcm2835_isp(C) bcm2835_codec(C) bcm2835_v4l2(C) raspberrypi_hwmon bcm2835_mmal_vchiq(C) videobuf2_vmalloc v4l2_mem2mem i2c_bcm2835 videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc bcm2835_rng rng_core uio_pdrv_genirq fixed uio sch_fq_codel snd_bcm2835(C) snd_pcm snd_timer snd fuse drm drm_panel_orientation_quirks ip_tables x_tables ipv6
[   13.540523] CPU: 2 PID: 7 Comm: kworker/u8:0 Tainted: G         C        5.10.14-3-ARCH #1
[   13.550183] Hardware name: BCM2835
[   13.554319] Workqueue: brcmf_wq/mmc1:0001:1 brcmf_sdio_dataworker [brcmfmac]
[   13.562084] [<8010ef8c>] (unwind_backtrace) from [<8010ae04>] (show_stack+0x10/0x14)
[   13.570553] [<8010ae04>] (show_stack) from [<80d03c88>] (dump_stack+0xd0/0xf8)
[   13.578508] [<80d03c88>] (dump_stack) from [<8012bd1c>] (__warn+0xd4/0x140)
[   13.586207] [<8012bd1c>] (__warn) from [<80cfb694>] (warn_slowpath_fmt+0x74/0xa8)
[   13.594436] [<80cfb694>] (warn_slowpath_fmt) from [<80147154>] (__queue_work+0x4ac/0x628)
[   13.604114] [<80147154>] (__queue_work) from [<80147328>] (queue_work_on+0x58/0x60)
[   13.612604] [<80147328>] (queue_work_on) from [<7f5941b0>] (brcmf_sdio_dataworker+0x630/0x2148 [brcmfmac])
[   13.623897] [<7f5941b0>] (brcmf_sdio_dataworker [brcmfmac]) from [<80147a70>] (process_one_work+0x204/0x53c)
[   13.635373] [<80147a70>] (process_one_work) from [<80147df0>] (worker_thread+0x48/0x594)
[   13.645181] [<80147df0>] (worker_thread) from [<8014dca8>] (kthread+0x130/0x144)
[   13.653457] [<8014dca8>] (kthread) from [<80100138>] (ret_from_fork+0x14/0x3c)
[   13.661544] Exception stack(0x818f5fb0 to 0x818f5ff8)
[   13.667449] 5fa0:                                     00000000 00000000 00000000 00000000
[   13.677328] 5fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   13.687208] 5fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   13.694679] ---[ end trace 00752a952f5327fb ]---
[   15.039260] smsc95xx 1-1.1:1.0 eth0: Link is Up - 100Mbps/Full - flow control off
[   15.046879] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[   15.918359] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[   15.924288] ieee80211 phy0: brcmf_c_process_clm_blob: clmload (4733 byte file) failed (-110)
[   18.478354] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[   18.484279] ieee80211 phy0: brcmf_c_process_clm_blob: get clmload_status failed (-110)
[   18.492331] ieee80211 phy0: brcmf_c_preinit_dcmds: download CLM blob file failed, -5
[   18.500177] ieee80211 phy0: brcmf_bus_started: failed: -5
[   18.505655] ieee80211 phy0: brcmf_attach: dongle is not responding: err=-5
[   18.581688] brcmfmac: brcmf_sdio_firmware_callback: brcmf_attach failed

it shows brcmfmac module loaded, but the port does not show up. I compared the brcmfmac43430-sdio.raspberrypi,3-model-b.txt file on this to an older image from November running on a Zero and they appear identical. I was never clear which files were being used as there are similar files in /usr/lib/firmware/brcm and /usr/lib/firmware/updates/brcm

gearhead commented 3 years ago

I tacked this on the Pi4 regression, but it seems to be a brcmfmac regresson as it also does not work on my Pi3b+ nor my Pizero with the 32 bit OS. This is what shows on the Zero

[   23.148986] brcmfmac mmc1:0001:1: Direct firmware load for brcm/brcmfmac43430-sdio.raspberrypi,model-zero-w.txt failed with error -2
[   23.295401] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac43430-sdio for chip BCM43430/1
[   23.373330] ieee80211 phy0: brcmf_fw_crashed: Firmware has halted or crashed
[   23.373357] ------------[ cut here ]------------
[   23.373388] WARNING: CPU: 0 PID: 129 at kernel/workqueue.c:1477 __queue_work+0x4ac/0x628
[   23.373396] Modules linked in: brcmfmac brcmutil cfg80211 r8152 hci_uart btbcm bluetooth raspberrypi_hwmon ecdh_generic ecc bcm2835_codec(C) bcm2835_v4l2(C) bcm2835_isp(C) v4l2_mem2mem bcm2835_mmal_vchiq(C) videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common videodev bcm2835_rng mc rng_core uio_pdrv_genirq fixed uio sch_fq_codel snd_bcm2835(C) snd_pcm snd_timer snd crypto_user fuse ip_tables x_tables ipv6
[   23.373577] CPU: 0 PID: 129 Comm: kworker/u2:2 Tainted: G         C        5.10.14-1-ARCH #1
[   23.373584] Hardware name: BCM2835
[   23.373861] Workqueue: brcmf_wq/mmc1:0001:1 brcmf_sdio_dataworker [brcmfmac]
[   23.373910] [<8010ef8c>] (unwind_backtrace) from [<8010ae04>] (show_stack+0x10/0x14)
[   23.373933] [<8010ae04>] (show_stack) from [<80d04e08>] (dump_stack+0xd0/0xf8)
[   23.373969] [<80d04e08>] (dump_stack) from [<8012bd1c>] (__warn+0xd4/0x140)
[   23.373997] [<8012bd1c>] (__warn) from [<80cfc814>] (warn_slowpath_fmt+0x74/0xa8)
[   23.374018] [<80cfc814>] (warn_slowpath_fmt) from [<80147154>] (__queue_work+0x4ac/0x628)
[   23.374035] [<80147154>] (__queue_work) from [<80147328>] (queue_work_on+0x58/0x60)
[   23.374206] [<80147328>] (queue_work_on) from [<7f34a1b0>] (brcmf_sdio_dataworker+0x630/0x2148 [brcmfmac])
[   23.374396] [<7f34a1b0>] (brcmf_sdio_dataworker [brcmfmac]) from [<80147a70>] (process_one_work+0x204/0x53c)
[   23.374417] [<80147a70>] (process_one_work) from [<80147df0>] (worker_thread+0x48/0x594)
[   23.374441] [<80147df0>] (worker_thread) from [<8014dca8>] (kthread+0x130/0x144)
[   23.374464] [<8014dca8>] (kthread) from [<80100138>] (ret_from_fork+0x14/0x3c)
[   23.374476] Exception stack(0x8408ffb0 to 0x8408fff8)
[   23.374487] ffa0:                                     00000000 00000000 00000000 00000000
[   23.374501] ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[   23.374513] ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
[   23.374524] ---[ end trace 9cf76952016cb4e2 ]---
[   25.851860] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[   25.851914] ieee80211 phy0: brcmf_c_process_clm_blob: clmload (4733 byte file) failed (-110)

does this belong here or should it be in a new issue?

graysky2 commented 3 years ago

@pelwell

That upstream change to board_flags3 is wrong...

Thank you for sharing this observation. The firmware-raspberrypi package has been modified to provide that change to /usr/lib/firmware/updates/brcm/brcmfmac43455-sdio.raspberrypi,4-model-b.txt which fixes the loss of WiFi on the 4B.

You can probably close this ticket since it relates to this specific breakage. Breakage to WiFI on the Pi Zero should be opened as a separate issue as you stated.

lategoodbye commented 3 years ago

Thanks for analyzing this issue. A patch has been send against linux-firmware: https://lore.kernel.org/linux-firmware/20210216193523.32321-1-matthias.bgg@kernel.org/T/#u

graysky2 commented 3 years ago

Thanks @lategoodbye. Are you able to help with that Zero-W breakage? A work-around is to supply an older version of brcmfmac43430-sdio.txt, see: https://github.com/archlinuxarm/PKGBUILDs/commit/d7b06e52b96bc0e598c51d9a0695a2f7b40e2abc