raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/
Other
10.92k stars 4.91k forks source link

wlan freezes in raspberry pi 3B+ #2453

Open ghost opened 6 years ago

ghost commented 6 years ago

(see also https://github.com/raspberrypi/linux/issues/1342 )

I've also got that problem with wifi dying.

Mar 17 18:25:28 hassass kernel: [10279.186321] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content: 0x40012
Mar 17 18:25:30 hassass kernel: [10281.665090] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Mar 17 18:25:30 hassass kernel: [10281.665622] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Mar 17 18:25:30 hassass kernel: [10281.665638] brcmfmac: brcmf_run_escan: error (-110)
Mar 17 18:25:30 hassass kernel: [10281.665647] brcmfmac: brcmf_cfg80211_scan: scan error (-110)
Mar 17 18:26:30 hassass kernel: [10341.665866] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout

This is with 4.14.27-v7+ and with /sbin/iw dev wlan0 set power_save off /sbin/ifconfig wlan0 promisc in /etc/rc.local.

[    4.112717] brcmfmac: F1 signature read @0x18000000=0x15264345
[    4.119827] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[    4.120314] usbcore: registered new interface driver brcmfmac
[    4.440371] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[    4.440958] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[   10.911757] brcmfmac: power management disabled
[   12.016088] brcmfmac: power management disabled
[ 2074.090674] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg
failed w/status -5
[ 2074.090687] brcmfmac: brcmf_cfg80211_get_tx_power: error (-5)
[ 2074.090745] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.090753] brcmfmac: brcmf_link_down: WLC_DISASSOC failed (-5)
[ 2074.610583] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.611992] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.613945] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.613971] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[ 2074.729716] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.729733] brcmfmac: brcmf_cfg80211_reg_notifier: Country code iovar
returned err = -5
[ 2074.871693] usbcore: deregistering interface driver brcmfmac
[ 2074.929084] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 2074.936897] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 2074.937139] usbcore: registered new interface driver brcmfmac
[ 2075.118180] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[ 2075.118706] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[ 2075.215365] brcmfmac: power management disabled
[ 2075.263751] brcmfmac: power management disabled
[ 2085.475001] brcmfmac: power management disabled
[ 2124.380808] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2124.381146] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2124.381156] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[ 2124.622345] usbcore: deregistering interface driver brcmfmac
[ 2124.705432] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 2124.714194] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 2124.716213] usbcore: registered new interface driver brcmfmac
[ 2124.929556] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[ 2124.929993] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[ 2125.105218] brcmfmac: power management disabled
[ 2125.150290] brcmfmac: power management disabled
[ 8237.434034] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content:
0x40012
[ 8239.890302] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8239.890822] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[ 8239.890835] brcmfmac: brcmf_run_escan: error (-110)
[ 8239.890845] brcmfmac: brcmf_cfg80211_scan: scan error (-110)
[ 8254.280425] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg
failed w/status -5
[ 8254.280438] brcmfmac: brcmf_cfg80211_get_tx_power: error (-5)
[ 8254.280491] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.280498] brcmfmac: brcmf_link_down: WLC_DISASSOC failed (-5)
[ 8254.800394] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.803873] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.808353] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.808370] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[ 8254.881402] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.881420] brcmfmac: brcmf_cfg80211_reg_notifier: Country code iovar
returned err = -5
[ 8255.001550] usbcore: deregistering interface driver brcmfmac
[ 8255.071184] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 8255.077098] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 8255.077348] usbcore: registered new interface driver brcmfmac
[ 8257.730418] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8257.751038] brcmfmac: brcmf_c_get_clm_name: retrieving revision info
failed (-110)
[ 8257.751049] brcmfmac: brcmf_c_process_clm_blob: get CLM blob file name
failed (-110)
[ 8257.751068] brcmfmac: brcmf_c_preinit_dcmds: download CLM blob file
failed, -110
[ 8257.751076] brcmfmac: brcmf_bus_started: failed: -110
[ 8257.751114] brcmfmac: brcmf_sdio_firmware_callback: dongle is not
responding
[ 8304.417684] usbcore: deregistering interface driver brcmfmac
[ 8304.486099] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 8304.493613] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 8304.494078] usbcore: registered new interface driver brcmfmac
[ 8304.686761] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[ 8304.687203] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[ 8304.829994] brcmfmac: power management disabled
[ 8304.907662] brcmfmac: power management disabled
[ 8357.441791] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content:
0x40012
[ 8359.891146] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8359.891655] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[ 8359.891668] brcmfmac: brcmf_run_escan: error (-110)
[ 8359.891677] brcmfmac: brcmf_cfg80211_scan: scan error (-110)
[ 8371.731226] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8371.731731] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[ 8371.731746] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
[ 8373.941267] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg
failed w/status -5
[ 8373.941280] brcmfmac: brcmf_cfg80211_get_tx_power: error (-5)
[ 8373.941330] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8373.941338] brcmfmac: brcmf_link_down: WLC_DISASSOC failed (-5)
[ 8374.461245] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8374.461942] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8374.463553] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8374.463573] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[ 8374.564729] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8374.564750] brcmfmac: brcmf_cfg80211_reg_notifier: Country code iovar
returned err = -5
[ 8374.702401] usbcore: deregistering interface driver brcmfmac
[ 8374.759839] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 8374.767561] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 8374.771137] usbcore: registered new interface driver brcmfmac
[ 8377.411255] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8377.431924] brcmfmac: brcmf_c_get_clm_name: retrieving revision info
failed (-110)
[ 8377.431934] brcmfmac: brcmf_c_process_clm_blob: get CLM blob file name
failed (-110)
[ 8377.431941] brcmfmac: brcmf_c_preinit_dcmds: download CLM blob file
failed, -110
[ 8377.431949] brcmfmac: brcmf_bus_started: failed: -110
[ 8377.432003] brcmfmac: brcmf_sdio_firmware_callback: dongle is not
responding
[ 8424.133114] usbcore: deregistering interface driver brcmfmac
[ 8424.229631] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 8424.237210] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 8424.239352] usbcore: registered new interface driver brcmfmac
[ 8424.460736] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[ 8424.461174] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[ 8424.646993] brcmfmac: power management disabled
[ 8424.708633] brcmfmac: power management disabled
brubbel commented 6 years ago

More info. Compiling hostapd/wpa_supplicant from source with the same version as indicated in the wpasupplicant source package (wpa 2:2.4-1+deb9u1) stops the immediate crashes, in any order of starting hostapd and wpa_supplicant, multiple times. Also, compiling from upstream (https://w1.fi/cvs.html) does not show the crashes.

So, what is the difference with a clean 2018-06-27-raspbian-stretch-lite installation? Bug related to krack (http://seclists.org/bugtraq/2017/Oct/25) patches?

Edit: still crashes on wl0: wlc_iovar_op: wpaie BCME -7 (Not STA), but much less frequent.

gdb-power commented 6 years ago

@brubbel the crash you're investigating is occurring at firmware instruction pointer = 0x00021e2c (somewhere in ROM).

There are also firmware crashes at other locations that cause "Unknown mailbox data content: 0x40012", those occur during normal operation. I've collected a few of these crashlogs during 2 days of operation, they all seem to be in the RAM region of the firmware (0x180000-0x25ffff, more or less):

$ grep -a TRAP forensics.log | sort | uniq -c
      2 TRAP 4(25fd10): pc 1a4bdc, lr 1a4bcb, sp 25fd68, cpsr 19f, spsr 1bf
      1 TRAP 4(25fd78): pc 19ecb6, lr 19ecaf, sp 25fdd0, cpsr 2000019f, spsr 200001bf
      7 TRAP 4(25fd78): pc 19ecc2, lr 19ecaf, sp 25fdd0, cpsr 2000019f, spsr 200001bf
      1 TRAP 4(25fdc0): pc 19ed20, lr 1a2545, sp 25fe18, cpsr 2000019f, spsr 200001bf
      3 TRAP 4(25fdc0): pc 19ed2c, lr 1a2545, sp 25fe18, cpsr 6000019f, spsr 600001bf
     59 TRAP 4(25fdc8): pc 1a246a, lr 1a246b, sp 25fe20, cpsr 6000019f, spsr 600001bf

Those are probably buggy information-element parsing functions, as described below, but without firmware source code we can't really fix them:

see figure: wlc_recv_mgmtact - 0x1A79F4 https://googleprojectzero.blogspot.com/2017/10/over-air-vol-2-pt-2-exploiting-wi-fi.html

brubbel commented 6 years ago

@gdb-power Correct, I also noticed that the stack pointer had different addresses, but I had thought this was due to loading the firmware at random locations in memory. Imho, such a firmware driver should employ a stateless design pattern (and static memory allocations) to the extent possible, which is probably not the case.

For example, by messing around one can get into the state where the nl80211 driver has no AP interface, while it is not possible to add the interface because the low-level firmware thinks it already has the interface => duplicate states and out of sync. So, the AP is still happily sending out beacons without permission of the driver.

brubbel commented 6 years ago

Concerning the firmware crash at ROM location 0x00021e2c for CYW43455, RPI3B+, firmware driver brcmfmac43455-sdio.bin (Feb 27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04)

1. Bringing the AP up.

The following line in the brcmfmac driver is most likely the culprit. brcmf_fil_iovar_int_set(ifp, "apsta", 0); When the AP is started, this command is issued to the ap0 interface (ifidx=1), but it scares the hell out of the CYW43455 RTOS when interface with ifidx=0 is already in associated state. This command is issued when RSDB (2.4/5GHz simultanuous dual band) is not supported by the device.

As ap0 is already in AP mode (iw dev wlan0 interface add ap0 type __ap), I removed it and have now a working AP+STA on RPI3B+ ! Note: Make sure both are on the same channel.

2. Bringing the AP down.

Secondly, when the AP (ifidx=1) is stopped while STA (ifidx=0) is associated, the latter always looses its connection. This is caused by the command (ref and ref): brcmf_fil_cmd_int_set(ifp, BRCMF_C_DOWN, 1); which is related to a patch to support multiple AP's. It is not clear yet why a BRCMF_C_DOWN on ifidx=1 also affects ifidx=0. However, let's first try to get 1 AP working, so remove this line, then recompile the brcmfmac kernel module and the STA stays up when AP goes down.

Changes here, compiled brcmfmac test module here, goes into /lib/modules/$(uname -r)/kernel/drivers/net/wireless/broadcom/brcm80211/brcmfmac/brcmfmac.ko Tested on 2018-06-27-raspbian-stretch-lite, 4.14.54-v7+.

brubbel commented 6 years ago

TL;DR Patches here, compiled module here.

More findings (still related to CYW43455, RPI3B+ as above).

Consider wpa_supplicant configured with (1) ap_scan=1 and (2) a valid network. When this network goes out of range, wpa_supplicant starts complaining every few seconds: wlan0: Failed to initiate sched scan and /sys/kernel/debug/brcmfmac/mmc1:0001:1/forensics: pfn_gscan_cfg BCME -23 (Unsupported)

From this patch, it seems that pfn_gscan_cfg is disabled for bcm4345* devices (id=17221), however this is not the case, because brcmf_feat_iovar_int_get(ifp, BRCMF_FEAT_PNO, "pfn"); loads the PNO scan feature anyway.

Including the BRCMF_FEAT_PNO line in the blacklist for bcm4345* fixes this. wpa_supplicant background scans now work properly. Because powersave is disabled by default on raspbian, PNO doesn't really add value.

Maybe this PNO feature could possibly be related to the fw crashes, since the PNO scans are not active as long as the STA is associated. Also maybe, the PNO scans are more active in powersave mode, which could explain why disabling powersave improves the overall stability.

Edit: also fixes STA+AP for RPI3B rev.a22082, firmware driver brcmfmac43430-sdio.bin (Firmware version = wl0: Oct 23 2017 03:55:53 version 7.45.98.38 (r674442 CY) FWID 01-e58d219f).

brubbel commented 6 years ago

Question: is it possible to unbind/bind the 3f300000.mmc on the RPI3B? It results in the following (due to wrong mmc clock? Is the only way a custom dtoverlay?)

mmc1: bad CIS tuple 0x20 (0 bytes)
mmc1: error -22 whilst initialising SDIO card

Thanks!

gdb-power commented 6 years ago

The unbind/bind trick should work on all Raspberries. You just need to find the correct mmc address.

On Pi 3B+ it's 3f300000.mmc On Pi Zero W it's 20300000.mmc For other Pis, use this command to find the correct mmc address: sudo ls -la /sys/bus/platform/drivers/mmc-bcm2835

brubbel commented 5 years ago

Still the same error for RPI3B (rev. a02082).

cyph84 commented 5 years ago

In my case, I have run over 100+ RPi 3 as open access points about a year and they are fine. However, protecting the APs with WPA-PSK, we started seeing this error after a couple of days.

Amit0301 commented 5 years ago

In my case I tested on 3 different RPI3B+ with same SD card, working on 2 showing following error on 3rd.

[ 12.779953] fuse init (API version 7.26) [ 19.657576] brcmfmac: brcmf_sdiod_regrw_helper: failed to read data F1@0x08000, err: -110 [ 19.657592] brcmfmac: brcmf_chip_recognition: chip backplane type 15 is not supported [ 19.657601] brcmfmac: brcmf_sdio_probe_attach: brcmf_chip_attach failed! [ 19.657610] brcmfmac: brcmf_sdio_probe: brcmf_sdio_probe_attach failed [ 19.657823] brcmfmac: brcmf_ops_sdio_probe: F2 error, probe failed -19... [ 19.658194] usbcore: registered new interface driver brcmfmac

Any idea?

ndarilek commented 5 years ago

Has anyone started from a known working firmware, then advanced commit by commit until they found a buggy one? Would such an analysis be helpful?

In my case, I don't know if I'm experiencing the same issue as documented here or something new. The latest kernel/wifi drivers fail when used as an access point. Sometimes there are dmesg errors, sometimes not. Stations just get dissociated and can't reconnect unless I either restart hostapd or perform some action that seems to kick the internal wifi back into operation, like scans or listing stations. My solution has been using rpi-update to downgrade to the firmware/kernel combination released on 2018-04-16, which has been rock solid.

I'd be happy to advance through each firmware commit every few days, then report when I've found the one where the errors begin, but only if that would actually be helpful. Otherwise, I'm sticking with 2018-04-16. :)

FWIW, this happens for me on both the 3B and 3B+. So I'm not sure if my issue is in fact this one, but it isn't a misconfiguration on my end since the exact same card behaves the same in both models, then the issues go away entirely if I downgrade firmware. Finding anything on this issue has been tough, since everything out there just tells me to disable power management, which it seems is always off these days anyway (though 2018-04-16 did exhibit some lockups until I disabled power management. It may technically always be off, but typing the command did something on the firmware level that made things in that version more stable.)

pelwell commented 5 years ago

If you can point to a specific firmware (by which I mean an rpi-update start.elf + kernel + DTBs) release that breaks AP then that would be very helpful.

ndarilek commented 5 years ago

Cool, every few days I'll rpi-update to a new commit SHA, and as soon as I start having issues I'll report back and downgrade.

cyph84 commented 5 years ago

I am having this issue on a rather ancient 4.9.12-1-ARCH kernel.

From some light testing over the past week:

JamesH65 commented 5 years ago

@cyph84 Is this on a 3B+? We recommend updating to the latest kernel, retested, and reporting if the issues is still present.

ingo-h commented 5 years ago

I'm struggling with this issue for some time and have just seen this discussion. Here is what I have found so far.

First I followed the advice from @JamesH65 and updated to the latest kernel with rpi-update on a fresh flashed Raspbian Stretch Lite 2018-06-27:

rpi ~$ sudo apt update
rpi ~$ sudo apt full-upgrade

Reboot.

rpi ~$ sudo rpi-update

firmware was successfully updated to 45782b55788c58b10b6376487fd86ca9c13296e1

Reboot.

rpi ~$ uname -a
rpi ~$ Linux raspberrypi 4.14.67-v7+ #1139 SMP Wed Aug 29 15:17:05 BST 2018 armv7l GNU/Linux

I successful setup dual mode STA and AP with hostapd and found that we have to setup the needed services in a strict sequence, otherwise it won't work.

  1. create a virtual interface ap0 for the access point
  2. start access point daemon hostapd using interface ap0
  3. start wpa_supplicant for wifi client using interface wlan0

For details how to do that look at Raspberry Pi Zero W as a WiFi repeater.

Now I tried to use wpa_supplicant instead of hostapd. I use systemd-networkd and created these files:

rpi ~$ sudo -Es
# wpa_supplicant as wifi client
rpi ~# cat > /etc/wpa_supplicant/wpa_supplicant-wlan0.conf <<EOF
country=DE
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

network={
    ssid="wlan@hoeft-online.de"
    psk="Password"
}
EOF

# wpa_supplicant as access point
rpi ~# cat > /etc/wpa_supplicant/wpa_supplicant-ap0.conf <<EOF
country=DE
ctrl_interface=DIR=/var/run/wpa_supplicant GROUP=netdev
update_config=1

ap_scan=2
network={
    ssid="RPiNet"
    mode=2
    key_mgmt=WPA-PSK
    psk="anotherPassword"
    frequency=2412
}
EOF

rpi ~# cat > /etc/systemd/network/08-wlan0.network <<EOF
[Match]
Name=wlan0
[Network]
DHCP=yes
EOF

rpi ~# cat > /etc/systemd/network/12-ap0.network <<EOF
[Match]
Name=ap0
[Network]
Address=192.168.4.1/24
DHCPServer=yes
EOF

Then followed the order above:

rpi ~# /sbin/iw dev wlan0 interface add ap0 type __ap
rpi ~# /sbin/wpa_supplicant -c/etc/wpa_supplicant/wpa_supplicant-ap0.conf -Dnl80211 -iap0 &
rpi ~# /sbin/wpa_supplicant -c/etc/wpa_supplicant/wpa_supplicant-wlan0.conf -Dnl80211 -iwlan0 &

And here is the crash.
The question is: why does it work with hostapd but not with wpa_supplicant for access point?

rpi ~# journalctl -b -e
Sep 04 20:16:23 raspberrypi systemd-networkd[604]: ap0: IPv6 enabled for interface: Success
Sep 04 20:16:23 raspberrypi avahi-daemon[244]: Joining mDNS multicast group on interface ap0.IPv4 with address 192.168.4.1.
Sep 04 20:16:23 raspberrypi kernel: IPv6: ADDRCONF(NETDEV_UP): ap0: link is not ready
Sep 04 20:16:23 raspberrypi avahi-daemon[244]: New relevant interface ap0.IPv4 for mDNS.
Sep 04 20:16:23 raspberrypi avahi-daemon[244]: Registering new address record for 192.168.4.1 on ap0.IPv4.
Sep 04 20:16:23 raspberrypi systemd-timesyncd[238]: No network connectivity, watching for changes.
Sep 04 20:16:23 raspberrypi systemd-timesyncd[238]: No network connectivity, watching for changes.
Sep 04 20:16:35 raspberrypi systemd-timesyncd[238]: No network connectivity, watching for changes.
Sep 04 20:16:35 raspberrypi systemd-networkd[604]: ap0: Gained carrier
Sep 04 20:16:35 raspberrypi kernel: IPv6: ADDRCONF(NETDEV_CHANGE): ap0: link becomes ready
Sep 04 20:16:36 raspberrypi systemd-networkd[604]: ap0: Gained IPv6LL
Sep 04 20:16:36 raspberrypi avahi-daemon[244]: Joining mDNS multicast group on interface ap0.IPv6 with address fe80::ba27:ebff:fe06:
Sep 04 20:16:36 raspberrypi avahi-daemon[244]: New relevant interface ap0.IPv6 for mDNS.
Sep 04 20:16:36 raspberrypi avahi-daemon[244]: Registering new address record for fe80::ba27:ebff:fe06:e88b on ap0.*.
Sep 04 20:16:49 raspberrypi systemd-networkd[604]: ap0: Configured
Sep 04 20:17:01 raspberrypi CRON[644]: pam_unix(cron:session): session opened for user root by (uid=0)
Sep 04 20:17:01 raspberrypi CRON[648]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 04 20:17:01 raspberrypi CRON[644]: pam_unix(cron:session): session closed for user root
Sep 04 20:23:59 raspberrypi kernel: brcmfmac: brcmf_cfg80211_add_iface: iface validation failed: err=-16
Sep 04 20:23:59 raspberrypi kernel: brcmfmac: brcmf_vif_set_mgmt_ie: vndr ie set error : -30
Sep 04 20:23:59 raspberrypi kernel: brcmfmac: brcmf_cfg80211_scan: scan error (-30)
Sep 04 20:24:04 raspberrypi kernel: brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content: 0x40012
Sep 04 20:24:06 raspberrypi kernel: brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 04 20:24:06 raspberrypi kernel: brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 04 20:24:09 raspberrypi kernel: brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 04 20:24:09 raspberrypi kernel: brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 04 20:24:09 raspberrypi kernel: brcmfmac: brcmf_c_set_joinpref_default: Set join_pref error (-110)
Sep 04 20:24:11 raspberrypi kernel: brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 04 20:24:11 raspberrypi kernel: brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 04 20:24:14 raspberrypi kernel: brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 04 20:24:14 raspberrypi kernel: brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 04 20:24:14 raspberrypi kernel: brcmfmac: brcmf_cfg80211_connect: BRCMF_C_SET_SSID failed (-110)
Sep 04 20:24:16 raspberrypi kernel: brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 04 20:24:16 raspberrypi kernel: brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 04 20:24:16 raspberrypi kernel: brcmfmac: brcmf_do_escan: error (-110)
Sep 04 20:24:16 raspberrypi kernel: brcmfmac: brcmf_cfg80211_scan: scan error (-110)
Sep 04 20:24:20 raspberrypi kernel: brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 04 20:24:20 raspberrypi kernel: brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 04 20:24:20 raspberrypi kernel: brcmfmac: brcmf_do_escan: error (-110)
Sep 04 20:24:20 raspberrypi kernel: brcmfmac: brcmf_cfg80211_scan: scan error (-110)
Sep 04 20:24:23 raspberrypi kernel: brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 04 20:24:23 raspberrypi kernel: brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
<repeated>
ndarilek commented 5 years ago

Well, crap. I was happily iterating through firmware versions, upgrading from commit to commit every time I had a few days of solid runtime, because the latest kernels fall down after a few hours of light AP use. I hadn't upgraded for 5 days or so because of the Labor Day holiday, but today my AP fell over with identical symptoms and only this in dmesg:


[475224.495362] brcmfmac: brcmf_netdev_wait_pend8021x: Timed out waiting 
for no pending 802.1x packets

Fortunately I have both a fallback wireless connection and a duckdns 
hostname assigned to the router. I can SSH into the ethernet-connected 
interface. Essentially, running a wifi scan seems to kick everything 
back to life, and my stations can re-associate with the AP.

But all that means my previous bisect efforts must be invalid. There's 
clearly a point where this starts falling over every few hours instead 
of every few days, but since it took so long this time, I don't know 
that there's any point where it doesn't fall over at *all*. Super 
discouraging. I get that we're dealing with a closed-source binary blob, 
but isn't the RPi foundation a big enough client that Cypress could take 
a few hours to duplicate this issue and figure out a fix? I can 
replicate it with the access point setup at 
https://www.raspberrypi.org/documentation/configuration/wireless/access-point.md, 
so if they won't release the source, at least they have a reliable 
reproduction for an issue faced by a presumably large client-base.

Thanks for everyone's work debugging this. Hope we can get this 
resolved, as this little Pi gives me most of what I want from an access 
point.
JamesH65 commented 5 years ago

I have asked Cypress (last week) to continue looking at the case, for some reason they thought it was closed.

rblumel commented 5 years ago

I am also having this issue described. Mailbox error on the RaspberryPi 3 B+, latest kali build with nexon drivers.

JamesH65 commented 5 years ago

As we always say, please indicate if it also goes wrong on Raspbian - we cannot support 3rd party OS's, but if you can replicate on our OS then we will look at the issue when time allows.

rudiratlos commented 5 years ago

since the last upgrade to latest raspbian, my hostapd is not working on my 3B anymore. On my 3B+ it has never worked stable. I used the following description as setup: https://github.com/peebles/rpi3-wifi-station-ap-stretch

cyph84 commented 5 years ago

@JamesH65: this is on a rather ancient 4.9.12-1-ARCH kernel on RPi 3b ( not the b+ ). I will need some time to test this fully on a 3b+ with a more recent kernel.

We have about 200 RPi 3b deployed in the wild with an open (no encryption) AP and this has proven stable for nearly two years.

However my first attempt at adding encryption key triggered this error.

I realized, while debugging a separate problem, hostapd was defaulting to using the older TKIP encryption. Adding rsn_pairwise=CCMP seems to solve this problem, as well as improving transfer rates from 10Mbps to 40Mbps (similiar to unencrypted max throughput).

This configuration has been guinea pigged on two RPi 3b in production, and no issues have been reported for a couple of weeks.

TLDR: it's something to do with using legacy encryption.

@ingo-h: Can you confirm the type of encryption used between the two different setups? (I used Wifi Analyzer on an Android phone.)

ingo-h commented 5 years ago

@cyph84 very interesting point. Haven't had a focus on encryption and cannot answer your question ad hoc. But I will have a look at this and report next time.

ingo-h commented 5 years ago

I have tried to test with different encryptions and enabled debug output from wpa_supplicant and hostapd. hostapd works, so I tried to set wpa_supplicant the same encryption than hostapd. In the debug output I found this messages

for hostapd:

# auth_algs=1, wpa=2, wpa_key_mgmt=WPA-PSK, wpa_pairwise=TKIP, rsn_pairwise=CCMP
nl80211: hidden SSID not in use
nl80211: privacy=1
nl80211: auth_algs=0x1
nl80211: wpa_version=0x2
nl80211: key_mgmt_suites=0x2
nl80211: pairwise_ciphers=0x18
nl80211: group_cipher=0x10

for wpa_supplicant with default settings:

# key_mgmt=WPA-PSK
nl80211: hidden SSID not in use
nl80211: privacy=1
nl80211: auth_algs=0x3
nl80211: wpa_version=0x3
nl80211: key_mgmt_suites=0x2
nl80211: pairwise_ciphers=0x18
nl80211: group_cipher=0x8
nl80211: SMPS mode - off

auth_algs, wpa_version and group_cipher differ. So I tried to find settings for wpa_supplicant to get the same values from hostapd. After some try and error I have this for wpa_supplicant:

# key_mgmt=WPA-PSK, proto=RSN, auth_alg=OPEN
nl80211: hidden SSID not in use
nl80211: privacy=1
nl80211: auth_algs=0x1
nl80211: wpa_version=0x2
nl80211: key_mgmt_suites=0x2
nl80211: pairwise_ciphers=0x18
nl80211: group_cipher=0x8
nl80211: SMPS mode - off

The only difference now is group_ciper=0x10 for hostapd and 0x8 for wpa_supplicant. I haven't found an option to change this.

This settings doesn't help. wpa_supplicant always crashes.

There is another issue seen in the debug output from wpa_supplicant. It always tries to open a p2p interface. For hostpad interfaces it looks like:

~# /sbin/iw dev
phy#0
        Interface ap0
                ifindex 4
                wdev 0x2
                addr b8:27:eb:06:e8:8b
                ssid RPiNet
                type AP
                channel 1 (2412 MHz), width: 20 MHz, center1: 2412 MHz
        Interface wlan0
                ifindex 3
                wdev 0x1
                addr b8:27:eb:06:e8:8b
                type managed
                channel 1 (2412 MHz), width: 20 MHz, center1: 2412 MHz

But with wpa_supplicant I always get this odd Unnamed/non-netdev P2P-device and I don't know where this mac address 22:50:63:06:19:4a comes from and it changes. None of my interfaces (eth0, wlan0) has this mac address.

# /sbin/iw dev
phy#0
        Unnamed/non-netdev interface
                wdev 0x3
                addr 22:50:63:06:19:4a
                type P2P-device
                txpower 31.00 dBm
        Interface ap0
                ifindex 4
                wdev 0x2
                addr b8:27:eb:06:e8:8b
                ssid RPiNet
                type AP
                channel 1 (2412 MHz), width: 20 MHz, center1: 2412 MHz
                txpower 31.00 dBm
        Interface wlan0
                ifindex 3
                wdev 0x1
                addr b8:27:eb:06:e8:8b
                ssid wlan@hoeft-online.de
                type managed
                channel 1 (2412 MHz), width: 20 MHz, center1: 2412 MHz
                txpower 31.00 dBm

For debug I use these commands:

~# /sbin/iw dev wlan0 interface add ap0 type __ap

# works
~# /usr/sbin/hostapd -P /run/hostapd.pid /etc/hostapd/hostapd.conf -d &
~# /sbin/wpa_supplicant -c/etc/wpa_supplicant/wpa_supplicant-wlan0.conf -Dnl80211 -iwlan0 -d &

# crashes
~# /sbin/wpa_supplicant -c/etc/wpa_supplicant/wpa_supplicant-ap0.conf -Dnl80211 -iap0 -d &
~# /sbin/wpa_supplicant -c/etc/wpa_supplicant/wpa_supplicant-wlan0.conf -Dnl80211 -iwlan0 -d &

How can I wpa_supplicant tell not to make this Unnamed/non-netdev interface?

rblumel commented 5 years ago

To disable p2p use p2p_disabled=1 in the conf file. As for the MAC address, I think that might be a random generated one. Some options in wpa_suplicant support that. I search IEEE OUI db and didnt find anything for 22:50:63

# MAC address policy default
# 0 = use permanent MAC address
# 1 = use random MAC address for each ESS connection
# 2 = like 1, but maintain OUI (with local admin bit set)
#
# By default, permanent MAC address is used unless policy is changed by
# the per-network mac_addr parameter. Global mac_addr=1 can be used to
# change this default behavior.
#mac_addr=0

# Lifetime of random MAC address in seconds (default: 60)
#rand_addr_lifetime=60

# MAC address policy for pre-association operations (scanning, ANQP)
# 0 = use permanent MAC address
# 1 = use random MAC address
# 2 = like 1, but maintain OUI (with local admin bit set)
#preassoc_mac_addr=0

# MAC address policy for GAS operations
# 0 = use permanent MAC address
# 1 = use random MAC address
# 2 = like 1, but maintain OUI (with local admin bit set)
#gas_rand_mac_addr=0
ingo-h commented 5 years ago

@rblumel There is no option p2p_disabled documented in /usr/share/doc/wpa_supplicant/examples/wpa_supplicant.conf.gz.

rblumel commented 5 years ago

But, did it work?

On Tue, Oct 2, 2018 at 12:58 AM Ingo notifications@github.com wrote:

@rblumel https://github.com/rblumel There is no option p2p_disabled documented in /usr/share/doc/wpa_supplicant/examples/wpa_supplicant.conf.gz.

— You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub https://github.com/raspberrypi/linux/issues/2453#issuecomment-426183398, or mute the thread https://github.com/notifications/unsubscribe-auth/AcDdhZpKOLXOB0ulCEs1a-TSc4aQUBKLks5ugxysgaJpZM4SwkCN .

ingo-h commented 5 years ago

@rblumel Setting p2p_disabled=1 in /etc/wpa_supplicant/wpa_supplicant.conf in a network block I get the error messages:

wpa_supplicant[606]: Line 6: unknown network field 'p2p_disabled'.
wpa_supplicant[606]: Line 9: failed to parse network block.
wpa_supplicant[606]: Failed to read or parse configuration '/etc/wpa_supplicant/wpa_supplicant-wlan0.conf'.

Wifi does not start. Setting that option as global parameter (outside a network block) wifi starts without error message but the option is simply ignored.I get the messages:

wpa_supplicant[641]: Failed to create interface p2p-dev-p2p-dev-wlan0: -22 (Invalid argument)
wpa_supplicant[641]: nl80211: Failed to create a P2P Device interface p2p-dev-p2p-dev-wlan0
wpa_supplicant[641]: P2P: Failed to enable P2P Device interface

and this dubious Unnamed interface is still there:

rpi ~$ sudo iw dev
phy#0
        Unnamed/non-netdev interface
                wdev 0x3
                addr 3a:ac:98:37:85:2c
                type P2P-device
                txpower 31.00 dBm
        Interface wlan0
                ifindex 3
                wdev 0x1
                addr b8:27:eb:06:e8:8b
                ssid wlan@hoeft-online.de
                type managed
                channel 1 (2412 MHz), width: 40 MHz, center1: 2422 MHz
                txpower 31.00 dBm
ghost commented 5 years ago

It may be slightly off topic because it's a different distribution, but it's pretty easy to trigger this error when using Kali Pi. Putting the device into monitor mode (using mon0up in Kali-Pi) and running aireplay-ng --test causes it to emit the Unknown mailbox data content: 0x40012 error almost immediately. From then on, the wifi is worthless until you reboot.

For reference, mon0up is a short shell script that runs iw phy phy0 interface add mon0 type monitor and ifconfig mon0 up and displays some brief info.

That's exactly what I'm running into. Do you know of any solutions to this and why it is happending?

XECDesign commented 5 years ago

Isn't this issue invalid since everyone here seems to be using nexmon patched firmware?

ndarilek commented 5 years ago

I don't think I am. I'm using Raspbian with a pinned kernel that doesn't exhibit this behavior. I just never upgrade the kernel and accept the potential consequences.

mlinton commented 5 years ago

ndarilek: what version of raspbian and kernel are you using that works? Does it also support monitor mode?

mlinton commented 5 years ago

I don't think I am. I'm using Raspbian with a pinned kernel that doesn't exhibit this behavior. I just never upgrade the kernel and accept the potential consequences.

Yes - but the official RPI3B+ firmware doesn't support monitor mode, so that's why the nexmon firmware is being attempted.

brubbel commented 4 years ago

Another tip: when the firmware crashes, you can collect internal firmware stack traces here (you may need to compile the driver with DEBUG enabled):

cat /sys/kernel/debug/brcmfmac/mmc1:0001:1/forensics

brcmfmac's debugfs is moved to /sys/kernel/debug/ieee80211/phyX/* However due to a new bug, forensics and counters are not available in kernel 4.19.x https://lore.kernel.org/linux-wireless/1550148232-4309-4-git-send-email-arend.vanspriel@broadcom.com/

brubbel commented 4 years ago

New firmware available (2019-05-08). Seems to work much better with concurrent STA+AP. https://community.cypress.com/docs/DOC-17441

For RPI3B+ you need to copy the brcmfmac43455-sdio.* files to /lib/firmware/brcm/

sub0ne commented 4 years ago

I'm on Linux raspberrypi 4.19.69-v7+ #1256 and after running few hours without problems I got the following entries in my syslog. Internet search brought me here...could this problem be related? Interesting stuff begins at 00:46:53 where brcmfmac starts to fail and never returns.

Sep 23 00:00:22 raspberrypi rsyslogd:  [origin software="rsyslogd" swVersion="8.1901.0" x-pid="321" x-info="https://www.rsyslog.com"] rsyslogd w$
Sep 23 00:00:22 raspberrypi systemd[1]: logrotate.service: Succeeded.
Sep 23 00:00:22 raspberrypi systemd[1]: Started Rotate log files.
Sep 23 00:00:23 raspberrypi systemd[1]: man-db.service: Succeeded.
Sep 23 00:00:23 raspberrypi systemd[1]: Started Daily man-db regeneration.
Sep 23 00:17:01 raspberrypi CRON[15085]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Sep 23 00:45:50 raspberrypi dhcpcd[475]: wlan0: carrier lost
Sep 23 00:45:50 raspberrypi dhcpcd[475]: wlan0: deleting address 2003:e0:7f1f:8500:35d3:4c67:a30b:2ce8/64
Sep 23 00:45:50 raspberrypi dhcpcd[475]: wlan0: deleting route to 2003:e0:7f1f:8500::/64
Sep 23 00:45:50 raspberrypi dhcpcd[475]: wlan0: deleting default route via fe80::3a10:d5ff:fe60:4a2a
Sep 23 00:45:50 raspberrypi avahi-daemon[297]: Withdrawing address record for 2003:e0:7f1f:8500:35d3:4c67:a30b:2ce8 on wlan0.
Sep 23 00:45:50 raspberrypi avahi-daemon[297]: Leaving mDNS multicast group on interface wlan0.IPv6 with address 2003:e0:7f1f:8500:35d3:4c67:a30$
Sep 23 00:45:50 raspberrypi avahi-daemon[297]: Joining mDNS multicast group on interface wlan0.IPv6 with address fe80::ef45:8c3a:42af:1e8d.
Sep 23 00:45:50 raspberrypi avahi-daemon[297]: Registering new address record for fe80::ef45:8c3a:42af:1e8d on wlan0.*.
Sep 23 00:45:50 raspberrypi avahi-daemon[297]: Withdrawing address record for 192.168.178.163 on wlan0.
Sep 23 00:45:50 raspberrypi avahi-daemon[297]: Leaving mDNS multicast group on interface wlan0.IPv4 with address 192.168.178.163.
Sep 23 00:45:50 raspberrypi avahi-daemon[297]: Interface wlan0.IPv4 no longer relevant for mDNS.
Sep 23 00:45:50 raspberrypi dhcpcd[475]: wlan0: deleting route to 192.168.178.0/24
Sep 23 00:45:50 raspberrypi dhcpcd[475]: wlan0: deleting default route via 192.168.178.1
Sep 23 00:45:55 raspberrypi dhcpcd[475]: wlan0: carrier acquired
Sep 23 00:45:55 raspberrypi dhcpcd[475]: wlan0: IAID eb:3e:1a:e0
Sep 23 00:45:55 raspberrypi dhcpcd[475]: wlan0: rebinding lease of 192.168.178.163
Sep 23 00:45:55 raspberrypi dhcpcd[475]: wlan0: probing address 192.168.178.163/24
Sep 23 00:45:56 raspberrypi dhcpcd[475]: wlan0: soliciting an IPv6 router
Sep 23 00:45:56 raspberrypi dhcpcd[475]: wlan0: Router Advertisement from fe80::3a10:d5ff:fe60:4a2a
Sep 23 00:45:56 raspberrypi dhcpcd[475]: wlan0: adding address 2003:e0:7f1f:8500:35d3:4c67:a30b:2ce8/64
Sep 23 00:45:56 raspberrypi dhcpcd[475]: wlan0: adding route to 2003:e0:7f1f:8500::/64
Sep 23 00:45:56 raspberrypi avahi-daemon[297]: Leaving mDNS multicast group on interface wlan0.IPv6 with address fe80::ef45:8c3a:42af:1e8d.
Sep 23 00:45:56 raspberrypi dhcpcd[475]: wlan0: adding default route via fe80::3a10:d5ff:fe60:4a2a
Sep 23 00:45:56 raspberrypi avahi-daemon[297]: Joining mDNS multicast group on interface wlan0.IPv6 with address 2003:e0:7f1f:8500:35d3:4c67:a30$
Sep 23 00:45:56 raspberrypi avahi-daemon[297]: Registering new address record for 2003:e0:7f1f:8500:35d3:4c67:a30b:2ce8 on wlan0.*.
Sep 23 00:45:56 raspberrypi dhcpcd[475]: wlan0: soliciting a DHCPv6 lease
Sep 23 00:45:56 raspberrypi avahi-daemon[297]: Withdrawing address record for fe80::ef45:8c3a:42af:1e8d on wlan0.
Sep 23 00:45:57 raspberrypi dhcpcd[475]: wlan0: DHCPv6 REPLY: in stateless mode
Sep 23 00:45:58 raspberrypi dhcpcd[475]: wlan0: DHCPv6 REPLY: in stateless mode
Sep 23 00:46:00 raspberrypi dhcpcd[475]: wlan0: DHCPv6 REPLY: in stateless mode
Sep 23 00:46:01 raspberrypi dhcpcd[475]: wlan0: leased 192.168.178.163 for 864000 seconds
Sep 23 00:46:01 raspberrypi avahi-daemon[297]: Joining mDNS multicast group on interface wlan0.IPv4 with address 192.168.178.163.
Sep 23 00:46:01 raspberrypi avahi-daemon[297]: New relevant interface wlan0.IPv4 for mDNS.
Sep 23 00:46:01 raspberrypi dhcpcd[475]: wlan0: adding route to 192.168.178.0/24
Sep 23 00:46:01 raspberrypi dhcpcd[475]: wlan0: adding default route via 192.168.178.1
Sep 23 00:46:01 raspberrypi avahi-daemon[297]: Registering new address record for 192.168.178.163 on wlan0.IPv4.
Sep 23 00:46:04 raspberrypi dhcpcd[475]: wlan0: DHCPv6 REPLY: in stateless mode
Sep 23 00:46:11 raspberrypi dhcpcd[475]: wlan0: DHCPv6 REPLY: in stateless mode
Sep 23 00:46:53 raspberrypi kernel: [190079.012984] brcmfmac: brcmf_sdio_hostmail: mailbox indicates firmware halted
Sep 23 00:47:00 raspberrypi kernel: [190086.579355] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 23 00:47:00 raspberrypi kernel: [190086.579566] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 23 00:47:00 raspberrypi kernel: [190086.579575] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
Sep 23 00:47:17 raspberrypi kernel: [190103.139510] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 23 00:47:17 raspberrypi kernel: [190103.139998] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 23 00:47:17 raspberrypi kernel: [190103.140014] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
Sep 23 00:47:19 raspberrypi kernel: [190105.699521] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 23 00:47:19 raspberrypi kernel: [190105.699962] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 23 00:47:19 raspberrypi kernel: [190105.699979] brcmfmac: brcmf_cfg80211_get_tx_power: error (-110)
Sep 23 00:47:22 raspberrypi kernel: [190108.259470] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Sep 23 00:47:22 raspberrypi kernel: [190108.259690] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Sep 23 00:49:47 raspberrypi dhcpcd[475]: wlan0: fe80::3a10:d5ff:fe60:4a2a is unreachable, expiring it
...
Sep 23 00:50:04 raspberrypi kernel: [190270.020426] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Sep 23 00:50:04 raspberrypi kernel: [190270.020459] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
Sep 23 00:50:06 raspberrypi kernel: [190272.580445] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Sep 23 00:50:06 raspberrypi kernel: [190272.580462] brcmfmac: brcmf_cfg80211_get_tx_power: error (-110)
brubbel commented 4 years ago

What I found until now:

  1. Never start hostapd when wpa_supplicant is running (default on raspbian stretch/buster)
  2. Never disable/enable hostapd when wpa_supplicant is running
  3. If wpa_supplicant is started, it will disable hostapd beacons, while hostapd reports that it is still up (mismatch between brcmfmac driver and internal bcm4355 firmware states). You can disable/enable hostapd via hostapd_cli and it will start working again, but as said in (2) this will make the STA portion of the firmware to fail (associating as a client will never work again until the FW is reloaded into the bcm4355 via an unbind/bind cycle) The only reliable way to get the beacons back is to do a forced scan via wpa_supplicant. I don't want to know what is going on inside that firmware :-)
JamesH65 commented 4 years ago

@brubbel Have you reported this on linux-wireless mailing list? The brcmfmac driver guys is on there so would be interested to hear your findings. Also frequented by the Cypress guys.

brubbel commented 4 years ago

Not yet reported, I am still struggling to get my mind around these findings.

rblumel commented 4 years ago

Well if it’s reproducible, it can be debugged and fixed a lot easier. No?

On Tue, Sep 24, 2019 at 7:33 AM brubbel notifications@github.com wrote:

Not yet reported, I am still struggling to get my mind around these findings.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/raspberrypi/linux/issues/2453?email_source=notifications&email_token=AHAN3BLRUXZ364QHG7PMU6LQLIQNHA5CNFSM4EWCICG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7OSOUY#issuecomment-534587219, or mute the thread https://github.com/notifications/unsubscribe-auth/AHAN3BKWDNFBK5X7FV6OQMDQLIQNHANCNFSM4EWCICGQ .

JamesH65 commented 4 years ago

@rblumel I suspect access to the firmware source would be necessary, or at least having knowledge of it. That limits it to Cypress, and probably @arend who does a lot of work on the brcmfmac driver.

rblumel commented 4 years ago

Oh, forgot this was closed source.

On Tue, Sep 24, 2019 at 7:48 AM James Hughes notifications@github.com wrote:

@rblumel https://github.com/rblumel I suspect access to the firmware source would be necessary, or at least having knowledge of it. That limits it to Cypress, and probably @arend https://github.com/arend who does a lot of work on the brcmfmac driver.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/raspberrypi/linux/issues/2453?email_source=notifications&email_token=AHAN3BOQLEFKCAS4TGWRTU3QLISC3A5CNFSM4EWCICG2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD7OUBNY#issuecomment-534593719, or mute the thread https://github.com/notifications/unsubscribe-auth/AHAN3BOFANRPVMCTJCSY3HLQLISC3ANCNFSM4EWCICGQ .

macmpi commented 4 years ago

Following @rblumel 's post, it looks like Cypress v4.14.77-2019_0503 release contains lots of patches to brcmfmac , hostapd, wpa_supplicant & al... Unsure how much this whole deviates from what we get from current upstream components.

sub0ne commented 4 years ago

could anybody of you have a look at my log please? Does this problem correlate to the thing you're into? I've found out that the problem occurs when wifi signal is lost and should be reestablished. In my case triggered by a channel change of my router (AVM Fritz Box).

The wlan0 carrier gets lost, is reaquired and then brcmfmac starts to mess up.

hostapd is NOT installed, just using simple wifi connection to my router.

Benzomat commented 4 years ago

New firmware available (2019-05-08). Seems to work much better with concurrent STA+AP. https://community.cypress.com/docs/DOC-17441

For RPI3B+ you need to copy the brcmfmac43455-sdio.* files to /lib/firmware/brcm/

On my Raspberry Pi 4 the buggy firmware not only resulted in wifi freezes but also in a halted system. It didn't run longer than a day or so. I replaced the files like suggested and it seems to improve the stability. It ran longer than a day. The new firmware identifies as Firmware: BCM4345/6 wl0: May 2 2019 02:46:17 version 7.45.189 (r714228 CY) FWID 01-e1db26e2

Today I configured it to use ethernet instead of wlan. I'm running Pi-Hole on it which hadn't been available all the time. I wanted to track down the issue. While I was configuring the network connection, the errors occurred again. But at least the system didn't halt.

I'm not sure if the new firmware really fixes the issues, but it seems to improve the stability.

I must remark that I upgraded the linux firmware with rpi-update before I read about the available Broadcom firmware here.

Most likely I'll use a Raspberry Pi Zero W instead of the Raspberry Pi 4 to run Pi-Hole, because the performance of the Zero W should be sufficient. Then I'll install a clean Raspbian Buster image on my Raspberry Pi 4, replace the Broadcom firmware and test it again (without upgrading the linux firmware).

MrRyanM commented 4 years ago

It may be slightly off topic because it's a different distribution, but it's pretty easy to trigger this error when using Kali Pi. Putting the device into monitor mode (using mon0up in Kali-Pi) and running aireplay-ng --test causes it to emit the Unknown mailbox data content: 0x40012 error almost immediately. From then on, the wifi is worthless until you reboot. For reference, mon0up is a short shell script that runs iw phy phy0 interface add mon0 type monitor and ifconfig mon0 up and displays some brief info.

That's exactly what I'm running into. Do you know of any solutions to this and why it is happending?

Any news on this? I am having the same issue.

FrancisHGR commented 4 years ago

Any update? I get the error log and a wifi breakdown (I suspect because I cannot ssh anymore in the machine) between every 3 mins to 1-2hours after boot. I a running Raspbian 8 (Jessie) with 4.9.35-v7+ Kernel.

JamesH65 commented 4 years ago

@7rncs You need to upgrade to the latest Raspbian, we do not do any work on kernels that old, and it's possible your issue may no longer be present on the latest kernel.

FrancisHGR commented 4 years ago

@7rncs You need to upgrade to the latest Raspbian, we do not do any work on kernels that old, and it's possible your issue may no longer be present on the latest kernel.

I have been running this kernel for 4 years non-stop on my production raspberry home automation server. I haven't changed one line of code, how can this problem happen out of the blue? Smells like hardware problem to me... what do you think?

JamesH65 commented 4 years ago

HW problem, power supply starting to fail, SD card on the way out.