raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

wlan freezes in raspberry pi 3B+ #2453

Open ghost opened 6 years ago

ghost commented 6 years ago

(see also https://github.com/raspberrypi/linux/issues/1342 )

I've also got that problem with wifi dying.

Mar 17 18:25:28 hassass kernel: [10279.186321] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content: 0x40012
Mar 17 18:25:30 hassass kernel: [10281.665090] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
Mar 17 18:25:30 hassass kernel: [10281.665622] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
Mar 17 18:25:30 hassass kernel: [10281.665638] brcmfmac: brcmf_run_escan: error (-110)
Mar 17 18:25:30 hassass kernel: [10281.665647] brcmfmac: brcmf_cfg80211_scan: scan error (-110)
Mar 17 18:26:30 hassass kernel: [10341.665866] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout

This is with 4.14.27-v7+ and with the following in /etc/rc.local:

/sbin/iw dev wlan0 set power_save off
/sbin/ifconfig wlan0 promisc

[    4.112717] brcmfmac: F1 signature read @0x18000000=0x15264345
[    4.119827] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[    4.120314] usbcore: registered new interface driver brcmfmac
[    4.440371] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[    4.440958] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[   10.911757] brcmfmac: power management disabled
[   12.016088] brcmfmac: power management disabled
[ 2074.090674] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg
failed w/status -5
[ 2074.090687] brcmfmac: brcmf_cfg80211_get_tx_power: error (-5)
[ 2074.090745] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.090753] brcmfmac: brcmf_link_down: WLC_DISASSOC failed (-5)
[ 2074.610583] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.611992] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.613945] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.613971] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[ 2074.729716] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2074.729733] brcmfmac: brcmf_cfg80211_reg_notifier: Country code iovar
returned err = -5
[ 2074.871693] usbcore: deregistering interface driver brcmfmac
[ 2074.929084] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 2074.936897] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 2074.937139] usbcore: registered new interface driver brcmfmac
[ 2075.118180] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[ 2075.118706] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[ 2075.215365] brcmfmac: power management disabled
[ 2075.263751] brcmfmac: power management disabled
[ 2085.475001] brcmfmac: power management disabled
[ 2124.380808] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2124.381146] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 2124.381156] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[ 2124.622345] usbcore: deregistering interface driver brcmfmac
[ 2124.705432] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 2124.714194] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 2124.716213] usbcore: registered new interface driver brcmfmac
[ 2124.929556] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[ 2124.929993] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[ 2125.105218] brcmfmac: power management disabled
[ 2125.150290] brcmfmac: power management disabled
[ 8237.434034] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content:
0x40012
[ 8239.890302] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8239.890822] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[ 8239.890835] brcmfmac: brcmf_run_escan: error (-110)
[ 8239.890845] brcmfmac: brcmf_cfg80211_scan: scan error (-110)
[ 8254.280425] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg
failed w/status -5
[ 8254.280438] brcmfmac: brcmf_cfg80211_get_tx_power: error (-5)
[ 8254.280491] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.280498] brcmfmac: brcmf_link_down: WLC_DISASSOC failed (-5)
[ 8254.800394] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.803873] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.808353] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.808370] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[ 8254.881402] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8254.881420] brcmfmac: brcmf_cfg80211_reg_notifier: Country code iovar
returned err = -5
[ 8255.001550] usbcore: deregistering interface driver brcmfmac
[ 8255.071184] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 8255.077098] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 8255.077348] usbcore: registered new interface driver brcmfmac
[ 8257.730418] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8257.751038] brcmfmac: brcmf_c_get_clm_name: retrieving revision info
failed (-110)
[ 8257.751049] brcmfmac: brcmf_c_process_clm_blob: get CLM blob file name
failed (-110)
[ 8257.751068] brcmfmac: brcmf_c_preinit_dcmds: download CLM blob file
failed, -110
[ 8257.751076] brcmfmac: brcmf_bus_started: failed: -110
[ 8257.751114] brcmfmac: brcmf_sdio_firmware_callback: dongle is not
responding
[ 8304.417684] usbcore: deregistering interface driver brcmfmac
[ 8304.486099] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 8304.493613] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 8304.494078] usbcore: registered new interface driver brcmfmac
[ 8304.686761] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[ 8304.687203] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[ 8304.829994] brcmfmac: power management disabled
[ 8304.907662] brcmfmac: power management disabled
[ 8357.441791] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content:
0x40012
[ 8359.891146] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8359.891655] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[ 8359.891668] brcmfmac: brcmf_run_escan: error (-110)
[ 8359.891677] brcmfmac: brcmf_cfg80211_scan: scan error (-110)
[ 8371.731226] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8371.731731] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[ 8371.731746] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
[ 8373.941267] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg
failed w/status -5
[ 8373.941280] brcmfmac: brcmf_cfg80211_get_tx_power: error (-5)
[ 8373.941330] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8373.941338] brcmfmac: brcmf_link_down: WLC_DISASSOC failed (-5)
[ 8374.461245] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8374.461942] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8374.463553] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8374.463573] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[ 8374.564729] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing
to do.
[ 8374.564750] brcmfmac: brcmf_cfg80211_reg_notifier: Country code iovar
returned err = -5
[ 8374.702401] usbcore: deregistering interface driver brcmfmac
[ 8374.759839] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 8374.767561] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 8374.771137] usbcore: registered new interface driver brcmfmac
[ 8377.411255] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 8377.431924] brcmfmac: brcmf_c_get_clm_name: retrieving revision info
failed (-110)
[ 8377.431934] brcmfmac: brcmf_c_process_clm_blob: get CLM blob file name
failed (-110)
[ 8377.431941] brcmfmac: brcmf_c_preinit_dcmds: download CLM blob file
failed, -110
[ 8377.431949] brcmfmac: brcmf_bus_started: failed: -110
[ 8377.432003] brcmfmac: brcmf_sdio_firmware_callback: dongle is not
responding
[ 8424.133114] usbcore: deregistering interface driver brcmfmac
[ 8424.229631] brcmfmac: F1 signature read @0x18000000=0x15264345
[ 8424.237210] brcmfmac: brcmf_fw_map_chip_to_name: using
brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 8424.239352] usbcore: registered new interface driver brcmfmac
[ 8424.460736] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb
27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[ 8424.461174] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2
Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09
18:56:28
[ 8424.646993] brcmfmac: power management disabled
[ 8424.708633] brcmfmac: power management disabled

digisaster commented 6 years ago

Same problem here.

If I run:

sudo BRANCH=next rpi-update

then I have a working situation. After this I got:

uname -a
Linux raspberrypi 4.14.17-v7+ #1090 SMP Mon Feb 5 21:02:18 GMT 2018 armv7l GNU/Linux

So I guess it's kernel related.

JamesH65 commented 6 years ago

Interesting. So not in the current rpi-update, but in the next branch? Should make the particular change a bit easier to find.

ghost commented 6 years ago

any news?

JamesH65 commented 6 years ago

Not had a chance to look yet. How do you provoke the problem? We've not had many reports on B+ Wifi failing in this way, so apparently it's unusual. Have you updated to the very latest 4.14.xx kernel? Does that make any difference?

JamesH65 commented 6 years ago

Once I have more details on how to replicate the issues, I can send the data to Cypress for investigation. The mailbox error, IIRC, is a firmware crash, so it's not something we can really deal with here, since we do not have access to the firmware.
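A quick way to check whether a log shows the same crash signature is to count the two messages quoted in this thread. This is only a sketch: the function name count_fw_traps is made up for illustration, and it assumes the message text matches what was posted above.

```shell
#!/bin/sh
# Count brcmfmac firmware-crash signatures in kernel log text read from stdin.
# Only the two messages quoted in this thread are matched; other crash
# paths may log different text.
count_fw_traps() {
    # grep -c prints the number of matching lines; "|| true" keeps the
    # exit status at 0 when the count is zero.
    grep -c -E 'Unknown mailbox data content|firmware trap in dongle' || true
}

# Example (not run here): dmesg | count_fw_traps
```

A non-zero count alongside the kernel version and the steps that led up to it is probably the most useful thing to attach to a report.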

llamasoft commented 6 years ago

It may be slightly off topic because it's a different distribution, but it's pretty easy to trigger this error when using Kali Pi.
Putting the device into monitor mode (using mon0up in Kali-Pi) and running aireplay-ng --test causes it to emit the Unknown mailbox data content: 0x40012 error almost immediately. From then on, the wifi is worthless until you reboot.

For reference, mon0up is a short shell script that runs iw phy phy0 interface add mon0 type monitor and ifconfig mon0 up and displays some brief info.

cdouglas97 commented 6 years ago

On a project I am working on, I am getting this constantly with Kali Pi when in monitor mode. Unloading and reloading the driver sometimes works and sometimes doesn't. Sometimes it happens after 5 seconds, sometimes after 5 minutes; it is very random. I verified it's not a hardware problem, since I have 2 Pis and both do it. I also loaded the latest Raspbian and installed the latest Nexmon drivers, and I get the exact same thing. The kernel is 4.14.30-Re4son-v7+ on Kali; I don't have the other one offhand.

JamesH65 commented 6 years ago

Do you get any errors listed in dmesg?

cdouglas97 commented 6 years ago

Yes I do.

The following is from dmesg while running the command below. The "Set Channel failed" messages don't seem to be real errors, since I am scanning a range.

/usr/sbin/airodump-ng -C 2412-5825 --write-interval 10 --write test --output-format netxml wlan0mon

Working until this point……..

[78833.305809] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content: 0x40012
[78835.790416] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78835.797410] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[78835.803824] brcmfmac: brcmf_cfg80211_nexmon_set_channel: Set Channel failed: chspec=53409, -110
[78838.590434] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78838.597330] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[78838.603733] brcmfmac: brcmf_cfg80211_nexmon_set_channel: Set Channel failed: chspec=53413, -110
[78841.390457] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78841.397407] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[78841.403867] brcmfmac: brcmf_cfg80211_nexmon_set_channel: Set Channel failed: chspec=4098, -110
[78844.190483] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78844.197466] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[78844.204088] brcmfmac: brcmf_cfg80211_nexmon_set_channel: Set Channel failed: chspec=4102, -110
[78846.990502] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78846.997678] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[78847.004601] brcmfmac: brcmf_cfg80211_nexmon_set_channel: Set Channel failed: chspec=4106, -110
[78849.790519] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78849.797885] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[78849.804936] brcmfmac: brcmf_cfg80211_nexmon_set_channel: Set Channel failed: chspec=4110, -110
[78852.590543] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78852.597968] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[78852.605103] brcmfmac: brcmf_cfg80211_nexmon_set_channel: Set Channel failed: chspec=53284, -110
[78855.390563] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78855.398051] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[78855.405211] brcmfmac: brcmf_cfg80211_nexmon_set_channel: Set Channel failed: chspec=53288, -110
[78858.190583] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78858.198319] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[78858.205591] brcmfmac: brcmf_cfg80211_nexmon_set_channel: Set Channel failed: chspec=53292, -110
[78860.990609] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[78861.001145] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
…….

After shutting down airodump, I detected that it wasn’t capturing anything and attempted to unload at 79803.439351 and reload the driver at 79923.711128 and that failed. I have a 120sec timer between unload and reload of driver

[79782.517551] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[79782.529849] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[79782.541559] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
[79785.157565] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[79785.169936] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[79785.181530] brcmfmac: _brcmf_set_multicast_list: Setting mcast_list failed, -110
[79787.717590] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[79787.729703] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[79792.837633] brcmfmac: _brcmf_set_multicast_list: Setting allmulti failed, -110
[79797.957685] brcmfmac: brcmf_cfg80211_del_ap_iface: interface_remove failed -110
[79800.517694] brcmfmac: _brcmf_set_multicast_list: Setting BRCMF_C_SET_PROMISC failed, -110
[79803.078132] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing to do.
[79803.089706] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[79803.187727] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing to do.
[79803.199528] brcmfmac: brcmf_fil_cmd_data: bus is down. we have nothing to do.
[79803.210553] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-5)
[79803.439351] usbcore: deregistering interface driver brcmfmac
[79923.696268] brcmfmac: F1 signature read @0x18000000=0x15264345
[79923.700624] brcmfmac: brcmf_fw_map_chip_to_name: using brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[79923.711128] usbcore: registered new interface driver brcmfmac
[79926.358647] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[79926.389146] brcmfmac: brcmf_c_get_clm_name: retrieving revision info failed (-110)
[79926.400840] brcmfmac: brcmf_c_process_clm_blob: get CLM blob file name failed (-110)
[79926.412661] brcmfmac: brcmf_c_preinit_dcmds: download CLM blob file failed, -110
[79926.423176] brcmfmac: brcmf_bus_started: failed: -110
[79926.431387] brcmfmac: brcmf_sdio_firmware_callback: dongle is not responding

… Tried unload and reload a little while later and it worked

[80685.169424] usbcore: deregistering interface driver brcmfmac
[80805.455360] brcmfmac: F1 signature read @0x18000000=0x15264345
[80805.460216] brcmfmac: brcmf_fw_map_chip_to_name: using brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[80805.471390] usbcore: registered new interface driver brcmfmac
[80805.805728] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Apr 10 2018 18:33:56 version 7.45.154 (nexmon.org: 2.2.2-178-gd64f-1) FWID 01-4fbe0b04
[80805.814840] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2 Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09 18:56:28

Nemesis7 commented 6 years ago

Any news about this? I'm having this same issue. From dmesg:

[ 4.584121] brcmfmac: brcmf_fw_map_chip_to_name: using brcm/brcmfmac43455-sdio.bin for chip 0x004345(17221) rev 0x000006
[ 4.868470] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb 27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
[ 4.869048] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2 Data: 9.10.105 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-03-09 18:56:28

I wish I never executed the rpi-update; I finally got the Raspberry Pi 3 B+ working as an AP + Managed Wifi at 5GHz, but now the AP doesn't work anymore after the update. How can I downgrade the brcmfmac to a stable/working version?

gdb-power commented 6 years ago

After "Unknown mailbox data content: 0x40012" is received, sometimes the communication can be recovered with the following commands:

modprobe -r brcmfmac
modprobe brcmfmac

Sometimes it doesn't recover even after modprobe -r (device is stuck). In that case, the following heavy-handed commands will fix the communication with the wifi device:

echo -n "3f300000.mmc" > /sys/devices/platform/soc/3f300000.mmc/driver/unbind
sleep 1
echo -n "3f300000.mmc" > /sys/bus/platform/drivers/mmc-bcm2835/bind

technical comment: rebinding the mmc driver will call probe(), which will call mmc:bcm2835_reset_internal(), which will power-cycle the SDIO device (SDVDD_POWER_OFF), which will properly reset & re-detect the wedged WIFI SDIO device. Phew!
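The two recovery steps above could be wrapped as shell functions, roughly as below. This is a sketch, not a tested fix: the function names reload_driver and rebind_mmc are hypothetical, the sysfs paths are simply the ones quoted in this comment (they may differ on other boards or kernels), and both functions need root.

```shell
#!/bin/sh
# Device name and sysfs paths as quoted in this thread; override via the
# environment if your board or kernel exposes them elsewhere.
MMC_DEV="${MMC_DEV:-3f300000.mmc}"
UNBIND="${UNBIND:-/sys/devices/platform/soc/$MMC_DEV/driver/unbind}"
BIND="${BIND:-/sys/bus/platform/drivers/mmc-bcm2835/bind}"

# Light recovery: unload and reload the wifi driver module.
reload_driver() {
    modprobe -r brcmfmac && modprobe brcmfmac
}

# Heavy recovery: unbind and rebind the MMC host so that the SDIO
# device is power-cycled and re-detected.
rebind_mmc() {
    printf '%s' "$MMC_DEV" > "$UNBIND"
    sleep 1
    printf '%s' "$MMC_DEV" > "$BIND"
}
```

A reasonable order is to try reload_driver first and fall back to rebind_mmc only if the interface does not come back.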

cdouglas97 commented 6 years ago

I’ll try it Monday and let you know. Thank you very much.

pelwell commented 6 years ago

@Nemesis7 You can revert to any previous firmware+kernel package using rpi-update by putting the hash (string of hexadecimal digits) on the command line. The hashes can be found on the right hand side of the list of commits (releases): https://github.com/Hexxeh/rpi-firmware/commits/master

Alternatively you could return to the standard Raspbian kernel using:

sudo apt-get install --reinstall raspberrypi-bootloader raspberrypi-kernel

cdouglas97 commented 6 years ago

The heavy-handed method seems to get the driver working again, but is there a chance of this being fixed so it's not necessary? Is this a hardware or firmware issue? Thank you

JamesH65 commented 6 years ago

@cdouglas97 We currently do not know the cause of this. The mailbox error is the result of the firmware on the wireless chip dying, but the cause of that is unclear. We do NOT have access to the firmware source; it is provided as a binary by Cypress, so we rely on them to fix firmware issues.

I'm not looking at it at the moment, I have some other stuff to clear first. However, if anyone has a clear set of steps to replicate the problem on Raspbian, that would be very useful once I do start to look at it.

cdouglas97 commented 6 years ago

Thank you. I can reproduce it very easily. Steps:

wlan0 is not up, only active interface is eth0

Use Airmon-ng to create monitor port:

/usr/sbin/airmon-ng start wlan0

Run airodump to scan all frequencies from 2.4 to 5.8 GHz:

/usr/sbin/airodump-ng -C 2412-5825 --write-interval 10 --write OUTPUTME --output-format netxml wlan0mon

It usually happens within 30 seconds to 5 minutes. I run the airodump for 60 seconds at a time with a 15 min gap between runs and then kill it so sometimes it completes its run and sometimes not. I added the heavy-handed commands before I start and it didn't seem to make a difference on how long it took for it to die. It just fixed the firmware if it died during a previous run.

-Chris
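The steps above can be sketched as a loop that repeats the 60-second scan with a 15-minute gap until the kernel log shows the firmware trap. The function names scan_once and repro_loop are hypothetical; the sketch assumes the monitor interface wlan0mon already exists (created with airmon-ng as described), that the kernel log is clean at the start (for example after a fresh boot), and that it runs as root.

```shell
#!/bin/sh
AIRODUMP="${AIRODUMP:-/usr/sbin/airodump-ng}"
MON_IF="${MON_IF:-wlan0mon}"
RUN_SECS="${RUN_SECS:-60}"    # length of one scan run
GAP_SECS="${GAP_SECS:-900}"   # pause between runs (15 minutes)

# One scan run: sweep 2412-5825 MHz, stopped by timeout after RUN_SECS.
scan_once() {
    timeout "$RUN_SECS" "$AIRODUMP" -C 2412-5825 --write-interval 10 \
        --write OUTPUTME --output-format netxml "$MON_IF"
}

# Repeat scan runs until the firmware trap message appears in dmesg.
repro_loop() {
    while :; do
        scan_once
        if dmesg | grep -q 'firmware trap in dongle'; then
            echo "firmware trap logged"
            return 0
        fi
        sleep "$GAP_SECS"
    done
}
```

Calling repro_loop and noting how many runs it took would give a rough measure of how quickly the crash occurs on a given setup.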

JamesH65 commented 6 years ago

@cdouglas97 I am not familiar with airmon/airodump. What package do I need to install in Raspbian to get those?

pelwell commented 6 years ago

sudo apt-get install aircrack-ng

pelwell commented 6 years ago

With wlan0 available but not associated with an AP I get:

pi@raspberrypi:~ $ sudo airmon-ng start wlan0

Found 4 processes that could cause trouble.
If airodump-ng, aireplay-ng or airtun-ng stops working after
a short period of time, you may want to run 'airmon-ng check kill'

  PID Name
  319 avahi-daemon
  353 dhcpcd
  364 avahi-daemon
  400 wpa_supplicant

PHY Interface   Driver      Chipset

phy0    wlan0       brcmfmac    Broadcom 43430

ERROR adding monitor mode interface: command failed: Operation not supported (-95)

Any suggestions?

cdouglas97 commented 6 years ago

pelwell, the firmware in the default latest Raspbian install doesn't allow monitoring; you have to use this: https://github.com/seemoo-lab/nexmon to make it work. Kali uses this, and I took a Raspbian image and built the patch as directed; both get the same result.

JamesH65 commented 6 years ago

Er, so the way to make this go wrong is to install some random third party stuff that plays around with the firmware in a way that probably wasn't intended by the developers? Why am I not surprised this might go wrong?

Unless this goes wrong with our standard firmware, I'm not sure we should spend any more time on this.

cdouglas97 commented 6 years ago

I don't blame you. I thought it was understood from the first post, which mentioned putting the interface into promiscuous mode, that this is what we were doing. Unless we are supposed to be able to put the default firmware into monitor mode, making nexmon unnecessary.

Thanks for your time and the command to force the wifi reload.

-Chris

pelwell commented 6 years ago

I don't know why you would think it should have been understood - promiscuous mode is not the same as monitor mode.

cdouglas97 commented 6 years ago

My mistake, I apologize.

cdouglas97 commented 6 years ago

The OP might not be doing the same thing as I am with the nexmon fw since he is just trying promiscuous mode so it still might be something you can investigate for him.

gdb-power commented 6 years ago

@pelwell @JamesH65 please don't give up on all of us who use the regular firmware... It is a real problem.

In the following conditions, the firmware crashes several times per day:

  1. Raspberry PI running in an environment with a lot of diverse wifi packets, such as a busy train station or a busy mall. Not a "regular" office environment or a shielded RF room...

  2. Raspberry PI running as hotspot (thus continuously listening to incoming packets), preferably on a busy 2.4GHz channel (1,6,11), not on a clean 5GHz channel. There is no need for stations to be connected to the hotspot.

Note: the raspberry PI wifi crashes randomly in any environment and also when running just as station (not AP). I'm just describing the conditions that increase the probability of crashing.

pelwell commented 6 years ago

Nobody's giving up. We have a number of difficult problems ongoing, and the Ethernet stalls are currently getting most of the attention, but now that we have what should be a low-impact workaround for that, the spotlight will turn to this issue (which looks suspiciously like an old Pi3B problem).

gdb-power commented 6 years ago

Another tip: when the firmware crashes, you can collect internal firmware stack traces here (you may need to compile the driver with DEBUG enabled):

cat /sys/kernel/debug/brcmfmac/mmc1:0001:1/forensics

In a busy train station/mall/office, when there are a lot of people, you can sometimes catch several crashes per hour.
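Since a driver reload or reboot may discard the dump, it could be worth copying it to a timestamped file right after each crash. A minimal sketch, assuming the debugfs path quoted above, a mounted debugfs, and root access; the function name save_forensics and the output directory are made up for illustration.

```shell
#!/bin/sh
# Path as quoted in this thread; the mmc1:0001:1 part may differ per system.
FORENSICS="${FORENSICS:-/sys/kernel/debug/brcmfmac/mmc1:0001:1/forensics}"
OUT_DIR="${OUT_DIR:-/var/tmp/brcmfmac-forensics}"

# Copy the current forensics dump to a timestamped file and print its path.
save_forensics() {
    mkdir -p "$OUT_DIR" || return 1
    out="$OUT_DIR/forensics-$(date +%Y%m%d-%H%M%S).txt"
    cat "$FORENSICS" > "$out" && echo "$out"
}
```

Running save_forensics after each "firmware trap in dongle" message would leave one file per crash that can be attached to the issue.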

In an ideal world, broadcom/cypress would release the firmware source code so the community would be able to fix it, just like qualcomm released their internal wifi firmware. There are no big wifi secrets, all wifi firmware is pretty trivial:

https://github.com/qca/open-ath9k-htc-firmware

llamasoft commented 6 years ago

@gdb-power - Your comment gives me an idea. If anybody has a second device with a monitor enabled WiFi adapter, you may be able to capture the series of frames that causes the firmware to lock up. The experiment would be something like:

  1. Set up the monitoring device running airodump-ng to capture all frames on the Raspberry Pi's channel.
  2. Wait for the Raspberry Pi's WiFi to lock up, then terminate the capture and reboot the Raspberry Pi.
  3. Bisect the captured frames and replay them with aireplay-ng until the triggering frames are isolated.

Of course, this assumes the hang is caused by incoming frames. If it's caused by invalid outgoing frames... I'm not sure how to capture that.
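The bisect step could be organised roughly as below. This is a sketch under assumptions: it uses editcap (from the Wireshark tools, not mentioned in this thread) to cut a frame range out of the capture, the function names are hypothetical, and the replay of each half is left to whatever injection tool is in use.

```shell
#!/bin/sh
# Print the two halves of the frame range LO..HI, one "start-end" per line.
# Assumes HI is greater than LO.
halve_range() {
    lo=$1
    hi=$2
    mid=$(( (lo + hi) / 2 ))
    echo "$lo-$mid"
    echo "$((mid + 1))-$hi"
}

# Write only the frames in RANGE (e.g. 1-500) from capture IN to OUT.
# editcap -r keeps the selected frames instead of discarding them.
extract_range() {
    editcap -r "$1" "$2" "$3"   # IN OUT RANGE
}

# Example (not run here):
#   halve_range 1 1000                       # prints 1-500 and 501-1000
#   extract_range capture-01.cap first.cap 1-500
```

Replaying each half in turn and keeping whichever one still locks up the wifi should narrow the range by half per round, as long as the crash depends only on received frames.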

gdb-power commented 6 years ago

Fuzzing the Cypress firmware for crashes is pretty trivial; several groups have done it in the past (including Google Project Zero). In order to fix the firmware, Broadcom/Cypress should be doing the fuzzing, since they're the only ones with access to the source code.

pelwell commented 6 years ago

@gdb-power If you have any forensics dumps then we'd like to see them.

@llamasoft That sounds like a plan. The issue is likely to be triggered by packet reception - transmission is massively simpler because the driver gets to determine the timing and packet content.

cdouglas97 commented 6 years ago

I know I am out of this since I am using a patched driver to support monitoring, but this might help. All I am doing is running airodump-ng to scan for access points and it crashes. There are 42 in my vicinity that I usually detect.

JamesH65 commented 6 years ago

As with all of these ethernet issues, replication is the crux. We need simple (ish) ways of making the problem happen, preferably in a situation where we can have debuggers and diagnostics tools attached (which makes a trip out of the office a real issue). The recent 3B+ issue suddenly got easier to solve when we had a user who could replicate at will and was able to help, and when I finally managed to be able to cause it on demand. So if ANYONE has a guaranteed way of causing this mailbox error that would be very useful.

On the 3B this mailbox error has been around since launch, but I thought it had been fixed with the most recent firmware upgrade. Certainly until this thread it hadn't been recently reported.

pelwell commented 6 years ago

Yes, it's possible that the 43438 fix will also apply to the 43455.

cdouglas97 commented 6 years ago

I’ll help but I don’t know if I am a valid test since I am using patched firmware. I replicate it every 15 minutes like clockwork.

JamesH65 wrote:

On the 3B this mailbox error has been around since launch, but I thought it had been fixed with the most recent firmware upgrade. Certainly until this thread it hadn't been recently reported. Cypress actually closed the case, but it looks like it will need to be reopened, once we can replicate it reliably.

Nemesis7 commented 6 years ago

@JamesH65 I am totally new to this, but I hope my input can help you solve this issue (even without fixing the firmware). I encounter this problem when I try to set up the integrated WiFi as a (managed) client and as an AP (incl. ip-forwarding) on a Raspberry Pi 3 B+. The first time around I followed this guide, which seemed to work: https://blog.thewalr.us/2017/09/26/raspberry-pi-zero-w-simultaneous-ap-and-managed-mode-wifi/. Eventually it failed; I think that's because I didn't disable dhcpcd, which didn't play nice with /etc/network/interfaces. In any case, I want a better solution without the deprecated /etc/network/interfaces and without cronjobs.

Now I've done what is described here: https://www.raspberrypi.org/forums/viewtopic.php?f=36&t=138730&start=125#p1321390. This works pretty neatly until the AP is activated; then I lose the WiFi connection and get the mailbox error, preceded by: brcmfmac brcmf_link_down wlc_disassoc failed (-11)

This fails every time, no exception. Is there a firmware/kernel version I can try to see if it works there?

gdb-power commented 6 years ago

@pelwell @JamesH65 I sent you a forensics dump by email.

pelwell commented 6 years ago

Yes, thank you. We are sending them direct to Cypress who may then have enough information to fix the crashes.

gdb-power commented 6 years ago

Also, it might be a good idea to enable the brcmfmac DEBUG flag in the Raspberry Pi default kernel, so that anyone can send forensic dumps.

JamesH65 commented 6 years ago

The Cypress case is live, and they have assigned 'collaborators' to it, so hopefully we will hear something soon.

Nemesis7 commented 6 years ago

Guys, for me the crash stopped happening when I changed the order in which I bring the interfaces up: wlan0 must be disabled first, then ap0 enabled, then wlan0 enabled again. In any other sequence, it crashes.
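
The ordering described above could be scripted roughly as follows. This is only a sketch: it assumes that "disable" and "enable" mean `ip link set ... down/up` (the actual setup may use ifdown/ifup or something else), and `run`, `bring_up_in_order` and `DRY_RUN` are made-up names for illustration. With `DRY_RUN=1` the commands are printed instead of executed.

```shell
#!/bin/sh
# Sketch only: bring the interfaces up in the order reported to avoid the crash.
# Assumption: "disable"/"enable" map to `ip link set ... down/up`.
# run, bring_up_in_order and DRY_RUN are illustrative names, not from the thread.

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

bring_up_in_order() {
    sta="${1:-wlan0}"   # station (client) interface
    ap="${2:-ap0}"      # virtual AP interface
    run ip link set dev "$sta" down &&
    run ip link set dev "$ap" up &&
    run ip link set dev "$sta" up
}
```

Running `DRY_RUN=1; bring_up_in_order wlan0 ap0` first shows the three commands without touching the interfaces.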

brubbel commented 6 years ago

@Nemesis7. Yes, but when wlan0 associates with an AP (e.g. using wpa_supplicant), traffic over ap0 (e.g. using hostapd) becomes slow/unreliable after some time (clients stay connected though). This does not happen as long as wlan0 is not used. I have the strong impression that separation between wlan0 and the virtual interface ap0 is really messed up, since wpa_supplicant spits out messages like ignored event (cmd=19) for foreign interface (ifindex 6 wdev 0x0), and /sys/class/net/ap0/ifindex shows that it has ifindex 6, so what are these events doing on the wlan0 interface? As soon as the station at wlan0 disconnects, things 'seem' to be running fine again on ap0.

Edit: When the mac address of wlan0 is changed, ap0 stops working. Tcpdump shows incoming data on ap0, but it looks like encrypted data. When the MAC of wlan0 is changed to original again, ap0 starts working and tcpdump shows nicely formatted ping packets. Imho it really points to a serious problem with the brcmfmac43430-sdio firmware.

Edit2: Even when ap0 is producing malformed(?) packets, and 'ap_isolate=1' is in hostapd.conf, it seems that clients associated to ap0 can still ping each other, but not the ap0 interface or any other ip-address beyond.

So, in utter confusion, I changed the iptables policy to 'iptables -P INPUT DROP', so I can't even ping the localhost from within a terminal on the raspberry, and strangely enough: clients on ap0 can still happily connect to each other.

So: is it an accepted security policy for this driver --which I suppose is not only used on rpi-- to ignore any rules (L2 or L3) imposed by the linux kernel?

JamesH65 commented 6 years ago

OK, we have some beta software from Cypress that may or may not help with this. The files at the following two links need to be copied to the /lib/firmware/brcm folder on the Pi. Note: some error messages are reported when the driver first starts up, but they don't seem to affect usage. Even so, I recommend backing up the two original files first. I am currently talking with Cypress about those error messages.

https://drive.google.com/file/d/1bqugahKmfz1uQe8u5VHijAUnuZTxGkvG/view?usp=sharing https://drive.google.com/file/d/1mbfEOMShLrul-qprmlcPSuERdCTdh4e7/view?usp=sharing
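
A minimal sketch of the backup-then-copy step, assuming the downloaded files are named `brcmfmac43455-sdio.*` (the pattern used later in this thread) and sit in one directory. The function name `install_fw` and the `backup-orig` subfolder are hypothetical choices, not part of any official procedure.

```shell
#!/bin/sh
# Sketch only: back up the original firmware files, then copy the beta files in.
# install_fw and the backup-orig folder are illustrative, not an official tool.

install_fw() {
    src="$1"                          # directory holding the downloaded files
    dest="${2:-/lib/firmware/brcm}"   # firmware folder named in the comment above
    backup="$dest/backup-orig"        # hypothetical backup location
    mkdir -p "$backup" || return 1
    for f in "$src"/brcmfmac43455-sdio.*; do
        [ -e "$f" ] || continue       # glob matched nothing
        name="$(basename "$f")"
        # Keep the first backup; never overwrite it with a later beta file.
        if [ -e "$dest/$name" ] && [ ! -e "$backup/$name" ]; then
            cp -p "$dest/$name" "$backup/$name" || return 1
        fi
        cp "$f" "$dest/$name" || return 1
    done
}
```

Something like `sudo sh -c '. ./install_fw.sh; install_fw /home/pi/Downloads'` would apply it (the script file name and download path are placeholders). Copying the files back from `backup-orig` restores the original firmware.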

brubbel commented 5 years ago

No success: driver fails to load.

Steps undertaken:

  1. brcmfmac43455 is for 3B+ only (CYW43455): /proc/cpuinfo says rev=a020d3

  2. sudo dd if=2018-06-27-raspbian-stretch-lite.img of=/dev/sdb bs=4M

  3. cp brcmfmac43455-sdio.* /lib/firmware/brcm/

  4. sha256sum /lib/firmware/brcm/brcmfmac43455-sdio.*

    644b1afe735232a1b0c447e6f80650a9992f6977b80dc1d468c7302c769aa5d5  /lib/firmware/brcm/brcmfmac43455-sdio.bin
    635bdcbf9dc2cf7dd3bb72480566f347966e95f3deb2fdb5615a4001c7dd2e77  /lib/firmware/brcm/brcmfmac43455-sdio.clm_blob
    15698c62457bcf25e60d063e6c666d6e1b7dacdf2b03e6d14ebbc619de6da6b7  /lib/firmware/brcm/brcmfmac43455-sdio.txt
  5. halt + powercycle: new driver fails

  6. sh -c "echo options brcmfmac debug=0x100000 > /etc/modprobe.d/brcmfmac.conf"

  7. apt update; apt upgrade + halt + powercycle: new driver fails

  8. rpi-update + halt + powercycle: new driver fails

  9. modprobe -r brcmfmac

  10. echo -n "3f300000.mmc" > /sys/devices/platform/soc/3f300000.mmc/driver/unbind

  11. echo -n "3f300000.mmc" > /sys/bus/platform/drivers/mmc-bcm2835/bind

  12. modprobe brcmfmac (fails + stack trace)

  13. copy original driver to /lib/firmware/brcm + powercycle: old driver ok.

  14. repeat steps 9-12: old driver ok.
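
Steps 9-12 can be wrapped in a small script. The following is a sketch under the same assumptions as the steps above (MMC host `3f300000.mmc`, driver `mmc-bcm2835`); `run`, `write_to`, `rebind_wifi` and `DRY_RUN` are illustrative names. It must run as root unless `DRY_RUN=1` is set, in which case it only prints what it would do.

```shell
#!/bin/sh
# Sketch of steps 9-12: unload brcmfmac, rebind the MMC host, reload the driver.
# DRY_RUN, run, write_to and rebind_wifi are illustrative names only.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

write_to() {
    # $1 = sysfs file, $2 = value (written without a trailing newline)
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "printf %s $2 > $1"
    else
        printf '%s' "$2" > "$1"
    fi
}

rebind_wifi() {
    dev="3f300000.mmc"   # MMC host from steps 10 and 11
    run modprobe -r brcmfmac &&
    write_to "/sys/devices/platform/soc/$dev/driver/unbind" "$dev" &&
    write_to "/sys/bus/platform/drivers/mmc-bcm2835/bind" "$dev" &&
    run modprobe brcmfmac
}
```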

gdb-power commented 5 years ago

Same here - driver crashes and burns. No wifi interface.

[    3.987473] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Jun 20 2018 20:26:28 version 7.45.165 (r692055 CY) FWID 01-1de59a68
[    3.988042] brcmfmac: brcmf_c_preinit_dcmds: CLM version = API: 12.2 Data: 9.10.116 Compiler: 1.29.4 ClmImport: 1.36.3 Creation: 2018-06-20 20:12:36 
[    4.965242] uart-pl011 3f201000.serial: no DMA platform data
[    7.751431] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[    7.751442] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
[   10.311449] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[   10.311462] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
[   11.465157] Bluetooth: Core ver 2.22
[   11.465238] NET: Registered protocol family 31
[   11.465244] Bluetooth: HCI device and connection manager initialized
[   11.465263] Bluetooth: HCI socket layer initialized
[   11.465277] Bluetooth: L2CAP socket layer initialized
[   11.465311] Bluetooth: SCO socket layer initialized
[   11.479666] Bluetooth: HCI UART driver ver 2.3
[   11.479681] Bluetooth: HCI UART protocol H4 registered
[   11.479687] Bluetooth: HCI UART protocol Three-wire (H5) registered
[   11.479884] Bluetooth: HCI UART protocol Broadcom registered
[   11.655598] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   11.655606] Bluetooth: BNEP filters: protocol multicast
[   11.655623] Bluetooth: BNEP socket layer initialized
[   12.871437] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[   12.871450] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
[   15.431426] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[   15.431437] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
[   17.991448] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[   23.111431] brcmfmac: brcmf_dongle_scantime: Scan assoc time error (-110)
[   25.671431] brcmfmac: brcmf_netdev_open: failed to bring up cfg80211
[   28.231434] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[   28.231446] brcmfmac: brcmf_cfg80211_get_channel: chanspec failed (-110)
[   30.791428] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[   30.791440] brcmfmac: brcmf_cfg80211_get_tx_power: error (-110)
[   33.351436] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[   38.471439] brcmfmac: brcmf_dongle_scantime: Scan assoc time error (-110)
[   41.031441] brcmfmac: brcmf_netdev_open: failed to bring up cfg80211

JamesH65 commented 5 years ago

Hmm, although I get those messages, the WiFi interface does come up. I'll report back to Cypress.
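
For anyone reading these logs: the negative status values are standard Linux errno codes, so -110 is ETIMEDOUT, -5 is EIO and -11 is EAGAIN. The sketch below tallies them from a log on stdin. The helper names `brcmf_errno_name` and `summarise_status` are made up for this example, it only knows the three codes seen in this thread, and it relies on `grep -o` (available in GNU grep).

```shell
#!/bin/sh
# Sketch only: map the errno values seen in this thread to their names and
# count how often each appears in brcmfmac log lines read from stdin.

brcmf_errno_name() {
    case "${1#-}" in          # strip a leading minus sign
        5)   echo "EIO" ;;
        11)  echo "EAGAIN" ;;
        110) echo "ETIMEDOUT" ;;
        *)   echo "unknown" ;;
    esac
}

summarise_status() {
    # Match "status -N" and "(-N)", keep the number, then count per code.
    grep -oE '(status |\()-[0-9]+' | grep -oE '[0-9]+' | sort -n | uniq -c |
    while read -r count code; do
        echo "$(brcmf_errno_name "$code") $count"
    done
}
```

Typical use would be `dmesg | grep brcmfmac | summarise_status`.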


brubbel commented 5 years ago

As a side-note, Firmware version 7.45.154 (Feb 27 2018) has a discrepancy between the powerstate reported by the driver and the powerstate reported by iwconfig. I suppose that it is somewhat related to this commit which disabled powersave, but it is confusing. Anyway, I understand it is always off.

$ iw wlan0 set power_save on
dmesg says: brcmfmac: power management disabled
iwconfig says: Power Management:on

$ iw wlan0 set power_save off
dmesg says: brcmfmac: power management disabled
iwconfig says: Power Management:off

brubbel commented 5 years ago

So if ANYONE has a guaranteed way of causing this mailbox error that would be very useful.

  1. Start from the setup in my post above: 2018-06-27-raspbian-stretch-lite and rpi 3B+ rev=a020d3, with brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: Feb 27 2018 03:15:32 version 7.45.154 (r684107 CY) FWID 01-4fbe0b04
  2. sh -c "echo options brcmfmac debug=0x100000 > /etc/modprobe.d/brcmfmac.conf"
  3. remove /lib/dhcpcd/dhcpcd-hooks/10-wpa-supplicant to prevent wpa_supplicant from starting + reboot
  4. apt install hostapd
  5. $ iw dev wlan0 interface add uap0 type __ap
  6. terminal 1: $ wpa_supplicant -dd -iwlan0 -c /home/pi/wpa_supplicant.conf (wait until connected to your wifi network)
  7. terminal 2: $ hostapd /home/pi/hostapd.conf
  8. terminal 3: $ dmesg -ew

[ 79.680414] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content: 0x40012
[ 82.154516] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 82.155047] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[ 84.714914] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 84.715403] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
[ 84.715417] brcmfmac: brcmf_c_set_joinpref_default: Set join_pref error (-110)
[ 87.154353] brcmfmac: brcmf_cfg80211_connect: BRCMF_C_SET_SSID failed (-110)
[ 89.675516] brcmfmac: brcmf_sdio_bus_rxctl: resumed on timeout
[ 89.676054] brcmfmac: brcmf_sdio_checkdied: firmware trap in dongle
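
To check a log for this failure automatically, something like the following could be used. It is a sketch only: `first_fw_trap` is a hypothetical helper that reads log text on stdin, prints the timestamp of the first "firmware trap in dongle" line, and returns a non-zero status if there is none.

```shell
#!/bin/sh
# Sketch only: find the first firmware trap in kernel log text on stdin.
# first_fw_trap is an illustrative name, not an existing tool.

first_fw_trap() {
    awk '/brcmf_sdio_checkdied: firmware trap in dongle/ {
             # The timestamp is the text between the first "[" and "]".
             s = index($0, "["); e = index($0, "]")
             if (s > 0 && e > s) {
                 ts = substr($0, s + 1, e - s - 1)
                 gsub(/ /, "", ts)      # drop the padding spaces
                 print ts
             }
             found = 1
             exit
         }
         END { exit (found ? 0 : 1) }'
}
```

For example, `dmesg | first_fw_trap && echo "firmware crashed"`.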

I believe the crash is related to the sudden channel switch imposed by hostapd. Omitting the channel=6 setting makes hostapd do an ACS survey, but it seems this is not supported by the firmware: the driver doesn't crash, but hostapd does not start either. In any case, the driver should decide what to do: either ignore the hostapd channel or leave the STA channel. There's no middle road. Fixing this issue may also resolve the intermittent crashes every few hours or days in STA-only mode. It could be that a scheduled AP scan by wpa_supplicant causes the same (race) condition in the driver, although far less frequently.

The driver can be resurrected by the unbind/bind procedure, however I'd like it to stay alive. For now, a watchdog on the /sys/class/net/wlan0/operstate is our only fallback. The problem is that driver condition (e.g. wifi speed) can also deteriorate without crashing. Alas, there is no /sys/kernel/debug/brcmfmac/mmc1\:0001\:1/healthcondition metric.

Edit: yes there is
cat /sys/kernel/debug/brcmfmac/mmc1\:0001\:1/counters
tx_ctlerrs:   25
rx_ctlerrs:   19
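
A watchdog along those lines could combine operstate with the two counters. The sketch below is only an illustration: `read_counter` and `needs_rebind` are made-up names, and the policy (treat anything other than "up", or any growth in tx_ctlerrs + rx_ctlerrs since the last check, as a reason to rebind) is an assumption, not a tested threshold.

```shell
#!/bin/sh
# Sketch only: decide whether the wifi driver looks unhealthy.
# read_counter and needs_rebind are illustrative names; the policy is a guess.

read_counter() {
    # $1 = counters file, $2 = counter name (e.g. tx_ctlerrs)
    awk -v name="$2:" '$1 == name { print $2; found = 1 }
                       END { if (!found) exit 1 }' "$1"
}

needs_rebind() {
    # $1 = operstate file, $2 = counters file, $3 = error total at last check
    # Returns 0 when the driver should be rebound, 1 when it looks fine.
    state="$(cat "$1")" || return 0
    tx="$(read_counter "$2" tx_ctlerrs)" || return 0
    rx="$(read_counter "$2" rx_ctlerrs)" || return 0
    [ "$state" != "up" ] || [ $((tx + rx)) -gt "$3" ]
}
```

It would be called periodically with /sys/class/net/wlan0/operstate and the counters file shown above, remembering the previous total between runs.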

/home/pi/wpa_supplicant.conf:

ap_scan=1

ctrl_interface=/var/run/wpa_supplicant
network={
    ssid="E2000"
    scan_ssid=1
    proto=WPA RSN
    key_mgmt=WPA-PSK
    pairwise=CCMP TKIP
    group=CCMP TKIP
    psk="myrouterpassword"
}

/home/pi/hostapd.conf:

interface=uap0
ssid=raspberrypi
hw_mode=g
channel=6
wmm_enabled=0
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
wpa=2
wpa_passphrase=myhostapdpassword
wpa_key_mgmt=WPA-PSK
rsn_pairwise=CCMP
wpa_pairwise=TKIP
ap_isolate=1

cat /sys/kernel/debug/brcmfmac/mmc1\:0001\:1/forensics

dongle trap info: type 0x4 @ epc 0x00021e2c
cpsr 0x8000019f spsr 0x800001bf sp 0x0025fcb8
lr 0x00020447 pc 0x00021e2c offset 0x25fc60
r0 0x002475b8 r1 0x00000000 r2 0x00000000 r3 0x00000000
r4 0x00259e30 r5 0x002412f4 r6 0x002475b8 r7 0x00000004
0x0
000310.996 cca_stats_watchdog: Bad chanspec!!
000310.996 wl0: cca_stats watchdog handler error
000311.115 wl0: wlc_iovar_op: wpaie BCME -7 (Not STA)
000311.119 FWID 01-4fbe0b04 flags 1
000311.119 TRAP 4(25fc60): pc 21e2c, lr 20447, sp 25fcb8, cpsr 8000019f, spsr 800001bf
000311.119 dfsr 80d, dfar e0
000311.119 r0 2475b8, r1 0, r2 0, r3 0, r4 259e30, r5 2412f4, r6 2475b8
000311.119 r7 4, r8 cd, r9 25aa64, r10 2471b4, r11 245c48, r12 204dc
000311.119 sp+0 00240b3d 00000004 00000001 00245c48
000311.119 sp+10 0023d546 0023d5c0 00010030 00000000

000311.119 sp+48 00001c01
000311.119 sp+54 0008b59b
000311.119 sp+74 001a60d9
000311.119 sp+94 0019adfd
000311.119 sp+ac 001a63b9
000311.119 sp+cc 0003fc21
000311.119 sp+fc 00020361
000311.119 sp+178 000203b9
000311.119 sp+18c 00025a97
000311.119 sp+1b4 0001f361
000311.119 sp+1d0 00000107
000311.119 sp+1d4 0001f231
000311.119 sp+1f4 00006583
000311.119 sp+214 0019c541
000311.119 sp+27c 0019b9bb

JamesH65 commented 5 years ago

I've passed on the details of the issues with this test firmware to Cypress. Interestingly, although this is new and in test from our point of view, I believe it's actually their top of tree build, so I would not be expecting issues this clear to be turning up. Most strange. Anyway, we now wait for Cypress.

brubbel commented 5 years ago

Further testing:

  1. start wpa_supplicant, connect to wifi.
  2. Suspend wpa_supplicant ($ ctrl-z)
  3. Let hostapd (on uap0) run until NL80211_SET_BEACON is transferred to the nl80211 driver.
  4. brcmfmac43455 driver is still ok
  5. continue wpa_supplicant ($ fg). Wait until it starts scanning.
  6. forensics: wl0: wlc_iovar_op: wpaie BCME -7 (Not STA)
  7. crash -> dongle trap info: type 0x4 @ epc 0x00021e2c

If wpa_supplicant is stopped at (5) and restarted, driver continues normally (until it decides to crash later of course).

So it is not a race condition, but an FSM state (isolation) problem. Starting beacons on uap0 results in wlan0 assuming it is in AP mode.