raspberrypi / linux

Kernel source tree for Raspberry Pi-provided kernel builds. Issues unrelated to the linux kernel should be posted on the community forum at https://forums.raspberrypi.com/

wlan freezes in raspberry pi 3/PiZeroW (Not 3B+) #1342

Open dh-connect opened 8 years ago

dh-connect commented 8 years ago

I moved the same SD card (running Debian 8 jessie, kernel 4.1.19) from a Raspberry Pi 2 with USB wifi (EDIMAX EW-7811UN Wireless USB Adapter, 150 Mbit/s, IEEE802.11b/g/n) into the new Raspberry Pi 3, which uses the integrated wlan. Since then the wlan freezes after a while (several hours) of usage. I couldn't find out whether it's due to heavy wifi usage or not; because I haven't changed the software, I guess it has to do with the new hardware. When the wlan freezes, the Pi can no longer be reached. Neither ifdown + ifup nor restarting the networking service helps in this case; I have to reboot the system to get it working again. syslog doesn't say much, only this: dhcpcd[522]: wlan0: fe80::8af7:c7ff:fece:5912: expired option 25

I've tried to change these settings so far, but without improvement:

sudo nano /etc/network/interfaces
wireless-power off

sudo nano /etc/sysctl.conf
At the end of the file, add the following line:
vm.min_free_kbytes = 16384

sudo nano /boot/cmdline.txt
At the end of the line, add:
smsc95xx.turbo_mode=N dwc_otg.dma_enable=1 dwc_otg.dma_burst_size=256

pelwell commented 8 years ago

Thanks for your help so far,

jrmhaig commented 8 years ago

I've just found this issue so can I confirm that it is similar to what I am seeing? I have set up a RPi 3 as an access point and every so often I am unable to connect to it. I am able to ssh in over the wired connection and I see that wlan0 is still up with the correct IP address but the only way to get the access point working again is to reboot. I see stack traces like this in /var/log/messages

Jul 16 06:57:18 raspberrypi kernel: [117621.171957] ------------[ cut here ]------------
Jul 16 06:57:18 raspberrypi kernel: [117621.172042] WARNING: CPU: 2 PID: 879 at drivers/net/wireless/brcm80211/brcmfmac/core.c:1191 brcmf_netdev_wait_pend8021x+0xe4/0xf0 [brcmfmac]()
Jul 16 06:57:18 raspberrypi kernel: [117621.172052] Modules linked in: ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables bnep hci_uart btbcm bluetooth brcmfmac brcmutil cfg80211 rfkill snd_bcm2835 snd_pcm snd_timer snd bcm2835_gpiomem bcm2835_wdt uio_pdrv_genirq uio ipv6
Jul 16 06:57:18 raspberrypi kernel: [117621.172168] CPU: 2 PID: 879 Comm: hostapd Tainted: G        W       4.4.11-v7+ #888
Jul 16 06:57:18 raspberrypi kernel: [117621.172177] Hardware name: BCM2709
Jul 16 06:57:18 raspberrypi kernel: [117621.172212] [<80018724>] (unwind_backtrace) from [<80014058>] (show_stack+0x20/0x24)
Jul 16 06:57:18 raspberrypi kernel: [117621.172235] [<80014058>] (show_stack) from [<803205a4>] (dump_stack+0xd4/0x118)
Jul 16 06:57:18 raspberrypi kernel: [117621.172259] [<803205a4>] (dump_stack) from [<80025300>] (warn_slowpath_common+0x98/0xc8)
Jul 16 06:57:18 raspberrypi kernel: [117621.172282] [<80025300>] (warn_slowpath_common) from [<800253ec>] (warn_slowpath_null+0x2c/0x34)
Jul 16 06:57:18 raspberrypi kernel: [117621.172350] [<800253ec>] (warn_slowpath_null) from [<7f23a1d4>] (brcmf_netdev_wait_pend8021x+0xe4/0xf0 [brcmfmac])
Jul 16 06:57:18 raspberrypi kernel: [117621.172466] [<7f23a1d4>] (brcmf_netdev_wait_pend8021x [brcmfmac]) from [<7f228fbc>] (send_key_to_dongle+0xa4/0xf8 [brcmfmac])
Jul 16 06:57:18 raspberrypi kernel: [117621.172579] [<7f228fbc>] (send_key_to_dongle [brcmfmac]) from [<7f229208>] (brcmf_cfg80211_del_key+0x68/0x78 [brcmfmac])
Jul 16 06:57:18 raspberrypi kernel: [117621.172723] [<7f229208>] (brcmf_cfg80211_del_key [brcmfmac]) from [<7f1742f0>] (nl80211_del_key+0xfc/0x28c [cfg80211])
Jul 16 06:57:18 raspberrypi kernel: [117621.172817] [<7f1742f0>] (nl80211_del_key [cfg80211]) from [<80505e00>] (genl_rcv_msg+0x26c/0x3f0)
Jul 16 06:57:18 raspberrypi kernel: [117621.172841] [<80505e00>] (genl_rcv_msg) from [<80504fd8>] (netlink_rcv_skb+0xb0/0xcc)
Jul 16 06:57:18 raspberrypi kernel: [117621.172862] [<80504fd8>] (netlink_rcv_skb) from [<80505b84>] (genl_rcv+0x34/0x44)
Jul 16 06:57:18 raspberrypi kernel: [117621.172883] [<80505b84>] (genl_rcv) from [<80504914>] (netlink_unicast+0x190/0x254)
Jul 16 06:57:18 raspberrypi kernel: [117621.172904] [<80504914>] (netlink_unicast) from [<80504de0>] (netlink_sendmsg+0x340/0x354)
Jul 16 06:57:18 raspberrypi kernel: [117621.172926] [<80504de0>] (netlink_sendmsg) from [<804b7c14>] (sock_sendmsg+0x24/0x34)
Jul 16 06:57:18 raspberrypi kernel: [117621.172947] [<804b7c14>] (sock_sendmsg) from [<804b82fc>] (___sys_sendmsg+0x1e0/0x1e8)
Jul 16 06:57:18 raspberrypi kernel: [117621.172968] [<804b82fc>] (___sys_sendmsg) from [<804b9054>] (__sys_sendmsg+0x4c/0x7c)
Jul 16 06:57:18 raspberrypi kernel: [117621.172988] [<804b9054>] (__sys_sendmsg) from [<804b909c>] (SyS_sendmsg+0x18/0x1c)
Jul 16 06:57:18 raspberrypi kernel: [117621.173008] [<804b909c>] (SyS_sendmsg) from [<8000fb40>] (ret_fast_syscall+0x0/0x1c)
Jul 16 06:57:18 raspberrypi kernel: [117621.173019] ---[ end trace 2d66bc66d6534ca4 ]---

My kernel is 4.4.13-v7+ and I have just run rpi-update for the first time so I don't know yet if that will help.

Hecatron commented 8 years ago

I wonder if this might be related, or perhaps a separate issue https://www.youtube.com/watch?v=_D_fi_ck9Vo

mmmmlab commented 8 years ago

My RPI3 worked without any problems via WiFi until I upgraded it to latest udev ...

Now, it doesn't connect anymore ...

I've also installed the patched modules from pelwell, but with no success: it simply doesn't connect ...

Let me know if I can help,

My best, Mimmo

Ruffio commented 8 years ago

@dh-connect has your issue been resolved? If so, please close this issue. Thanks.

dh-connect commented 8 years ago

I'm working with lan since, haven't tried wlan

motocodeltd commented 8 years ago

Hi,

I've got what seems to be the same issue with my rpi 3. I've reverted to using the official RPI wifi usb dongle which is rock solid, but the built in wifi dies after ~20 hours of connectivity with these kind of messages in syslog

brcmfmac: brcmf_cfg80211_reg_notifier: not a ISO3166 code
cfg80211: World regulatory domain updated:
cfg80211: DFS Master region: unset

this is on latest raspbian, latest firmware

mathieugouin commented 8 years ago

Is it possible to re-open this issue? Why was it closed?

I'm working with lan since, haven't tried wlan
dh-connect closed this 13 days ago

Switching to LAN is not a solution that justifies closing the issue...

I still have the issue and can reproduce the bug.

My relevant portion of dmesg is:

[174174.396705] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content: 0x40012
[174215.037175] brcmfmac: _brcmf_set_multicast_list: Setting mcast_list failed, -52
[174217.037166] brcmfmac: _brcmf_set_multicast_list: Setting allmulti failed, -52
[174219.037171] brcmfmac: _brcmf_set_multicast_list: Setting BRCMF_C_SET_PROMISC failed, -52
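
The negative numbers at the end of these messages are kernel error codes. A minimal sketch for decoding the ones that show up in this thread, assuming the standard Linux errno numbering (the function name errno_name is made up for illustration, and only two codes are listed):

```shell
#!/bin/sh
# Map a status code from a brcmfmac message to its Linux errno name.
# Only the codes seen in this thread are listed; extend the table as needed.
errno_name() {
    code=${1#-}    # drop a leading minus sign, so -52 and 52 both work
    case "$code" in
        52)  echo "EBADE (Invalid exchange)" ;;
        110) echo "ETIMEDOUT (Connection timed out)" ;;
        *)   echo "unknown ($code)" ;;
    esac
}
```

For example, errno_name -52 prints the name for the code in the three "failed, -52" lines above.
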
ghost commented 8 years ago

I'm running into the same problem as @jrmhaig. After upgrading, I now have:

$ dpkg-query -s firmware-brcm80211
Package: firmware-brcm80211
Status: install ok installed
Priority: optional
Section: non-free/kernel
Installed-Size: 4296
Maintainer: Debian Kernel Team <debian-kernel@lists.debian.org>
Architecture: all
Multi-Arch: foreign
Source: firmware-nonfree
Version: 0.43+rpi5
Suggests: initramfs-tools
Description: Binary firmware for Broadcom 802.11 wireless cards
 This package contains the binary firmware for wireless network cards with
 the Broadcom BCM4313, BCM43224, BCM43225, BCM43241, BCM43143, BCM4329,
 BCM4330, BCM4334, BCM4335 or BCM43430 chips, supported by the brcmsmac or
 brcmfmac driver.
 .
 Contents:
  * Broadcom 802.11 firmware, version 610.812 (brcm/bcm43xx-0.fw)
  * Broadcom 802.11 firmware header, version 610.812
    (brcm/bcm43xx_hdr-0.fw)
  * Broadcom BCM43143 firmware (brcm/brcmfmac43143-sdio.bin)
  * Broadcom BCM43241 rev 0-3 firmware (brcm/brcmfmac43241b0-sdio.bin)
  * Broadcom BCM43241 rev 4+ firmware (brcm/brcmfmac43241b4-sdio.bin)
  * Broadcom BCM4329 firmware (brcm/brcmfmac4329-sdio.bin)
  * Broadcom BCM4330 firmware (brcm/brcmfmac4330-sdio.bin)
  * Broadcom BCM4334 firmware (brcm/brcmfmac4334-sdio.bin)
  * Broadcom BCM4335 firmware (brcm/brcmfmac4335-sdio.bin)
  * Broadcom BCM43362 firmware (brcm/brcmfmac43362-sdio.bin)
  * Broadcom BCM4354 firmware (brcm/brcmfmac4354-sdio.bin)
  * Broadcom BCM43143 firmware (brcm/brcmfmac43143.bin)
  * Broadcom BCM43430 firmware (brcm/brcmfmac43430-sdio.bin)
  * NVRAM file for BCM943430 (brcm/brcmfmac43430-sdio.txt)
Homepage: http://git.kernel.org/?p=linux/kernel/git/firmware/linux-firmware.git

Setup hostapd with a bridge.

/etc/hostapd/hostapd.conf

ctrl_interface=/var/run/hostapd
###############################
# Basic Config
###############################
macaddr_acl=0
auth_algs=1
# Most modern wireless drivers in the kernel need driver=nl80211
driver=nl80211

#####
# Logging
#####
logger_syslog_level=0

##########################
# Local configuration...
##########################
interface=wlan0
bridge=br0
hw_mode=g
ieee80211n=1
channel=1
ssid=WillCrashOnYou
macaddr_acl=0
auth_algs=1
ignore_broadcast_ssid=0
wpa=3
wpa_passphrase=JustYouWait:)
wpa_key_mgmt=WPA-PSK
wpa_pairwise=TKIP
rsn_pairwise=CCMP

/etc/network/interfaces

# interfaces(5) file used by ifup(8) and ifdown(8)

# Please note that this file is written to be used with dhcpcd
# For static IP, consult /etc/dhcpcd.conf and 'man dhcpcd.conf'

# Include files from /etc/network/interfaces.d:
source-directory /etc/network/interfaces.d

auto lo
iface lo inet loopback

#auto eth0
iface eth0 inet manual
#iface eth0 inet dhcp

#allow-hotplug wlan0
iface wlan0 inet manual
#    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
#
#allow-hotplug wlan1
#iface wlan1 inet manual
#    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf

auto br0
iface br0 inet dhcp
        post-up /etc/init.d/hostapd restart
        post-down /etc/init.d/hostapd stop
        bridge-ports eth0 wlan0

pelwell commented 8 years ago

For people with WiFi problems, Cypress (was Broadcom) have provided us with debug modules to help diagnose the problems. Because modules are kernel-version-specific, you will first need to update (or possibly revert) to a specific firmware release:

sudo rpi-update b0ef6e25679d3612a980708cf4c3907ce6e13e84
sudo shutdown -r now

Now you can download and install the debug modules:

wget -O brcmdbg.tgz "https://drive.google.com/uc?export=download&id=0B_P-i4u-SLBXb1o0UjVLY1NRbk0"
tar zxvf brcmdbg.tgz
sudo ./brcmdbg

The final command will run the installation script, which copies the original modules to one side before replacing them with the debug versions. Running the command again will revert to the original versions.

After installation, reboot your Pi 3. Now dmesg | grep brcmfmac will show diagnostic messages like this:

[    9.952095] brcmfmac: F1 signature read @0x18000000=0x1541a9a6
[    9.978064] usbcore: registered new interface driver brcmfmac
[   10.277931] brcmfmac: brcmf_c_preinit_dcmds: Firmware version = wl0: May 27 2016 00:13:38 version 7.45.41.26 (r640327) FWID 01-df77e4a7
[   10.299380] brcmfmac: CONSOLE: hndarm_armr addr: 0x18003000, cr4_idx: 0
[   10.314284] brcmfmac: CONSOLE: 000000.001
[   10.326859] brcmfmac: CONSOLE: RTE (SDIO-CDC) 7.45.41.26 (r640327) on BCM43430 r1 @ 37.4/81.6/81.6MHz
[   10.326867] brcmfmac: CONSOLE: 000000.001 sdpcmdcdc0: Broadcom SDPCMD CDC driver
[   10.326876] brcmfmac: CONSOLE: 000000.005 reclaim section 0: Returned 47716 bytes to the heap
[   10.326882] brcmfmac: CONSOLE: 000000.007 wlc_bmac_info_init: host_enab 1
[   10.326890] brcmfmac: CONSOLE: 000000.026 wl0: Broadcom BCM43430 802.11 Wireless Controller 7.45.41.26 (r640327)
[   10.326895] brcmfmac: CONSOLE: 000000.027 TCAM: 256 used: 179 exceed:0
[   10.326902] brcmfmac: CONSOLE: 000000.028 reclaim section 1: Returned 81268 bytes to the heap
[   10.326911] brcmfmac: CONSOLE: 000000.029 sdpcmd_dpc: Enable
[   10.371343] brcmfmac: CONSOLE: 000000.121 wl0: wlc_enable_probe_req: state down, deferring setting of host flags
[   10.422886] brcmfmac: brcmf_cfg80211_reg_notifier: not a ISO3166 code
[   10.432919] brcmfmac: CONSOLE: 000000.185 wl0: wlc_enable_probe_req: state down, deferring setting of host flags
[   10.432929] brcmfmac: CONSOLE: 000000.186 wl0: wlc_enable_probe_req: state down, deferring setting of host flags
[   10.500547] brcmfmac: CONSOLE: 000000.254 wl0: wl_open
[   10.531447] brcmfmac: brcmf_add_if: ERROR: netdev:wlan0 already exists
[   10.531457] brcmfmac: brcmf_add_if: ignore IF event
[   10.536516] brcmfmac: power management disabled
[   10.540645] brcmfmac: CONSOLE: 000000.284 wl0: wlc_enable_probe_req: state down, deferring setting of host flags
[   13.950422] brcmfmac: CONSOLE: 000003.703 wl_nd_ra_filter_clear_cache: Enter..

When you hit a problem, use dmesg > wifidbg.txt to capture the tracing to a file, along with any other kernel messages, then upload the file somewhere (gist, pastebin, dropbox etc.) and post a link to it along with a description of what you were doing when the error occurred.
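
To avoid missing the moment, the capture step can be scripted. This is only a sketch: the failure patterns are the ones quoted in this thread, not an official list, and the function names and the wifidbg-*.txt file name are arbitrary choices.

```shell
#!/bin/sh
# Print only the brcmfmac lines that look like failures.
# The patterns are taken from messages posted in this thread.
brcmf_errors() {
    grep 'brcmfmac' | grep -E 'failed|Unknown mailbox data|out of bus->txq'
}

# Save the full kernel log to a timestamped file if any such line is present.
capture_if_failed() {
    log=$(dmesg)
    if printf '%s\n' "$log" | brcmf_errors >/dev/null; then
        out="wifidbg-$(date +%Y%m%d-%H%M%S).txt"
        printf '%s\n' "$log" > "$out"
        echo "saved $out"
    fi
}
```

Running capture_if_failed from cron or a loop every minute or so would leave a file behind each time one of these messages is in the log.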

BenoitSvB commented 8 years ago

Please refresh my memory: what command do I use to return to stable firmware if I decide to stop debugging?

pelwell commented 8 years ago

sudo apt-get update
sudo apt-get upgrade

should do the trick. And sudo ./brcmdbg to just revert to the non-debug drivers.

BenoitSvB commented 8 years ago

https://gist.github.com/BenoitSvB/368983f2c09eed2d85a24e6920dc3a50#file-201609081547_wifidbg-txt

Started debugging; needed about 5 or 6 tries to associate; I do not know why all but the last attempt failed. I will let it run until I see association loss and dump a new dmesg then. Inconsistent association behaviour was my problem before I stopped using onboard wifi, so this might be on the spot. Please let me know if any additional activities could be helpful.

BenoitSvB commented 8 years ago

https://gist.github.com/BenoitSvB/bf8acdbb7b664df90e885603bb4774ce#file-201609081628_wifidbg-txt Doing nothing but waiting; do we see here several association losses/recoveries?

pelwell commented 8 years ago

Thanks for that. Hmm - those logs aren't very informative, but let's see what Cypress come back with.

BenoitSvB commented 8 years ago

https://gist.github.com/BenoitSvB/98db53ff884e7b1a57bf1475d6106c56 Unexplained loss and recovery of association; long enough to see in systray icon. Accesspoint is Linksys wrt160n with Firmware: DD-WRT v24-sp2 (08/07/10) std. Guess I can stop debugging for now and revert to my €3 MT7601U dongle, but let me know if I can be of further help.

BenoitSvB commented 8 years ago

@pelwell I did not see any firmware restore after sudo apt-get update && sudo apt-get upgrade, and sudo rpi-update gives "* Your firmware is already up to date". I guess I need to run rpi-update with a specific git hash to revert to stable firmware. Do you know which hash?

pelwell commented 8 years ago

The commit history in the RPI-Distro repo shows that you want commit 390f53ed0fd79df274bdcc81d99e09fa262f03ab from the firmware repo, so run:

sudo rpi-update 390f53ed0fd79df274bdcc81d99e09fa262f03ab

BenoitSvB commented 8 years ago

@pelwell:

root@pi3b:/home/pi# sudo rpi-update 390f53ed0fd79df274bdcc81d99e09fa262f03ab
* Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
* Performing self-update
* Relaunching after update
* Raspberry Pi firmware updater by Hexxeh, enhanced by AndrewS and Dom
Invalid git hash specified

pelwell commented 8 years ago

Ah, the Hexxeh rpi-firmware has different commit IDs - try 569e6611ac20c735647eb0e550484a73935a672d.

thomasf commented 8 years ago

I wonder if https://github.com/raspberrypi/linux/issues/1552 / #1444 might be related to this issue as well.

I have recently deployed a setup of 40 RPi 3s which does some bluetooth stuff; we had to get usb wifi interfaces or else wlan would constantly freeze. We now use the internal bt device, and the internal wifi module is blacklisted in modprobe.d.

It might be useful to run hcitool name 11:11:11:11:11:11 and see if that generates any interesting log entries as well. I have just been following this issue and haven't had the time to set up my lab environment to test anything myself. We had some wifi freezes without BT enabled, but the combination of wifi+bt can more or less always kill wifi in a very short timespan. This was always reproducible on any number of our RPis.

BenoitSvB commented 8 years ago

@pelwell OK; uname -a gives:

Linux pi3b.thuis 4.4.13-v7+ #894 SMP Mon Jun 13 13:13:27 BST 2016 armv7l GNU/Linux

Just for information: where would anyone find the git hash for the actual stable firmware version?

BenoitSvB commented 8 years ago

@thomasf although I have Bluetooth up, I have no use for it at the moment. hcitool name 11:11:11:11:11:11 does not return anything, which is, I suppose, to be expected as I am not connected to any device. Maybe I should buy myself a BT audio device to play with.

pelwell commented 8 years ago

Define stable.

The hash I (finally) gave you is for the 20th June firmware release, which you will get if you run:

sudo apt-get update
sudo apt-get install raspberrypi-kernel raspberrypi-bootloader

I'm not aware of a single place that contains the hash of the most recent "stable" release, but by going through RPI-Distro as I did then cross-referencing with the Hexxeh repo you can get rpi-update hashes for any release you like. If you consider the 2016-05-23 release to be stable because it was part of the last full Raspbian release then you want hash 3b98f7433649e13cf08f54f509d11491c99c4c0b which translates to an rpi-update hash of 2b9c0bfacfc11ee8bb9b30dc9cdb36289698f8a8 .

thomasf commented 8 years ago

@BenoitSvB Just running that hcitool command from a fresh boot, without touching hci0 with any other software, causes the wifi to start behaving badly in our tests. I don't know if it matters whether there are any other bluetooth devices, but it is the smallest reproducible example I can think of for triggering the wifi freezing problems.

thomasf commented 8 years ago

I've also tested external bt dongle + internal wifi but the internal wifi sometimes hangs even when the internal bcm bt driver isn't loaded. The "solution" (as in quick fix) for us was to use usb wifi adapters, that has been proved stable in our tests and production usage.

BenoitSvB commented 8 years ago

I still suspect #1313 as related.

BenoitSvB commented 8 years ago

@pelwell "stable" would in this case be the firmware as released by the Foundation with its last published image and updated only by "sudo apt-get update && sudo apt-get upgrade", so without invoking rpi-update (with or without a specific git hash, which, as I understood it, is meant for upgrading to more recent firmware for specific purposes only). Which brings me to the question: can I read the hash of my operational firmware before loading a new firmware for testing? That would make a restore after testing easier, as I would not trust myself to conduct the cross-reference you mentioned...

pelwell commented 8 years ago

Perhaps - /boot/.firmware_revision is written by rpi-update (you can view it with cat), but without trying it I couldn't tell you if the Raspbian releases also write it.
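
A small sketch for recording that value before switching firmware. The helper name firmware_revision is made up, and whether the file exists on a given install is exactly the open question here, so the function falls back to "unknown":

```shell
#!/bin/sh
# Print the revision hash stored in the marker file, or "unknown" if it is missing.
# The default path is the one mentioned above; pass another path to override it.
firmware_revision() {
    file=${1:-/boot/.firmware_revision}
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "unknown"
    fi
}
```

Saving the output somewhere (for example firmware_revision > fw-before.txt) before testing gives you a hash to pass back to rpi-update afterwards.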

BenoitSvB commented 8 years ago

/boot/.firmware_revision is an rpi-update thing ( https://www.raspberrypi.org/forums/viewtopic.php?t=106073&p=732449#p731830 )

But I found with:

zcat /usr/share/doc/raspberrypi-bootloader/changelog.Debian.gz

that I want indeed:

I understand the cross-reference from https://github.com/RPi-Distro/firmware/commits/debian?author=popcornmix to https://github.com/Hexxeh/rpi-firmware/commits/master is made by carefully comparing the dates and descriptions of the commits.

Learned something; thnx :)

pelwell commented 8 years ago

@BenoitSvB: Your traces seem to show a different kind of issue - the firmware isn't giving any clues about why you are being disconnected. You might get some more clues from a packet sniffer such as WaveShark.

@mathieugouin @dh-connect @juched78 @maciex @duncanmcdowell: I have a Cypress engineer who is keen to find out more about your issues; if you send an email to me - phil at raspberrypi dot org - I can put you in touch with him. If you want to speed things along, install the debug modules as outlined above and save the output of dmesg when things go wrong.

BenoitSvB commented 8 years ago

@pelwell Google did not return much of substance on 'packet sniffer Waveshark', but I guess you meant WireShark. The fact that blacklisting brcmutil & brcmfmac while using a MT7601U dongle makes the erratic connect/disconnect behaviour disappear, combined with the frequent 'out-of-order' occurrences (see #1313, now hidden but not solved), makes me suspect a Broadcom/Cypress hardware cause. Wireshark might be of help, but I would need assistance to set up and conduct a serious hardware debugging effort.

pelwell commented 8 years ago

Yes, I meant wireshark.

You could use the dumpcap utility (part of the text-mode tshark package) to record all activity to a file, then kill it when the dmesg log includes a suspicious message. Something like this:

sudo apt-get install -y tshark
# You can say no when it asks if non-superusers can capture packets
dumpcap -D
# if your wlan isn't interface 2, change the next command to match
# Leave dumpcap recording in the background
sudo dumpcap -i 2 -q -w packets.pcap &
# Search for the error message, then kill the capture
dmesg -w | grep --max-count 1 "wlc_enable_probe_req: state down, deferring setting of host flags" && sudo killall dumpcap

Note that although "grep --max-count 1" is supposed to stop after one match, it seems to require one more line of input to actually make it stop, but that shouldn't be a problem in practise.

If your capture file gets too large you can get dumpcap to use a fixed duration recording using the "-b duration:60" option (for one minute). There is the possibility that restarting the capture like this could happen at a bad time and lose the interesting packets, but this is unlikely. You can always make it less likely by increasing the duration.
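
One way to bound disk use while keeping some history is to combine the duration option with dumpcap's files option, so that only the newest few files are kept. A sketch, where the function name, the defaults and the DRY_RUN switch are illustrative assumptions:

```shell
#!/bin/sh
# Start dumpcap with a ring buffer so the capture never grows without bound.
#   -b duration:N  switches to a new file every N seconds
#   -b files:M     keeps only the newest M files
# Set DRY_RUN=1 to print the command instead of running it.
start_ring_capture() {
    iface=$1
    seconds=${2:-60}
    files=${3:-10}
    cmd="dumpcap -i $iface -q -b duration:$seconds -b files:$files -w packets.pcap"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$cmd"
    else
        sudo $cmd &    # left unquoted on purpose so the string splits into arguments
    fi
}
```

With the defaults, start_ring_capture 2 keeps roughly the last ten minutes of traffic on interface 2, spread over ten files.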

pelwell commented 8 years ago

@BenoitSvB There is a thread here that suggests disabling roaming in the Pi3 WiFi driver as a way of avoiding connectivity issues. Roaming allows a device to automatically move between APs with the same SSID, but that is likely to be less useful on a static device such as a Pi3, and there is a suggestion that it can eventually lead to a total loss of connectivity.

Could you try enabling the roamoff module parameter? You need to create /etc/modprobe.d/brcmfmac.conf containing the following:

options brcmfmac roamoff=1
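
After a reboot it is worth checking that the option was actually picked up. The sketch below assumes the driver exposes the parameter at /sys/module/brcmfmac/parameters/roamoff and that a set value reads back as 1 or Y; neither is confirmed in this thread, the file may only be readable by root, and the function name is made up.

```shell
#!/bin/sh
# Return 0 if roaming is disabled, 1 if it is enabled,
# 2 if the parameter cannot be read (module not loaded, or no permission).
# Pass a different path as $1 to check a copy of the file.
roaming_disabled() {
    param=${1:-/sys/module/brcmfmac/parameters/roamoff}
    value=$(cat "$param" 2>/dev/null) || return 2
    case "$value" in
        1|Y|y) return 0 ;;
        *)     return 1 ;;
    esac
}
```

For example: roaming_disabled && echo "roamoff is active".
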
BenoitSvB commented 8 years ago

@pelwell: Disabling roaming is not the solution, but it made me experiment with different channels and a second access point. I discovered that the onboard wifi adapter only has problems with some channels (e.g. 1, 5) and only on the Linksys WRT160N with DD-WRT firmware. Curiously, though, none of my other wifi clients share these problems: they connect without problems on all offered channels on both access points. Good for me: I have a stable workaround (not using the channels the onboard wifi has problems with), but no clarity in the matter. Do you want me to conduct specific testing? By the way, do we need to set the parameter options brcmfmac debug=1 in /etc/modprobe.d/brcmfmac.conf while using the special test drivers? And do you know a way to measure the uptime of a wifi association? Then I could test all channels more systematically for longer periods without making gigantic capture files.

pelwell commented 8 years ago

I was assured that the requested debugging is enabled in the debug drivers by default (it has the same effect as options brcmfmac debug=0x100000), but feel free to experiment with different values.

I'm not aware of any way to measure uptime for an association, other than polling frequently and hoping to spot a change.
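
The polling approach could look roughly like this. It assumes the iw tool is installed and that iw dev wlan0 link prints a line starting with "Connected to <BSSID>" while associated; the function names and the 5-second default are arbitrary.

```shell
#!/bin/sh
# Extract the BSSID from `iw dev <iface> link` output read on stdin.
# Prints nothing when the interface is not associated.
parse_bssid() {
    sed -n 's/^Connected to \([0-9A-Fa-f:]*\).*/\1/p'
}

# Poll every $2 seconds (default 5) and log each change of association state,
# together with how long the previous state lasted.
watch_association() {
    iface=${1:-wlan0}
    interval=${2:-5}
    last=$(iw dev "$iface" link | parse_bssid)
    since=$(date +%s)
    while sleep "$interval"; do
        current=$(iw dev "$iface" link | parse_bssid)
        if [ "$current" != "$last" ]; then
            now=$(date +%s)
            echo "$(date): '${last:-none}' -> '${current:-none}' after $((now - since)) s"
            last=$current
            since=$now
        fi
    done
}
```

The resolution is limited by the polling interval, so drops shorter than that can go unnoticed.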

A Cypress employee is aware of this thread, but drop me an email (phil at raspberrypi dot org) if you are happy to be contacted directly.

NTag commented 8 years ago

Hello,

Is there any progress on this issue? I can connect to my open Wi-Fi network, and after a random time I have this in my logs:

Sep 26 22:42:36 dhcpcd: wlan0: carrier lost
Sep 26 22:42:36 kernel: brcmfmac: brcmf_cfg80211_reg_notifier: not a ISO3166 code
Sep 26 22:42:36 kernel: cfg80211: World regulatory domain updated:
Sep 26 22:42:36 kernel: cfg80211: DFS Master region: unset
Sep 26 22:42:36 kernel: cfg80211: (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
Sep 26 22:42:36 kernel: cfg80211: (2402000 KHz - 2472000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
Sep 26 22:42:36 kernel: cfg80211: (2457000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
Sep 26 22:42:36 kernel: cfg80211: (2474000 KHz - 2494000 KHz @ 20000 KHz), (N/A, 2000 mBm), (N/A)
Sep 26 22:42:36 kernel: cfg80211: (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)
Sep 26 22:42:36 kernel: cfg80211: (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)
Sep 26 22:42:36 kernel: cfg80211: (5490000 KHz - 5730000 KHz @ 160000 KHz), (N/A, 2000 mBm), (0 s)
Sep 26 22:42:36 kernel: cfg80211: (5735000 KHz - 5835000 KHz @ 80000 KHz), (N/A, 2000 mBm), (N/A)
Sep 26 22:42:36 kernel: cfg80211: (57240000 KHz - 63720000 KHz @ 2160000 KHz), (N/A, 0 mBm), (N/A)
Sep 26 22:42:36 kernel: cfg80211: Regulatory domain changed to country: CH
Sep 26 22:42:36 kernel: cfg80211:  DFS Master region: ETSI
Sep 26 22:42:36 kernel: cfg80211:   (start_freq - end_freq @ bandwidth), (max_antenna_gain, max_eirp), (dfs_cac_time)
Sep 26 22:42:36 kernel: cfg80211:   (2402000 KHz - 2482000 KHz @ 40000 KHz), (N/A, 2000 mBm), (N/A)
Sep 26 22:42:36 kernel: cfg80211:   (5170000 KHz - 5250000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (N/A)
Sep 26 22:42:36 kernel: cfg80211:   (5250000 KHz - 5330000 KHz @ 80000 KHz, 160000 KHz AUTO), (N/A, 2000 mBm), (0 s)
Sep 26 22:42:36 kernel: cfg80211:   (5490000 KHz - 5710000 KHz @ 160000 KHz), (N/A, 2700 mBm), (0 s)
Sep 26 22:42:36 kernel: cfg80211:   (57000000 KHz - 66000000 KHz @ 2160000 KHz), (N/A, 4000 mBm), (N/A)
Sep 26 22:42:36 dhcpcd: wlan0: deleting address 2a02::xxxx/64
Sep 26 22:42:36 dhcpcd: wlan0: deleting default route via fe80::xxxx
Sep 26 22:42:36 dhcpcd: wlan0: deleting route to 2a02:xxxx::/64
Sep 26 22:42:36 dhcpcd: wlan0: deleting address fe80::xxxx
Sep 26 22:42:36 dhcpcd: wlan0: deleting route to 10.206.0.0/16
Sep 26 22:42:36 dhcpcd: wlan0: deleting default route via 10.206.0.1

And then I can't ping the router.

After an ifdown wlan0 && ifup wlan0 it works again, until the next wlan0: carrier lost.

Power management is disabled, bluetooth is disabled, roaming is disabled (as you suggested) and my version is Linux pi3 4.4.17-v7+.

sin5678 commented 8 years ago

It always happens when bridging eth0 with wlan0. I have the same issue as https://github.com/raspberrypi/linux/issues/1375

TheOriginalMrWolf commented 8 years ago

I have exactly the same issue of Pi3 onboard WiFi dropping out after a random period of time. ifup gets it running again no problem.

After much investigation, I found it was due to having three APs (BSSIDs) with one SSID (1 each on channel 1, 6, & 11). This setup supports seamless roaming and works perfectly with all other WLAN clients.

Enabling debugging/logging with standard driver seems to show that at some stage the Pi decides to deauthenticate and even blacklists one of the BSSIDs. Reason is unclear, but seems to be a decision made at the Pi end.

When I have exactly the same config on the Pi but with only one BSSID for the SSID, Pi can hang on for days without a hitch.

Unfortunately, disabling roaming as per pelwell's link (http://projectable.me/optimize-my-pi-wi-fi/) isn't really feasible, having only one BSSID per SSID isn't an option, and I'd rather not have to rely on a script that pings some host & then runs ifdown/ifup.
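
For reference, the kind of ping-and-restart script meant here could be as small as the sketch below. It is a stopgap, not a fix; the function name and the PING_CMD, IFDOWN_CMD and IFUP_CMD override variables are inventions for this example so that it can be dry-run without touching the interface.

```shell
#!/bin/sh
# One watchdog pass: if the host does not answer, bounce the interface.
# The commands can be overridden through PING_CMD, IFDOWN_CMD and IFUP_CMD.
wifi_watchdog_once() {
    host=$1
    iface=${2:-wlan0}
    if ${PING_CMD:-ping} -c 3 -W 5 "$host" >/dev/null 2>&1; then
        echo "ok"
    else
        echo "restarting $iface"
        ${IFDOWN_CMD:-sudo ifdown} "$iface"
        ${IFUP_CMD:-sudo ifup} "$iface"
    fi
}
```

Called from cron every few minutes with the router's address as the host, this restores connectivity in the cases where ifdown/ifup is enough, which, going by the first post in this thread, is not always so.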

Is any further investigation being done towards supporting multiple BSSIDs per SSID, or can I do something specifically to support the investigation?

Thanks!

dmcinnes commented 7 years ago

I'm having this problem and my network is similar to @TheOriginalMrWolf's. I have an Apple base station and an AirPort Express in a mesh configuration using WDS.

varl0g commented 7 years ago

I'm having this issue too. If I copy files to a samba share, the wifi connection is lost (Raspberry Pi 3, freshly installed Raspbian). Syslog: brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content: 0x40012

tuomas2 commented 7 years ago

I'm getting exactly the same issue when playing music with upnp (gmediarender).

cdown commented 7 years ago

I'm having the same issue when starting voice calls on wechat, with the rpi as an AP using hostapd. I get a bunch of spam like this:

[19841.278019] net_ratelimit: 940 callbacks suppressed
[19841.304748] brcmfmac: brcmf_sdio_bus_txdata: out of bus->txq !!!
[19841.331372] brcmfmac: brcmf_sdio_bus_txdata: out of bus->txq !!!
[19841.361587] brcmfmac: brcmf_sdio_bus_txdata: out of bus->txq !!!
[19841.399362] brcmfmac: brcmf_sdio_bus_txdata: out of bus->txq !!!
[19841.434506] brcmfmac: brcmf_sdio_bus_txdata: out of bus->txq !!!
[19841.466598] brcmfmac: brcmf_sdio_bus_txdata: out of bus->txq !!!
[19841.496736] brcmfmac: brcmf_sdio_bus_txdata: out of bus->txq !!!
[19841.525425] brcmfmac: brcmf_sdio_bus_txdata: out of bus->txq !!!
[19841.552678] brcmfmac: brcmf_sdio_bus_txdata: out of bus->txq !!!

With traces like this:

[19837.728722] ------------[ cut here ]------------
[19837.730033] WARNING: CPU: 3 PID: 503 at drivers/net/wireless/brcm80211/brcmfmac/core.c:1191 brcmf_netdev_wait_pend8021x+0xdc/0xe8 [brcmfmac]()
[19837.732645] Modules linked in: xt_REDIRECT nf_nat_redirect xt_tcpudp nf_nat_pptp nf_nat_proto_gre nf_conntrack_pptp nf_conntrack_proto_gre iptable_filter ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack cdc_ether sr_mod cdrom brcmfmac brcmutil cfg80211 bcm2835_rng rng_core bcm2835_gpiomem bcm2835_wdt uio_pdrv_genirq uio sch_fq_codel snd_bcm2835 snd_pcm snd_timer snd ip_tables x_tables ipv6
[19837.743040] CPU: 3 PID: 503 Comm: hostapd Not tainted 4.4.38-1-ARCH #1
[19837.745188] Hardware name: BCM2709
[19837.747428] [<80015e54>] (unwind_backtrace) from [<80012ccc>] (show_stack+0x10/0x14)
[19837.752350] [<80012ccc>] (show_stack) from [<804f7dcc>] (dump_stack+0x94/0xb4)
[19837.755134] [<804f7dcc>] (dump_stack) from [<8002e958>] (warn_slowpath_common+0x84/0xb4)
[19837.760698] [<8002e958>] (warn_slowpath_common) from [<8002ea24>] (warn_slowpath_null+0x1c/0x24)
[19837.767009] [<8002ea24>] (warn_slowpath_null) from [<7f2a50b4>] (brcmf_netdev_wait_pend8021x+0xdc/0xe8 [brcmfmac])
[19837.774038] [<7f2a50b4>] (brcmf_netdev_wait_pend8021x [brcmfmac]) from [<7f2950b4>] (send_key_to_dongle+0x94/0xe8 [brcmfmac])
[19837.781637] [<7f2950b4>] (send_key_to_dongle [brcmfmac]) from [<7f2972a8>] (brcmf_cfg80211_add_key+0x16c/0x324 [brcmfmac])
[19837.789919] [<7f2972a8>] (brcmf_cfg80211_add_key [brcmfmac]) from [<7f125ae8>] (nl80211_new_key+0x11c/0x28c [cfg80211])
[19837.798477] [<7f125ae8>] (nl80211_new_key [cfg80211]) from [<807441ec>] (genl_rcv_msg+0x254/0x3c8)
[19837.807003] [<807441ec>] (genl_rcv_msg) from [<80743564>] (netlink_rcv_skb+0xb4/0xd8)
[19837.815674] [<80743564>] (netlink_rcv_skb) from [<80743f88>] (genl_rcv+0x24/0x34)
[19837.824371] [<80743f88>] (genl_rcv) from [<80742efc>] (netlink_unicast+0x188/0x218)
[19837.833161] [<80742efc>] (netlink_unicast) from [<807432cc>] (netlink_sendmsg+0x278/0x330)
[19837.842135] [<807432cc>] (netlink_sendmsg) from [<806fa454>] (sock_sendmsg+0x14/0x24)
[19837.851174] [<806fa454>] (sock_sendmsg) from [<806faadc>] (___sys_sendmsg+0x1d0/0x1d8)
[19837.860301] [<806faadc>] (___sys_sendmsg) from [<806fb780>] (__sys_sendmsg+0x3c/0x68)
[19837.869517] [<806fb780>] (__sys_sendmsg) from [<8000f240>] (ret_fast_syscall+0x0/0x34)
[19837.878793] ---[ end trace e4988f6034c7c2ec ]---

The trace looks suspiciously similar to @jrmhaig's.
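For anyone who wants to compare two traces more carefully than by eye, here is a rough sketch that pulls the call chain out of an ARM backtrace in the format shown above. The function names `call_chain` and `driver_frames` are made up for this example, and the regex only assumes the "(function [module]) from" layout seen in this thread; other kernel versions or architectures may format frames differently.

```python
import re

# Matches the left-hand "(function [module]) from" part of a backtrace line, e.g.
# "[<7f2950b4>] (send_key_to_dongle [brcmfmac]) from [<7f2972a8>] (...)".
# The right-hand side carries "+0x.../0x..." offsets and is deliberately not matched.
FRAME_RE = re.compile(r"\]\s+\((\w+)(?:\s+\[(\w+)\])?\)\s+from\b")


def call_chain(trace_text):
    """Return (function, module-or-None) pairs, innermost frame first."""
    return [(m.group(1), m.group(2)) for m in FRAME_RE.finditer(trace_text)]


def driver_frames(trace_text, module="brcmfmac"):
    """Keep only the function names that belong to one kernel module."""
    return [fn for fn, mod in call_chain(trace_text) if mod == module]
```

Two traces can then be compared with something like `driver_frames(trace_a) == driver_frames(trace_b)`, which ignores addresses and timestamps and only looks at the path through the driver.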

cdown commented 7 years ago

I just had this happen again, and did some debugging. I got some different messages this time, which seem interesting (seems they are the same messages that @maciex got once):

[25353.256286] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[25355.254920] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
[25355.257952] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -52
  1. It looks like the whole system freezes when this happens. Running while sleep 1; do date; done shows a gap in the output when the freeze occurs. I wonder if this means that brcmf_proto_bcdc_msg returning -110 (timeout) is just a symptom of the real issue -- it just logs wherever we freeze.
  2. I measured (with vcgencmd) the temperature and voltages at the time of the freeze. Nothing to report there, as far as I can tell.
  3. My system is an AP with forwarding to a ZTE 4G modem via USB (i.e. client -> wlan0 -> rpi -> usb0 -> 4g). It seems that usb0 is still able to access the internet when the wifi freeze happens.
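The gap check from point 1 can also be scripted so the stall length is measured rather than read off by eye. This is only a sketch of the same idea, not what was actually run: `find_gaps` and `sample` are hypothetical names, and the 1-second interval and 0.5-second tolerance are arbitrary assumptions.

```python
import time


def find_gaps(timestamps, expected=1.0, tolerance=0.5):
    """Return (previous, current, delta) for each interval longer than expected + tolerance."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if delta > expected + tolerance:
            gaps.append((prev, cur, delta))
    return gaps


def sample(count, interval=1.0):
    """Collect `count` monotonic timestamps, sleeping `interval` seconds between them."""
    stamps = []
    for _ in range(count):
        # monotonic() is unaffected by NTP or manual clock changes.
        stamps.append(time.monotonic())
        time.sleep(interval)
    return stamps
```

Running `find_gaps(sample(600))` while reproducing the freeze would list any interval where the process was not scheduled for noticeably longer than a second.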

Re: the comments above, this happens in NAT sharing mode for me with roamoff=1. Neither of those fixed or mitigated the issue for me.

cdown commented 7 years ago

After disabling WPA (using create_ap -w 2 in my case to only enable WPA2), the problem seems fixed. Unclear why though.

rcassaniga commented 7 years ago

I am also facing the issues reported here. In my case it happens whenever I access files (usually mp3) through Samba from Samsung + ES file manager and player.

My Raspberry Pi 3 is connected to my AP over wifi, so all communication with it goes through the wifi network. It has no monitor, keyboard, or mouse.

I can easily reproduce the error, so if anyone wants me to produce log files, let me know how I could help.

Below my syslog entries.

Dec 27 16:11:50 raspberrypi kernel: [ 560.902063] brcmfmac: brcmf_sdio_hostmail: Unknown mailbox data content: 0x40012
Dec 27 16:11:52 raspberrypi kernel: [ 562.928930] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 27 16:11:54 raspberrypi kernel: [ 564.926659] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 27 16:11:54 raspberrypi kernel: [ 564.926820] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -52
Dec 27 16:11:56 raspberrypi kernel: [ 566.924560] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 27 16:11:58 raspberrypi kernel: [ 568.922555] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 27 16:11:58 raspberrypi kernel: [ 568.928073] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -52
Dec 27 16:12:00 raspberrypi kernel: [ 570.920675] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 27 16:12:02 raspberrypi kernel: [ 572.918980] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 27 16:12:02 raspberrypi kernel: [ 572.924580] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -52
Dec 27 16:12:04 raspberrypi kernel: [ 574.917259] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 27 16:12:06 raspberrypi kernel: [ 576.915703] brcmfmac: brcmf_proto_bcdc_query_dcmd: brcmf_proto_bcdc_msg failed w/status -110
Dec 27 16:12:06 raspberrypi kernel: [ 576.921498] brcmfmac: brcmf_cfg80211_get_station: GET STA INFO failed, -52
Dec 27 16:12:06 raspberrypi ifplugd(wlan0)[1149]: Using detection mode: IFF_RUNNING
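When collecting logs like these from several machines, a small script can tally which brcmfmac function failed with which error code. The sketch below is an illustration only: `summarize` is a made-up name, the regex covers just the two message shapes quoted in this thread ("... failed w/status -N" and "... failed, -N"), and the errno names are the standard Linux values (110 is ETIMEDOUT, 52 is EBADE), hard-coded so the result does not depend on the platform running the script.

```python
import re
from collections import Counter

# Linux errno values; codes not listed here are reported as plain numbers.
LINUX_ERRNO = {52: "EBADE", 110: "ETIMEDOUT"}

# Matches "brcmfmac: <function>: <what> failed w/status -N" and
# "brcmfmac: <function>: <what> failed, -N". <what> may not contain ':' so a
# match cannot run on into the next log entry when entries share one line.
MSG_RE = re.compile(r"brcmfmac: (\w+): ([\w ]+?) failed(?: w/status|,) (-\d+)")


def summarize(log_text):
    """Count (function, errno name) pairs found in the log text."""
    counts = Counter()
    for func, _what, code in MSG_RE.findall(log_text):
        number = -int(code)
        counts[(func, LINUX_ERRNO.get(number, str(number)))] += 1
    return counts
```

Feeding it the contents of a syslog file, for example `summarize(open("/var/log/syslog").read())`, gives a quick picture of whether timeouts dominate, which is what the entries in this thread suggest.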

varl0g commented 7 years ago

@rcassaniga I also had the same problem with the identical setup.

Solution after hours of debugging: turn off IPv6 on the Raspberry Pi in /etc/modprobe.d/ipv6.conf:

alias net-pf-10 off
alias ipv6 off
options ipv6 disable_ipv6=1

This workaround is only an option if you don't use IPv6 on your network.
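After a reboot it can be worth confirming that the setting actually took effect. A minimal sketch, assuming the kernel exposes the usual sysctl file /proc/sys/net/ipv6/conf/all/disable_ipv6 when the ipv6 module is loaded and has no such tree when it is not; `ipv6_state` and its `proc_root` parameter are hypothetical names introduced for this example.

```python
from pathlib import Path


def ipv6_state(proc_root="/proc"):
    """Return 'absent', 'disabled' or 'enabled' for the running kernel."""
    flag = Path(proc_root) / "sys/net/ipv6/conf/all/disable_ipv6"
    if not flag.exists():
        # No IPv6 sysctl tree: the ipv6 module is not loaded at all.
        return "absent"
    return "disabled" if flag.read_text().strip() == "1" else "enabled"
```

If `ipv6_state()` still reports "enabled" after the change, the modprobe configuration was probably not picked up.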

tuomas2 commented 7 years ago

Thank you @varl0g you are my hero! :) Looks like this workaround is working for me, can't reproduce the problem any more.

rcassaniga commented 7 years ago

@varl0g: It seems the workaround worked, because I can no longer reproduce the error.

Thanks and happy 2017.

rajid commented 7 years ago

I tried turning off ipv6. That didn't make a difference. I tried turning off power save mode. Still no difference. However, when I set my AP's channel to 6 (instead of 11), my Raspberry Pi has been up for 2 days with no problems!