kaloz / mwlwifi

mac80211 driver for the Marvell 88W8864 802.11ac chip
ieee80211 phy0: cmd 0x9125=BAStream timed out #142

Closed acarlo79 closed 7 years ago

acarlo79 commented 7 years ago

Router: WRT1900ACS v1
OS: Lede 17.01.0 Stable
Kernel: 4.4.50
Driver: kmod-mwlwifi 4.4.50+10.3.2.0-20170110-1

Messages:
Thu Mar 9 19:10:03 2017 kern.err kernel: [79894.633076] ieee80211 phy0: cmd 0x9122=UpdateEncryption timed out
Thu Mar 9 19:10:03 2017 kern.err kernel: [79894.639201] ieee80211 phy0: return code: 0x1122
Thu Mar 9 19:10:03 2017 kern.err kernel: [79894.643767] ieee80211 phy0: timeout: 0x1122
Thu Mar 9 19:10:03 2017 kern.err kernel: [79894.647966] ieee80211 phy0: failed execution
Thu Mar 9 19:10:03 2017 kern.err kernel: [79894.652254] wlan0: failed to set key (2, ff:ff:ff:ff:ff:ff) to hardware (-5)

Kernel error:
[68762.999120] ieee80211 phy0: cmd 0x9125=BAStream timed out
[68763.004548] ieee80211 phy0: return code: 0x1125
[68763.009098] ieee80211 phy0: timeout: 0x1125
[68763.013333] ieee80211 phy0: destroy ba failed execution
[69025.779777] ieee80211 phy0: cmd 0x9122=UpdateEncryption timed out
[69025.785915] ieee80211 phy0: return code: 0x1122
[69025.790465] ieee80211 phy0: timeout: 0x1122
[69025.794679] ieee80211 phy0: failed execution
[69025.798969] wlan0: failed to remove key (0, ac:22:0b:a5:b5:ea) from hardware (-5)

Sometimes the 5 GHz Wi-Fi and sometimes the 2.4 GHz Wi-Fi becomes inaccessible, and the router reports the logs above.
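For anyone collecting evidence for a report like this one, the timeouts can be tallied straight from a saved kernel log. The following is a minimal sketch, not part of the driver or of LEDE: the function name `count_timeouts` and the regular expression are illustrative assumptions, written only against the message format quoted above.

```python
import re
from collections import Counter

# Matches lines such as:
#   "[68762.999120] ieee80211 phy0: cmd 0x9125=BAStream timed out"
# The pattern is an assumption based only on the log lines quoted in this thread.
TIMEOUT_RE = re.compile(
    r"ieee80211 (?P<phy>phy\d+): cmd (?P<code>0x[0-9a-fA-F]+)=(?P<name>\w+) timed out"
)


def count_timeouts(log_text):
    """Count firmware command timeouts per (phy, command code, command name)."""
    counts = Counter()
    for match in TIMEOUT_RE.finditer(log_text):
        counts[(match["phy"], match["code"], match["name"])] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[68762.999120] ieee80211 phy0: cmd 0x9125=BAStream timed out\n"
        "[68763.004548] ieee80211 phy0: return code: 0x1125\n"
        "[69025.779777] ieee80211 phy0: cmd 0x9122=UpdateEncryption timed out\n"
    )
    for (phy, code, name), n in sorted(count_timeouts(sample).items()):
        print(f"{phy} {code} {name}: {n}")
```

Because the scan uses `finditer` rather than splitting on newlines, it also works on logs that were pasted as a single wrapped line.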

yuhhaurlin commented 7 years ago

This kind of problem has been reported again and again, but I have not been able to reproduce it when checking with the reporters. Can you let me know how you build the code and how to reproduce it?

yuhhaurlin commented 7 years ago

The same driver has been used by other projects, and it works well there without this kind of problem. I hope you can help me reproduce it. Thanks.

yuhhaurlin commented 7 years ago

Does anyone know a reliable way to make this problem happen? As I remember, it only happened with one specific build the last time it was reported.

Chadster766 commented 7 years ago

With McDebian I only see this issue on the WRT3200ACM, where it is a known issue, and not on any other WRT model.

acarlo79 commented 7 years ago

To build the image I use:

Ubuntu 16.04 running kernel: 4.4.0-62-generic

This is the diffconfig I used to build Lede 17.01.0 stable:

#
CONFIG_TARGET_mvebu=y
CONFIG_TARGET_mvebu_Default=y
CONFIG_TARGET_BOARD="mvebu"
CONFIG_LIBSODIUM_MINIMAL=y
CONFIG_OPENSSL_ENGINE_CRYPTO=y
CONFIG_OPENSSL_ENGINE_DIGEST=y
CONFIG_OPENSSL_HARDWARE_SUPPORT=y
CONFIG_OPENSSL_WITH_DEPRECATED=y
CONFIG_OPENSSL_WITH_EC=y
CONFIG_OPENSSL_WITH_NPN=y
CONFIG_OPENSSL_WITH_PSK=y
CONFIG_OPENSSL_WITH_SRP=y
CONFIG_OPENVPN_openssl_ENABLE_DEF_AUTH=y
CONFIG_OPENVPN_openssl_ENABLE_FRAGMENT=y
CONFIG_OPENVPN_openssl_ENABLE_HTTP=y
CONFIG_OPENVPN_openssl_ENABLE_LZO=y
CONFIG_OPENVPN_openssl_ENABLE_MULTIHOME=y
CONFIG_OPENVPN_openssl_ENABLE_PF=y
CONFIG_OPENVPN_openssl_ENABLE_PORT_SHARE=y
CONFIG_OPENVPN_openssl_ENABLE_SERVER=y
CONFIG_OPENVPN_openssl_ENABLE_SMALL=y
CONFIG_OPENVPN_openssl_ENABLE_SOCKS=y
CONFIG_PACKAGE_ddns-scripts=y
CONFIG_PACKAGE_ddns-scripts_no-ip_com=y
CONFIG_PACKAGE_dmesg=y
CONFIG_PACKAGE_dnscrypt-proxy=y
CONFIG_PACKAGE_igmpproxy=y
CONFIG_PACKAGE_kmod-bridge=y
CONFIG_PACKAGE_kmod-crypto-aead=y
CONFIG_PACKAGE_kmod-crypto-authenc=y
CONFIG_PACKAGE_kmod-crypto-cbc=y
CONFIG_PACKAGE_kmod-crypto-deflate=y
CONFIG_PACKAGE_kmod-crypto-des=y
CONFIG_PACKAGE_kmod-crypto-echainiv=y
CONFIG_PACKAGE_kmod-crypto-hash=y
CONFIG_PACKAGE_kmod-crypto-hmac=y
CONFIG_PACKAGE_kmod-crypto-iv=y
CONFIG_PACKAGE_kmod-crypto-manager=y
CONFIG_PACKAGE_kmod-crypto-md5=y
CONFIG_PACKAGE_kmod-crypto-null=y
CONFIG_PACKAGE_kmod-crypto-pcompress=y
CONFIG_PACKAGE_kmod-crypto-rng=y
CONFIG_PACKAGE_kmod-crypto-sha1=y
CONFIG_PACKAGE_kmod-crypto-sha256=y
CONFIG_PACKAGE_kmod-crypto-wq=y
CONFIG_PACKAGE_kmod-cryptodev=y
CONFIG_PACKAGE_kmod-fs-exfat=y
CONFIG_PACKAGE_kmod-fs-ext4=y
CONFIG_PACKAGE_kmod-fs-hfsplus=y
CONFIG_PACKAGE_kmod-lib-crc16=y
CONFIG_PACKAGE_kmod-lib-zlib=y
CONFIG_PACKAGE_kmod-llc=y
CONFIG_PACKAGE_kmod-mii=y
CONFIG_PACKAGE_kmod-nls-base=y
CONFIG_PACKAGE_kmod-nls-utf8=y
CONFIG_PACKAGE_kmod-scsi-core=y
CONFIG_PACKAGE_kmod-stp=y
CONFIG_PACKAGE_kmod-tun=y
CONFIG_PACKAGE_kmod-usb-core=y
CONFIG_PACKAGE_kmod-usb-net=y
CONFIG_PACKAGE_kmod-usb-net-rtl8152=y
CONFIG_PACKAGE_kmod-usb-storage=y
CONFIG_PACKAGE_kmod-usb2=y
CONFIG_PACKAGE_kmod-usb3=y
CONFIG_PACKAGE_libcap=y
CONFIG_PACKAGE_libiwinfo-lua=y
CONFIG_PACKAGE_liblua=y
CONFIG_PACKAGE_liblzma=y
CONFIG_PACKAGE_liblzo=y
CONFIG_PACKAGE_libopenssl=y
CONFIG_PACKAGE_libpcre=y
CONFIG_PACKAGE_libpthread=y
CONFIG_PACKAGE_librt=y
CONFIG_PACKAGE_libsodium=y
CONFIG_PACKAGE_libubus-lua=y
CONFIG_PACKAGE_libuci-lua=y
CONFIG_PACKAGE_lua=y
CONFIG_PACKAGE_luci=y
CONFIG_PACKAGE_luci-app-ddns=y
CONFIG_PACKAGE_luci-app-firewall=y
CONFIG_PACKAGE_luci-app-openvpn=y
CONFIG_PACKAGE_luci-base=y
CONFIG_PACKAGE_luci-lib-ip=y
CONFIG_PACKAGE_luci-lib-jsonc=y
CONFIG_PACKAGE_luci-lib-nixio=y
CONFIG_PACKAGE_luci-mod-admin-full=y
CONFIG_PACKAGE_luci-proto-ipv6=y
CONFIG_PACKAGE_luci-proto-ppp=y
CONFIG_PACKAGE_luci-theme-bootstrap=y
CONFIG_PACKAGE_openssh-keygen=y
CONFIG_PACKAGE_openssl-util=y
CONFIG_PACKAGE_openvpn-easy-rsa=y
CONFIG_PACKAGE_openvpn-openssl=y
CONFIG_PACKAGE_rpcd=y
CONFIG_PACKAGE_shadow-common=y
CONFIG_PACKAGE_shadow-su=y
CONFIG_PACKAGE_shadow-utils=y
CONFIG_PACKAGE_uhttpd=y
CONFIG_PACKAGE_uhttpd-mod-ubus=y
CONFIG_PACKAGE_wget=y
CONFIG_PACKAGE_xz=y
CONFIG_PACKAGE_xz-utils=y
CONFIG_PACKAGE_zlib=y
#

Unfortunately I couldn't narrow it down to a specific action that triggers the issue, nor to a specific device.

What I also noticed in the past is that the driver seems to be affected by some other packages, not necessarily by the new kernel.

Hope you can help with this.

yuhhaurlin commented 7 years ago

Which commit did you use to build your LEDE image, and where did you get the source code?

acarlo79 commented 7 years ago

git clone https://git.lede-project.org/source.git
cd source
git checkout v17.01.0
scripts/feeds update -a
scripts/feeds install -a
cp diffconfig_file .config
make defconfig
make menuconfig (you have just to save the config)
make

That's what I always use.

yuhhaurlin commented 7 years ago

How do you make the problem happen?

acarlo79 commented 7 years ago

Well, I haven't figured out yet what the trigger is. I have the feeling that it happens when clients connect or disconnect, but I don't have any proof.

For example, it happened during the night, when there is no particular traffic on the network except iPhones, Android phones and tablets that could randomly connect and disconnect.

acarlo79 commented 7 years ago

If you're interested, I can give you my built image file.

yuhhaurlin commented 7 years ago

Did you try another version of the code, or run a pre-built image, and do you encounter the problem there as well? I will try to check it later. Thanks.

acarlo79 commented 7 years ago

No, I don't run any prebuilt image.

I had a similar issue, which I reported to you here: https://github.com/kaloz/mwlwifi/issues/120

But that one was very specific about how to reproduce it, and you provided a fix. I then upgraded from that version to the new stable. I'm not sure whether this helps.

yuhhaurlin commented 7 years ago

If you use driver 10.3.2.0-20161123-1 on this build, do you also encounter the problem?

acarlo79 commented 7 years ago

I haven't tried this yet; it is my planned test for the weekend.

yuhhaurlin commented 7 years ago

Please do that first. Thanks.

acarlo79 commented 7 years ago

Sure, I will keep you posted.

acarlo79 commented 7 years ago

OK, I just did a quick test, basically the same one reported in issue #120 (a large file transfer over Wi-Fi from a USB HDD attached to the router to my Mac).

And it triggered this:

Fri Mar 10 08:29:04 2017 kern.err kernel: [ 2731.104516] ieee80211 phy0: cmd 0x9125=BAStream timed out
Fri Mar 10 08:29:04 2017 kern.err kernel: [ 2731.109941] ieee80211 phy0: return code: 0x1125
Fri Mar 10 08:29:04 2017 kern.err kernel: [ 2731.114492] ieee80211 phy0: timeout: 0x1125
Fri Mar 10 08:29:04 2017 kern.err kernel: [ 2731.118706] ieee80211 phy0: destroy ba failed execution

Going to rebuild the image with the old driver today and let you know.

acarlo79 commented 7 years ago

I just tested Lede stable with the old driver, and it fails as well.

Fri Mar 10 20:30:56 2017 kern.err kernel: [26177.222883] ieee80211 phy0: cmd 0x9125=BAStream timed out
Fri Mar 10 20:30:56 2017 kern.err kernel: [26177.228324] ieee80211 phy0: return code: 0x1125
Fri Mar 10 20:30:56 2017 kern.err kernel: [26177.232875] ieee80211 phy0: timeout: 0x1125
Fri Mar 10 20:30:56 2017 kern.err kernel: [26177.237087] ieee80211 phy0: destroy ba failed execution
Fri Mar 10 20:31:00 2017 kern.err kernel: [26181.272349] ieee80211 phy0: cmd 0x9125=BAStream timed out
Fri Mar 10 20:31:00 2017 kern.err kernel: [26181.277787] ieee80211 phy0: return code: 0x1125
Fri Mar 10 20:31:00 2017 kern.err kernel: [26181.282338] ieee80211 phy0: timeout: 0x1125
Fri Mar 10 20:31:00 2017 kern.err kernel: [26181.286558] ieee80211 phy0: destroy ba failed execution

Steps to reproduce the issue:

1) USB 3.0 attached to the router and exported via samba
2) mount the drive
3) transfer a large file (10GB) HDD -> PC

After a few hundred megabytes the network crashes. Tested on the 5 GHz network.
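The transfer in step 3 can be scripted so that the point of failure is recorded. This is a rough sketch under stated assumptions: the function name `timed_read`, the chunk size and the stall threshold are all hypothetical choices, and the share is assumed to be already mounted on the client as in step 2.

```python
import time


def timed_read(path, chunk_size=1024 * 1024, stall_seconds=5.0):
    """Read `path` to the end and report how the transfer behaved.

    Returns (total_bytes, elapsed_seconds, stalls), where `stalls` is a list
    of (byte_offset, seconds) for every single read that took longer than
    `stall_seconds`. If the link drops completely, the read may instead
    block or raise OSError, which is left to the caller to handle.
    """
    total = 0
    stalls = []
    begin = time.monotonic()
    with open(path, "rb") as f:
        while True:
            start = time.monotonic()
            chunk = f.read(chunk_size)
            took = time.monotonic() - start
            if took > stall_seconds:
                stalls.append((total, took))
            if not chunk:
                break
            total += len(chunk)
    return total, time.monotonic() - begin, stalls
```

A call such as `timed_read("/mnt/router-share/large-file.bin")` (a made-up path) returns the number of bytes read and any slow reads; the byte offset of the first stall can then be compared with the timestamps of the `BAStream timed out` messages in the router log.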

Chadster766 commented 7 years ago

I will test that on a WRT1900ACS v1 with the latest McDebian for comparison.

yuhhaurlin commented 7 years ago

That is what I think. I will try to build the image as you described and check it, but that will have to wait until later. For now you can use a previous LEDE version, or any updated version that does not have this problem. Thanks.

kevle commented 7 years ago

Which version and combination of LEDE / OpenWRT, kernel and driver is currently supposed to work? I'm still having a hard time restarting the APs by powercycling as they become completely unresponsive after some time with this issue.

danny30au commented 7 years ago

@yuhhaurlin I'm using the latest mwlwifi commit, from 27/4/2017, and I'm getting these errors in the kernel log on OpenWRT/LEDE. I'm not sure if it's a problem.

[52034.954348] ------------[ cut here ]------------
[52034.959050] WARNING: CPU: 0 PID: 1855 at compat-wireless-2017-01-31/net/mac80211/agg-tx.c:347 _ieee80211_stop_tx_ba_session+0x1c0/0x1dc [mac80211]
[52034.972428] Modules linked in: blowfish_generic blowfish_common pppoe ppp_async pppox ppp_generic nf_conntrack_ipv6 iptable_nat ipt_REJECT ipt_MASQUERADE xt_time xt_tcpudp xt_state xt_nat xt_multiport xt_mark xt_mac xt_limit xt_conntrack xt_comment xt_TCPMSS xt_REDIRECT xt_LOG ums_usbat ums_sddr55 ums_sddr09 ums_karma ums_jumpshot ums_isd200 ums_freecom ums_datafab ums_cypress ums_alauda slhc nf_reject_ipv4 nf_nat_redirect nf_nat_masquerade_ipv4 nf_conntrack_ipv4 nf_nat_ipv4 nf_nat nf_log_ipv4 nf_defrag_ipv6 nf_defrag_ipv4 nf_conntrack_rtcache nf_conntrack iptable_mangle iptable_filter ip_tables crc_ccitt mwlwifi mac80211 cfg80211 compat cryptodev xt_set ip_set_list_set ip_set_hash_netiface ip_set_hash_netport ip_set_hash_netnet ip_set_hash_net ip_set_hash_netportnet ip_set_hash_mac ip_set_hash_ipportnet
[52035.044457] ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnetlink ip6t_REJECT nf_reject_ipv6 nf_log_ipv6 nf_log_common ip6table_mangle ip6table_filter ip6_tables x_tables tun sha512_generic sha256_generic seqiv jitterentropy_rng drbg hmac ghash_generic gf128mul gcm ecb ctr cmac ccm cbc authenc ohci_pci uhci_hcd ohci_platform ohci_hcd sd_mod gpio_button_hotplug
[52035.083785] CPU: 0 PID: 1855 Comm: hostapd Not tainted 4.9.25 #0
[52035.089815] Hardware name: Marvell Armada 380/385 (Device Tree)
[52035.095771] [] (unwind_backtrace) from [] (show_stack+0x10/0x14)
[52035.103552] [] (show_stack) from [] (dump_stack+0x7c/0x9c)
[52035.110806] [] (dump_stack) from [] (warn+0xbc/0xec)
[52035.117710] [] (warn) from [] (warn_slowpath_null+0x1c/0x24)
[52035.125333] [] (warn_slowpath_null) from [] (___ieee80211_stop_tx_ba_session+0x1c0/0x1dc [mac80211])
[52035.136304] [] (ieee80211_stop_tx_ba_session [mac80211]) from [] (ieee80211_stop_tx_ba_session+0x2c/0x40 [mac80211])
[52035.149092] [] (ieee80211_stop_tx_ba_session [mac80211]) from [] (ieee80211_sta_tear_down_BA_sessions+0x38/0x6c [mac80211])
[52035.162225] [] (ieee80211_sta_tear_down_BA_sessions [mac80211]) from [] (ieee80211_sta_eosp+0x1e4/0x51c [mac80211])
[52035.174485] [] (ieee80211_sta_eosp [mac80211]) from [] (sta_info_destroy+0xc/0x28 [mac80211])
[52035.185003] [] (sta_info_destroy [mac80211]) from [] (sta_info_destroy_addr_bss+0x2c/0x44 [mac80211])
[52035.196213] [] (sta_info_destroy_addr_bss [mac80211]) from [] (nl80211_del_station+0xe8/0xf0 [cfg80211])
[52035.207496] [] (nl80211_del_station [cfg80211]) from [] (genl_rcv_msg+0x288/0x310)
[52035.216843] [] (genl_rcv_msg) from [] (netlink_rcv_skb+0x58/0xb4)
[52035.224706] [] (netlink_rcv_skb) from [] (genl_rcv+0x20/0x34)
[52035.232220] [] (genl_rcv) from [] (netlink_unicast+0x138/0x1fc)
[52035.239909] [] (netlink_unicast) from [] (netlink_sendmsg+0x2f0/0x310)
[52035.248209] [] (netlink_sendmsg) from [] (sock_sendmsg+0x14/0x24)
[52035.256071] [] (sock_sendmsg) from [] (_sys_sendmsg+0x184/0x228)
[52035.264021] [] (___sys_sendmsg) from [] (sys_sendmsg+0x40/0x64)
[52035.271886] [] (__sys_sendmsg) from [] (ret_fast_syscall+0x0/0x3c)
[52035.279845] ---[ end trace 6e34b86c7e6170ba ]---

yuhhaurlin commented 7 years ago

This is a warning message; I will check and remove it. BTW, this is for 88W8864, right?

danny30au commented 7 years ago

@yuhhaurlin (88W8864) Yep indeed mate. Thanks mate. Keep up the good work. :)

yuhhaurlin commented 7 years ago

I will check and remove this warning message. Thanks.

woody77 commented 7 years ago

I'm getting the same error when I have two Mac laptops using the network heavily (simultaneously). If we're gaming (Minecraft Realms) on the 5GHz radio, it crashes multiple times a night. 2.4GHz seems fine.

LEDE 17.01.0-rc2, r3131-42f3c1f
kmod-mac80211 - 4.4.47+2016-10-08-1
kmod-mwlwifi - 4.4.47+10.3.2.0-20170110-1
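The kmod-mwlwifi strings quoted in this thread appear to follow the pattern kernel+driver-date-release, which makes it easy to compare reports by kernel and driver version. The sketch below encodes that pattern as an assumption inferred only from the strings seen here; the names `MwlwifiPackageVersion` and `parse_package_version` are hypothetical, and strings in other formats (for example git snapshot builds) are rejected rather than guessed at.

```python
import re
from typing import NamedTuple


class MwlwifiPackageVersion(NamedTuple):
    kernel: str      # e.g. "4.4.47"
    driver: str      # e.g. "10.3.2.0"
    build_date: str  # e.g. "20170110" (YYYYMMDD)
    release: str     # e.g. "1"


# Assumed layout: <kernel>+<driver>-<YYYYMMDD>-<release>
VERSION_RE = re.compile(
    r"^(?P<kernel>\d+\.\d+\.\d+)\+"
    r"(?P<driver>\d+(?:\.\d+){3})-"
    r"(?P<date>\d{8})-"
    r"(?P<release>\d+)$"
)


def parse_package_version(version):
    """Split a kmod-mwlwifi package version into its assumed components."""
    m = VERSION_RE.match(version.strip())
    if m is None:
        raise ValueError(f"unrecognised package version: {version!r}")
    return MwlwifiPackageVersion(m["kernel"], m["driver"], m["date"], m["release"])


if __name__ == "__main__":
    print(parse_package_version("4.4.47+10.3.2.0-20170110-1"))
```

Grouping reports by the `kernel` and `driver` fields is one way to check whether failures track the kernel, the driver, or neither.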

yuhhaurlin commented 7 years ago

Which device? I think it should be 88W8864.

yuhhaurlin commented 7 years ago

Is it the warning message, or a message like the title of this issue? If it is the warning, Wi-Fi still works; I will remove the warning message later.

woody77 commented 7 years ago

Sorry, I should have added those.

The message like the title, and on a WRT1900AC (v1, mamba).

I've upgraded to LEDE 17.01.01 to see if that changes anything. I'm not expecting it to, as it appears to be the same firmware at 10.3.2.0-20170110-1, but it's now on kernel v4.4.61.

yuhhaurlin commented 7 years ago

This will be checked later. It only happens for some builds.

fuqiang03 commented 7 years ago

During high-speed data transmission on 5 GHz wireless (20 M and above), there is a high chance that the 5 GHz wireless crashes. By process of elimination, 2.4 GHz wireless works normally.

ieee80211 phy0: cmd 0x9125=BAStream timed out
ieee80211 phy0: return code: 0x1125
ieee80211 phy0: timeout: 0x1125

WRT1900acsV2
Model: Linksys WRT1900ACS
CPU model: ARMv7 Processor rev 1 (v7l)
Temperature: 2.4G: 48.1°C / 5G: 49.5°C / CPU: 78.5°C
Firmware version: LEDE Reboot 17.01-SNAPSHOT r3438-2e206c79cc / LuCI Master (git-17.165.70928-dd6cb31)
Kernel version: 4.4.71
kmod-mwlwifi 4.4.71+10.3.4.0.git-2..6-1

yuhhaurlin commented 7 years ago

I will try to build an image for you to test later.

fuqiang03 commented 7 years ago

I suspect that other application code affects the wireless driver; I am checking that now. In any case, with this version it is obvious that 5 GHz wireless is unstable under sustained high-speed data transmission.

yuhhaurlin commented 7 years ago

Your description is not very clear. Can you let me know your settings? Apart from re-architecture, this version of the driver does not modify the code for previous devices. As far as I know, the driver has had this issue with specific builds before. I want to create an image for you to reproduce the problem, so that I can then reproduce the same problem here and check it. However, this will be done later.

fuqiang03 commented 7 years ago

Sorry, I found the problem: the earlier information was not saved, and now I only have the image. When 5 GHz transmits data at high speed, it crashes. With previous builds this error occurred only occasionally, very few times, so I overlooked it. With the image I have compiled now, the error occurs often. I am ruling out other factors. I don't think it has anything to do with the kernel.

yuhhaurlin commented 7 years ago

In fact, we have tested the 88W8864 and 88W8964 on our RD AP board, and we did not find any crash problem on 5 GHz. This issue has been raised many times under certain builds.

I think the better way is for me to create a WRT1900ACS image and let you test it. However, I will have to do this later, because some other functions and jobs are in progress.

fuqiang03 commented 7 years ago

OK. Thank you for your help. I'll look for my own mistakes first and then feed back the information.

wliao229 commented 7 years ago

Any update on this issue? (in relation to the closed #178)

fuqiang03 commented 7 years ago

Everything is normal after replacing the kernel.

wliao229 commented 7 years ago

Thank you, @fuqiang03. I'm quite new to this. Could you give a bit more detail about replacing the kernel? I'm currently using kernel 4.9.30. Do you mean that if I update the kernel to, say, 4.9.34, the problem will be solved?

Big Thank You!

fuqiang03 commented 7 years ago

This error only occurs under exceptional circumstances, for example large data transfers on the 5 GHz band. My guess, reached by process of elimination, is either a hardware error or a conflict between the kernel and the wireless driver. After the kernel was updated, the fault disappeared. My current kernel is 4.9.34.

wliao229 commented 7 years ago

Thanks. I will upgrade the kernel and see if the problem reoccurs. The issue I encountered was that the router dropped its connection to another Wi-Fi network a few days after a reboot.

fuqiang03 commented 7 years ago

My problem is the same as yours. A stable-version kernel is recommended. Give it a try.

fuqiang03 commented 7 years ago

Do not use the 4.9 kernel. That's just my solution.

haribert commented 7 years ago

I do not know if my problem is related to this one. I have a Squeezebox Radio which regularly loses its connection to the WRT3200ACM running LEDE 17.01.02. Normally a song plays for around 3 minutes, then the connection is gone, and 10 minutes later the Squeezebox Radio is able to connect again. But the connection is never stable for more than 3 minutes.

With LEDE 17.01.01 this problem was not present.

If any logs are needed, I am happy to provide them.

wliao229 commented 7 years ago

@fuqiang03 I see. Yes, I had no problem when running the 4.4 kernel. I will see whether I need to downgrade my 1900ACS to that kernel.

yuhhaurlin commented 7 years ago

Can you use the WRT1900ACS image I created in issue #184 and see if you still encounter this problem? If yes, let me know how to reproduce it.

yuhhaurlin commented 7 years ago

I am closing this one. If you still encounter this problem, you can create a new issue. Please specify which build you use and how to reproduce the problem when creating the issue.