lwfinger / rtl8723bu

Driver for RTL8723BU

Wifi unusable on Beaglebone Black Enhanced, kernel exceptions #171

Closed nahuel closed 3 years ago

nahuel commented 3 years ago

I'm using this driver on a Beaglebone Black Enhanced, which comes with the RTL8723BU chip (USB ID 0bda:b720) integrated on board. Whenever there is some traffic on the wifi link, the kernel exception below occurs and the connection is lost for a few seconds. It happens all the time, making the wifi connection almost unusable. The module was compiled with CONFIG_CONCURRENT_MODE disabled and CONFIG_POWER_SAVING=n. Kernel version: 5.4.70-ti-r19. This also happened on older kernel versions.

Module initialization:

[   20.006703] RTL871X: module init start
[   20.006726] RTL871X: rtl8723bu v4.3.6.11_12942.20141204_BTCOEX20140507-4E40
[   20.006733] RTL871X: rtl8723bu BT-Coex version = BTCOEX20140507-4E40
[   20.156473] RTL871X: rtw_ndev_init(wlan0)
[   20.168162] usbcore: registered new interface driver rtl8723bu
[   20.168185] RTL871X: module init ret=0
[   22.855407] RTL871X: RTW_ADAPTIVITY_EN_
[   22.855426] AUTO, chplan:0x20, Regulation:0,0
[   22.855440] RTL871X: RTW_ADAPTIVITY_MODE_
[   22.855443] NORMAL
[   23.513221] RTL871X: set ssid [ENFORCER-022-AP] fw_state=0x00000008
[   23.951045] ti-sysc 4a101200.target-module: OCP softreset timed out

Errors:

[63866.794568] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[64165.461722] RTL871X: set ssid [ENFORCER-022-AP] fw_state=0x00000008
[64167.330713] RTL871X: start auth
[64167.334005] RTL871X: auth success, start assoc
[64167.336961] RTL871X: rtw_cfg80211_indicate_connect(wlan0) BSS not found !!
[64167.337065] RTL871X: assoc success
[64167.337358] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready
[64167.341971] ------------[ cut here ]------------
[64167.342506] WARNING: CPU: 0 PID: 17833 at net/wireless/sme.c:756 __cfg80211_connect_result+0x468/0x500 [cfg80211]
[64167.342530] Modules linked in: xt_conntrack xt_tcpudp iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 iptable_filter 8723bu(O) icss_iep prueth_ecap cfg80211 ecdh_generic ecc evdev st_pressure_spi st_pressure st_sensors_spi st_sensors libcomposite ip_tables x_tables
[64167.342722] CPU: 0 PID: 17833 Comm: kworker/u2:2 Tainted: G        W  O      5.4.70-ti-r19 #1stretch
[64167.342742] Hardware name: Generic AM33XX (Flattened Device Tree)
[64167.343007] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[64167.343032] Backtrace:
[64167.343095] [<c010e3fc>] (dump_backtrace) from [<c010e718>] (show_stack+0x20/0x24)
[64167.343131]  r7:60070113 r6:c14e25d0 r5:00000000 r4:c14e25d0
[64167.343182] [<c010e6f8>] (show_stack) from [<c0e2ab70>] (dump_stack+0xb8/0xcc)
[64167.343229] [<c0e2aab8>] (dump_stack) from [<c013d088>] (__warn+0xec/0x104)
[64167.343259]  r7:00000009 r6:bf0c962c r5:00000000 r4:00000000
[64167.343294] [<c013cf9c>] (__warn) from [<c013d158>] (warn_slowpath_fmt+0xb8/0xc0)
[64167.343329]  r9:00000009 r8:bf09cd70 r7:000002f4 r6:bf0c962c r5:00000000 r4:c1405fc8
[64167.343578] [<c013d0a4>] (warn_slowpath_fmt) from [<bf09cd70>] (__cfg80211_connect_result+0x468/0x500 [cfg80211])
[64167.343615]  r9:20070113 r8:eea4de54 r7:00000000 r6:c1405fc8 r5:ecef9b4c r4:ecb6c000
[64167.344040] [<bf09c908>] (__cfg80211_connect_result [cfg80211]) from [<bf0648e8>] (cfg80211_process_wdev_events+0x118/0x174 [cfg80211])
[64167.344075]  r8:ecb6c08c r7:ecb6c024 r6:ecb6c094 r5:ecb6c000 r4:ecef9b40
[64167.344498] [<bf0647d0>] (cfg80211_process_wdev_events [cfg80211]) from [<bf06498c>] (cfg80211_process_rdev_events+0x48/0xa0 [cfg80211])
[64167.344536]  r10:00000000 r9:00000040 r8:00000000 r7:eefa8500 r6:ee806600 r5:ecc3c478
[64167.344554]  r4:ecb6c000
[64167.344977] [<bf064944>] (cfg80211_process_rdev_events [cfg80211]) from [<bf05d1b8>] (cfg80211_event_work+0x24/0x2c [cfg80211])
[64167.345002]  r5:ec177e00 r4:ecc3c0e4
[64167.345246] [<bf05d194>] (cfg80211_event_work [cfg80211]) from [<c015b044>] (process_one_work+0x1c8/0x564)
[64167.345270]  r5:ec177e00 r4:ecc3c0e4
[64167.345307] [<c015ae7c>] (process_one_work) from [<c015b800>] (worker_thread+0x58/0x544)
[64167.345344]  r10:eea4c000 r9:ee806618 r8:c1404d00 r7:00000088 r6:ee806600 r5:ec177e14
[64167.345363]  r4:ec177e00
[64167.345403] [<c015b7a8>] (worker_thread) from [<c01631dc>] (kthread+0x170/0x1b0)
[64167.345439]  r10:eed7fe74 r9:c015b7a8 r8:ec177e00 r7:00000000 r6:ed8999c0 r5:eea4c000
[64167.345459]  r4:ed890600
[64167.345493] [<c016306c>] (kthread) from [<c01010e8>] (ret_from_fork+0x14/0x2c)
[64167.345516] Exception stack(0xeea4dfb0 to 0xeea4dff8)
[64167.345547] dfa0:                                     00000000 00000000 00000000 00000000
[64167.345583] dfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[64167.345614] dfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[64167.345649]  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c016306c
[64167.345668]  r4:ed8999c0
[64167.361733] ---[ end trace 6e91afbeae861295 ]---

pbrkr commented 3 years ago

Hi @nahuel, I'm working with Sancloud on Linux support for the BBE so I'm happy to help here if I can. Is the kernel & rootfs here from Buildroot, Yocto Project, Debian or somewhere else?

Could you check if CONFIG_MUSB_PIO_ONLY is enabled in the kernel config? If it's not enabled, try enabling that and see if it resolves the issue. There are known issues with stability of DMA over the USB interface.
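One way to check that option on a running board is sketched below. It is only an illustration: the helper names `kernel_option` and `read_running_config` are made up for this example, and it assumes the kernel config is exposed either at /proc/config.gz (which requires CONFIG_IKCONFIG_PROC) or as /boot/config-<release>. If neither file exists, `read_running_config` raises FileNotFoundError.

```python
import gzip
import os
from pathlib import Path


def kernel_option(config_text: str, name: str) -> str:
    """Return the value of a kernel config option ('y', 'm', or a literal).

    Returns 'n' when the option is marked "is not set" or is absent.
    """
    for line in config_text.splitlines():
        line = line.strip()
        if line == f"# {name} is not set":
            return "n"
        if line.startswith(f"{name}="):
            return line.split("=", 1)[1]
    return "n"


def read_running_config() -> str:
    """Read the running kernel's config from the usual locations (assumed)."""
    proc = Path("/proc/config.gz")  # present only with CONFIG_IKCONFIG_PROC
    if proc.exists():
        return gzip.decompress(proc.read_bytes()).decode()
    # Fallback used by Debian-style images; raises FileNotFoundError if missing.
    return Path(f"/boot/config-{os.uname().release}").read_text()
```

On the board, `kernel_option(read_running_config(), "CONFIG_MUSB_PIO_ONLY")` should give `y` if PIO-only mode is enabled and `n` otherwise.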

If you need further assistance feel free to email me at paul.barker@sancloud.com.

nahuel commented 3 years ago

Hi @pbrkr, I saw this problem on a heavily modified Debian with the stock kernel, which has CONFIG_MUSB_PIO_ONLY=y enabled. After the original post, I ran two more tests:

1- Copied the entire OS image to another BBE board => the problem is still there, so it is not a hardware failure of the first board.

2- Tried to reproduce it on another BBE board with the latest IoT Debian image, but couldn't make it fail again. It appears to work OK (but with an initial slowpath warning/backtrace in dmesg).

Over the next few days I will keep trying to reproduce this with vanilla IoT Debian.

lwfinger commented 3 years ago

What was the slowpath warning?

nahuel commented 3 years ago

@lwfinger this warning appears with vanilla Debian (downloaded from https://rcn-ee.com/rootfs/bb.org/testing/2020-05-02/stretch-console/bone-eMMC-flasher-debian-9.12-console-armhf-2020-05-02-1gb.img.xz ) on the first connection; after that, no more backtraces are shown and the connection has seemed stable so far:

[   52.211444] 8723bu: loading out-of-tree module taints kernel.
[   53.024947] RTL871X: module init start
[   53.024971] RTL871X: rtl8723bu v4.3.6.11_12942.20141204_BTCOEX20140507-4E40
[   53.024978] RTL871X: rtl8723bu BT-Coex version = BTCOEX20140507-4E40
[   54.299577] RTL871X: rtw_ndev_init(wlan0)
[   54.301183] RTL871X: rtw_ndev_init(wlan1)
[   54.360413] usbcore: registered new interface driver rtl8723bu
[   54.360432] RTL871X: module init ret=0
[   68.893025] mmc0: host does not support reading read-only switch, assuming write-enable
[   68.895358] mmc0: new high speed SDHC card at address 5048
[   68.906978] mmcblk0: mmc0:5048 SD16G 14.4 GiB
[   68.909205]  mmcblk0: p1
[  146.512881] RTL871X: RTW_ADAPTIVITY_EN_
[  146.512909] AUTO, chplan:0x20, Regulation:3,3
[  146.512919] RTL871X: RTW_ADAPTIVITY_MODE_
[  146.512924] NORMAL
[  147.138385] IPv6: ADDRCONF(NETDEV_UP): wlan0: link is not ready
[  147.165166] RTL871X: set ssid [ENFORCER-022-AP] fw_state=0x00000008
[  149.256504] RTL871X: start auth
[  149.266580] RTL871X: auth success, start assoc
[  149.278627] RTL871X: rtw_cfg80211_indicate_connect(wlan0) BSS not found !!
[  149.278672] RTL871X: assoc success
[  149.278821] ------------[ cut here ]------------
[  149.279416] WARNING: CPU: 0 PID: 80 at net/wireless/sme.c:752 __cfg80211_connect_result+0x3c8/0x418 [cfg80211]
[  149.279426] Modules linked in: 8723bu(O) cfg80211 ecdh_generic spidev uio_pdrv_genirq uio usb_f_ncm u_ether libcomposite
[  149.279491] CPU: 0 PID: 80 Comm: kworker/u2:1 Tainted: G           O    4.14.108-ti-r134 #1stretch
[  149.279497] Hardware name: Generic AM33XX (Flattened Device Tree)
[  149.279765] Workqueue: cfg80211 cfg80211_event_work [cfg80211]
[  149.279817] [<c0112ad8>] (unwind_backtrace) from [<c010d690>] (show_stack+0x20/0x24)
[  149.279841] [<c010d690>] (show_stack) from [<c0cbba94>] (dump_stack+0x80/0x94)
[  149.279857] [<c0cbba94>] (dump_stack) from [<c013ebe0>] (__warn+0xf8/0x110)
[  149.279870] [<c013ebe0>] (__warn) from [<c013ed10>] (warn_slowpath_null+0x30/0x38)
[  149.280073] [<c013ed10>] (warn_slowpath_null) from [<bf0cb210>] (__cfg80211_connect_result+0x3c8/0x418 [cfg80211])
[  149.280392] [<bf0cb210>] (__cfg80211_connect_result [cfg80211]) from [<bf099b20>] (cfg80211_process_wdev_events+0x118/0x174 [cfg80211])
[  149.280675] [<bf099b20>] (cfg80211_process_wdev_events [cfg80211]) from [<bf099bc0>] (cfg80211_process_rdev_events+0x44/0x78 [cfg80211])
[  149.280949] [<bf099bc0>] (cfg80211_process_rdev_events [cfg80211]) from [<bf093334>] (cfg80211_event_work+0x24/0x2c [cfg80211])
[  149.281098] [<bf093334>] (cfg80211_event_work [cfg80211]) from [<c015c838>] (process_one_work+0x19c/0x518)
[  149.281113] [<c015c838>] (process_one_work) from [<c015d770>] (worker_thread+0x60/0x540)
[  149.281130] [<c015d770>] (worker_thread) from [<c0163380>] (kthread+0x144/0x174)
[  149.281150] [<c0163380>] (kthread) from [<c0108e28>] (ret_from_fork+0x14/0x2c)
[  149.281159] ---[ end trace 12fcb36565bd9c1a ]---
[  149.281311] IPv6: ADDRCONF(NETDEV_CHANGE): wlan0: link becomes ready

nahuel commented 3 years ago

I found the problem. The customized image had a cron script that ran ifdown wlan0 ; sleep 5 ; ifup wlan0 whenever a ping -c1 to a gateway failed (and it always failed under heavy traffic). The backtrace I kept seeing in dmesg is the one that is always shown on connect; it was not the cause of the disconnection (the script was).
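For anyone with a similar watchdog, a more tolerant variant might restart the interface only after several consecutive failed probes, not after a single lost ping. The following is a minimal sketch, not the script from my image: the function names, the default of 5 attempts, the 2-second delay and the `-W2` ping timeout are assumptions chosen for illustration, and it assumes a Linux `ping` plus the `ifdown`/`ifup` tools.

```python
import subprocess
import time


def gateway_reachable(gateway: str) -> bool:
    """Send one ICMP probe, like the original script's `ping -c1`."""
    result = subprocess.run(
        ["ping", "-c1", "-W2", gateway],  # -W2: wait up to 2 s for a reply
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def should_restart(probe, attempts: int = 5, delay: float = 2.0,
                   sleep=time.sleep) -> bool:
    """Return True only if all `attempts` consecutive probes fail."""
    for i in range(attempts):
        if probe():
            return False  # one success is enough to leave the link alone
        if i < attempts - 1:
            sleep(delay)  # pause between probes, not after the last one
    return True


def restart_interface(iface: str) -> None:
    """Bounce the interface the same way the original cron script did."""
    subprocess.run(["ifdown", iface], check=False)
    time.sleep(5)
    subprocess.run(["ifup", iface], check=False)


def main(gateway: str, iface: str = "wlan0") -> None:
    """Entry point for a cron job; the gateway address is supplied by the caller."""
    if should_restart(lambda: gateway_reachable(gateway)):
        restart_interface(iface)
```

Taking the probe as a parameter keeps the retry logic separate from `ping`, so it can be exercised with a stub without touching the network.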

So, I think you can close this issue. The only remaining question is why this warning appears on connection, though it seems harmless. Thanks for your support.

lwfinger commented 3 years ago

That warning is intended to fire once when there is no routine attached to the connect callback in struct cfg80211_ops; however, the driver does set that callback, at line 5601 of os_dep/ioctl_cfg80211.c. The warning does not occur on my kernel 5.9.1 running on x86_64.

At least it does not interfere with operation.