kuba-moo / mt7601u

Linux mac80211-based driver for Mediatek MT7601U USB bgn WiFi dongle
285 stars 105 forks source link

Kernel shows up errors on increased traffic #25

Closed rethil closed 9 years ago

rethil commented 9 years ago

There are a lot of kernel messages while stressing out mt7601u device. My setup has 2 usb wifi dongles:

  1. RTL8188eu compatible, acting as an AP with hostapd
  2. MT7601u compatible connected to external network. It also got iptables that exchange packets between these 2 interfaces and drops incomming packets on MT7601u.

I was able to capture few backtraces. Errors showed up while downloading/uploading bigger ammount of data.

[  401.996127] ------------[ cut here ]------------
[  401.996320] WARNING: CPU: 0 PID: 0 at kernel/softirq.c:150 __local_bh_enable_ip+0xe4/0x13c()
[  401.996493] Modules linked in: mt7601u(O)
[  401.996629] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.0.5-v7+ #7
[  401.996780] Hardware name: BCM2709
[  401.996900] [<80016618>] (unwind_backtrace) from [<80012c1c>] (show_stack+0x20/0x24)
[  401.997082] [<80012c1c>] (show_stack) from [<80823694>] (dump_stack+0x9c/0xdc)
[  401.997253] [<80823694>] (dump_stack) from [<8002b754>] (warn_slowpath_common+0x98/0xc8)
[  401.997436] [<8002b754>] (warn_slowpath_common) from [<8002b840>] (warn_slowpath_null+0x2c/0x34)
[  401.997625] [<8002b840>] (warn_slowpath_null) from [<8002f560>] (__local_bh_enable_ip+0xe4/0x13c)
[  401.997823] [<8002f560>] (__local_bh_enable_ip) from [<805dae6c>] (destroy_conntrack+0x8c/0xd8)
[  401.998014] [<805dae6c>] (destroy_conntrack) from [<805d41d0>] (nf_conntrack_destroy+0x38/0x44)
[  401.998206] [<805d41d0>] (nf_conntrack_destroy) from [<8057e910>] (skb_release_head_state+0xcc/0xfc)
[  401.998405] [<8057e910>] (skb_release_head_state) from [<8057ebc0>] (skb_release_all+0x1c/0x34)
[  401.998598] [<8057ebc0>] (skb_release_all) from [<8057ec38>] (consume_skb+0x38/0xf0)
[  401.998770] [<8057ec38>] (consume_skb) from [<807ce508>] (ieee80211_tx_status+0xad4/0xd94)
[  401.998987] [<807ce508>] (ieee80211_tx_status) from [<7f00a66c>] (mt7601u_tx_status+0xac/0xb0 [mt7601u])
[  401.999216] [<7f00a66c>] (mt7601u_tx_status [mt7601u]) from [<7f005200>] (mt7601u_complete_tx+0x94/0x1f4 [mt7601u])
[  401.999460] [<7f005200>] (mt7601u_complete_tx [mt7601u]) from [<80431008>] (__usb_hcd_giveback_urb+0x70/0x12c)
[  401.999674] [<80431008>] (__usb_hcd_giveback_urb) from [<80431114>] (usb_hcd_giveback_urb+0x50/0x128)
[  401.999875] [<80431114>] (usb_hcd_giveback_urb) from [<8045f828>] (completion_tasklet_func+0x6c/0x98)
[  402.000080] [<8045f828>] (completion_tasklet_func) from [<8046e2ac>] (tasklet_callback+0x20/0x24)
[  402.000275] [<8046e2ac>] (tasklet_callback) from [<8002fba4>] (tasklet_hi_action+0x74/0x10c)
[  402.000461] [<8002fba4>] (tasklet_hi_action) from [<8002f10c>] (__do_softirq+0x19c/0x3f0)
[  402.000640] [<8002f10c>] (__do_softirq) from [<8002f728>] (irq_exit+0xdc/0x140)
[  402.000808] [<8002f728>] (irq_exit) from [<80071760>] (__handle_domain_irq+0x98/0xec)
[  402.000984] [<80071760>] (__handle_domain_irq) from [<8000857c>] (asm_do_IRQ+0x2c/0x30)
[  402.001162] [<8000857c>] (asm_do_IRQ) from [<8082aff4>] (__irq_svc+0x34/0x14c)
[  402.001312] Exception stack(0x80b8bf20 to 0x80b8bf68)
[  402.001436] bf20: 00000000 00000000 00000000 00000000 80b8a000 80baaa18 80baa940 ffffffff
[  402.001611] bf40: 80bfa35c 8083153c 80baa9b8 80b8bf74 80bab938 80b8bf68 8000f808 8000f80c
[  402.001779] bf60: 600b0013 ffffffff
[  402.001878] [<8082aff4>] (__irq_svc) from [<8000f80c>] (arch_cpu_idle+0x34/0x4c)
[  402.002049] [<8000f80c>] (arch_cpu_idle) from [<8006500c>] (cpu_startup_entry+0x21c/0x294)
[  402.002236] [<8006500c>] (cpu_startup_entry) from [<8082182c>] (rest_init+0x94/0x98)
[  402.002408] [<8082182c>] (rest_init) from [<80b08cc8>] (start_kernel+0x37c/0x388)
[  402.002566] ---[ end trace b7587ce329698bea ]---

Jun 19 23:45:07 raspi kernel: ------------[ cut here ]------------
Jun 19 23:45:08 raspi kernel: WARNING: CPU: 0 PID: 3 at net/mac80211/util.c:328 __ieee80211_wake_queue+0x24c/0x480()
Jun 19 23:45:08 raspi kernel: Modules linked in: mt7601u(O)
Jun 19 23:45:08 raspi kernel: CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G        W  O    4.0.5-v7+ #8
Jun 19 23:45:08 raspi kernel: Hardware name: BCM2709
Jun 19 23:45:08 raspi kernel: [<80017164>] (unwind_backtrace) from [<80013128>] (show_stack+0x20/0x24)
Jun 19 23:45:08 raspi kernel: [<80013128>] (show_stack) from [<80976b2c>] (dump_stack+0x9c/0xdc)
Jun 19 23:45:08 raspi kernel: [<80976b2c>] (dump_stack) from [<8002d240>] (warn_slowpath_common+0x98/0xc8)
Jun 19 23:45:08 raspi kernel: [<8002d240>] (warn_slowpath_common) from [<8002d32c>] (warn_slowpath_null+0x2c/0x34)
Jun 19 23:45:08 raspi kernel: [<8002d32c>] (warn_slowpath_null) from [<8093aab8>] (__ieee80211_wake_queue+0x24c/0x480)
Jun 19 23:45:08 raspi kernel: [<8093aab8>] (__ieee80211_wake_queue) from [<8093ad34>] (ieee80211_wake_queue_by_reason+0x48/0x58)
Jun 19 23:45:08 raspi kernel: [<8093ad34>] (ieee80211_wake_queue_by_reason) from [<8093ad64>] (ieee80211_wake_queue+0x20/0x24)
Jun 19 23:45:08 raspi kernel: [<8093ad64>] (ieee80211_wake_queue) from [<7f0063e8>] (mt7601u_complete_tx+0x2ec/0x408 [mt7601u])
Jun 19 23:45:08 raspi kernel: [<7f0063e8>] (mt7601u_complete_tx [mt7601u]) from [<804c2858>] (__usb_hcd_giveback_urb+0x70/0x12c)
Jun 19 23:45:08 raspi kernel: [<804c2858>] (__usb_hcd_giveback_urb) from [<804c2964>] (usb_hcd_giveback_urb+0x50/0x110)
Jun 19 23:45:08 raspi kernel: [<804c2964>] (usb_hcd_giveback_urb) from [<804f14b0>] (completion_tasklet_func+0x6c/0x98)
Jun 19 23:45:08 raspi kernel: [<804f14b0>] (completion_tasklet_func) from [<804fffac>] (tasklet_callback+0x20/0x24)
Jun 19 23:45:08 raspi kernel: [<804fffac>] (tasklet_callback) from [<80032ec0>] (tasklet_hi_action+0x74/0x10c)
Jun 19 23:45:08 raspi kernel: [<80032ec0>] (tasklet_hi_action) from [<80031eb8>] (__do_softirq+0x160/0x75c)
Jun 19 23:45:08 raspi kernel: [<80031eb8>] (__do_softirq) from [<800324f8>] (run_ksoftirqd+0x44/0x80)
Jun 19 23:45:08 raspi kernel: [<800324f8>] (run_ksoftirqd) from [<80055a4c>] (smpboot_thread_fn+0x190/0x30c)
Jun 19 23:45:08 raspi kernel: [<80055a4c>] (smpboot_thread_fn) from [<800510dc>] (kthread+0x100/0x118)
Jun 19 23:45:08 raspi kernel: [<800510dc>] (kthread) from [<8000ee38>] (ret_from_fork+0x14/0x3c)
Jun 19 23:45:08 raspi kernel: ---[ end trace e43bb9c0b6e99382 ]---

Jun 18 00:08:22 raspi kernel: ------------[ cut here ]------------
Jun 18 00:08:22 raspi kernel: WARNING: CPU: 0 PID: 0 at /usr/src/mt7601u/dma.c:198 mt7601u_complete_rx+0xf4/0x100 [mt7601u]()
Jun 18 00:08:22 raspi kernel: RX urb mismatch
Jun 18 00:08:22 raspi kernel: Modules linked in:
Jun 18 00:08:22 raspi kernel:  mt7601u(O)
Jun 18 00:08:22 raspi kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O    4.0.5-v7+ #5
Jun 18 00:08:22 raspi kernel: Hardware name: BCM2709
Jun 18 00:08:22 raspi kernel: [<80016630>] (unwind_backtrace) from [<80012c1c>] (show_stack+0x20/0x24)
Jun 18 00:08:22 raspi kernel: [<80012c1c>] (show_stack) from [<8082117c>] (dump_stack+0x9c/0xdc)
Jun 18 00:08:22 raspi kernel: [<8082117c>] (dump_stack) from [<8002b70c>] (warn_slowpath_common+0x98/0xc8)
Jun 18 00:08:22 raspi kernel: [<8002b70c>] (warn_slowpath_common) from [<8002b77c>] (warn_slowpath_fmt+0x40/0x48)
Jun 18 00:08:22 raspi kernel: [<8002b77c>] (warn_slowpath_fmt) from [<7f005a74>] (mt7601u_complete_rx+0xf4/0x100 [mt7601u])
Jun 18 00:08:22 raspi kernel: [<7f005a74>] (mt7601u_complete_rx [mt7601u]) from [<8042eefc>] (__usb_hcd_giveback_urb+0x70/0x12c)
Jun 18 00:08:22 raspi kernel: [<8042eefc>] (__usb_hcd_giveback_urb) from [<8042f008>] (usb_hcd_giveback_urb+0x50/0x128)
Jun 18 00:08:22 raspi kernel: [<8042f008>] (usb_hcd_giveback_urb) from [<8045d710>] (completion_tasklet_func+0x6c/0x98)
Jun 18 00:08:22 raspi kernel: [<8045d710>] (completion_tasklet_func) from [<8046c194>] (tasklet_callback+0x20/0x24)
Jun 18 00:08:22 raspi kernel: [<8046c194>] (tasklet_callback) from [<8002fb4c>] (tasklet_hi_action+0x74/0x10c)
Jun 18 00:08:22 raspi kernel: [<8002fb4c>] (tasklet_hi_action) from [<8002f0b4>] (__do_softirq+0x19c/0x3f0)
Jun 18 00:08:22 raspi kernel: [<8002f0b4>] (__do_softirq) from [<8002f6d0>] (irq_exit+0xdc/0x140)
Jun 18 00:08:22 raspi kernel: [<8002f6d0>] (irq_exit) from [<800716a0>] (__handle_domain_irq+0x98/0xec)
Jun 18 00:08:22 raspi kernel: [<800716a0>] (__handle_domain_irq) from [<8000857c>] (asm_do_IRQ+0x2c/0x30)
Jun 18 00:08:22 raspi kernel: [<8000857c>] (asm_do_IRQ) from [<80828ab4>] (__irq_svc+0x34/0x14c)
Jun 18 00:08:22 raspi kernel: Exception stack(0x80b87f20 to 0x80b87f68)
Jun 18 00:08:22 raspi kernel: 7f20: 00000000 00000000 00000000 00000000 80b86000 80ba6a18 80ba6940 ffffffff
Jun 18 00:08:22 raspi kernel: 7f40: 80bf60e4 8082e53c 80ba69b8 80b87f74 80ba791c 80b87f68 8000f808 8000f80c
Jun 18 00:08:22 raspi kernel: 7f60: 600c0013 ffffffff
Jun 18 00:08:22 raspi kernel: [<80828ab4>] (__irq_svc) from [<8000f80c>] (arch_cpu_idle+0x34/0x4c)
Jun 18 00:08:22 raspi kernel: [<8000f80c>] (arch_cpu_idle) from [<80064f9c>] (cpu_startup_entry+0x21c/0x294)
Jun 18 00:08:22 raspi kernel: [<80064f9c>] (cpu_startup_entry) from [<8081f04c>] (rest_init+0x94/0x98)
Jun 18 00:08:22 raspi kernel: [<8081f04c>] (rest_init) from [<80b04cc8>] (start_kernel+0x37c/0x388)
Jun 18 00:08:22 raspi kernel: ---[ end trace 1d6cf23d516f91d4 ]---
Jun 18 00:08:22 raspi kernel: mt7601u 1-1.2:1.0: Warning: mt7601u_mcu_wait_resp retrying
Jun 18 00:08:23 raspi kernel: mt7601u 1-1.2:1.0: Warning: mt7601u_mcu_wait_resp retrying
Jun 18 00:08:23 raspi kernel: mt7601u 1-1.2:1.0: Warning: mt7601u_mcu_wait_resp retrying
Jun 18 00:08:23 raspi kernel: mt7601u 1-1.2:1.0: Warning: mt7601u_mcu_wait_resp retrying
Jun 18 00:08:24 raspi kernel: mt7601u 1-1.2:1.0: Warning: mt7601u_mcu_wait_resp retrying
Jun 18 00:08:24 raspi kernel: mt7601u 1-1.2:1.0: Error: mt7601u_mcu_wait_resp timed out
Jun 18 00:08:24 raspi kernel: mt7601u 1-1.2:1.0: Warning: MCU TSSI read kick failed
Jun 18 00:08:27 raspi kernel: mt7601u 1-1.2:1.0: Error: send MCU cmd failed:-110
Jun 18 00:08:29 raspi kernel: mt7601u 1-1.2:1.0: mt7601u_rxdc_cal intro failed:-110
Jun 18 00:08:29 raspi kernel: mt7601u 1-1.2:1.0: Error: send MCU cmd failed:-110
Jun 18 00:08:29 raspi kernel: mt7601u 1-1.2:1.0: mt7601u_rxdc_cal outro failed:-110
Jun 18 00:08:30 raspi kernel: mt7601u 1-1.2:1.0: Error: send MCU cmd failed:-110

Additionally when I turned on some kernel debug features this showed up at somepoint: http://pastebin.com/V591G8Fg

kuba-moo commented 9 years ago

This looks more serious indeed. Most of the stuff at pastebin comes from realtek driver, though. I hope it's not that driver which is causing problems. I will take a closer look at this once I'm done with the other issue you reported ;) Would it be a problem for you to upgrade your kernel to v4.1 or even https://git.kernel.org/cgit/linux/kernel/git/kvalo/wireless-drivers-next.git/ and see if those errors occur again? It's not super necessary but may make things easier for me down the road.

rethil commented 9 years ago

I'll upgrade my kernel to v4.1 this evening and see if I can see these errors again.

kuba-moo commented 9 years ago

OK, the first one is fixed in testing branch as well. The upgrade is not 100% necessary so if you don't feel like doing it, you can keep rollin' with 4.0.5.

I'm trying to reproduce the second one with no luck so far.

Third one is complete hung of the card. Little chance I will be able to fix that one (as it would probably require MediaTek to change the firmware).

rethil commented 9 years ago

I was unable to upgrade my system with 4.1 kernel. Fix for tx/rx seems to be doing it job on v4.0.5. Second error was easly reproducable as far as I remember. I'll try to reproduce it with testing branch. For 3rd: it only happened once while running for 1 night straight.

kuba-moo commented 9 years ago

Actually the initial fix for first issue which I pushed earlier killed performance of TCP from the host (<5Mbps). I just updated the testing branch to fix that. Can you try that? Let me know if you see any issues. Your help is much appreciated :)

As for the second and third problem I created a setup with NAT (ethernet <-> mt7601u) but no luck reproducing them so far...

rethil commented 9 years ago

Tried out latest commit: there are some kernel messages popping out. I just simply ran: wget -O /dev/null http://speedtest.wdc01.softlayer.com/downloads/test100.zip

[   32.327978] mt7601u 1-1.2:1.0 wlan1: disabling HT/VHT due to WEP/TKIP use
[   32.328253] mt7601u 1-1.2:1.0 wlan1: disabling HT as WMM/QoS is not supported by the AP
[   32.328580] mt7601u 1-1.2:1.0 wlan1: disabling VHT as WMM/QoS is not supported by the AP
[  131.699802] wget (307) used greatest stack depth: 2768 bytes left
[  163.345903] mt7601u 1-1.2:1.0: Error: BBP read 31 failed:-110!!
[  163.701233] mt7601u 1-1.2:1.0: Error: BBP read 31 failed:-110!!
[  164.076691] mt7601u 1-1.2:1.0: Error: RF read 00:04 failed:-110!!
[  172.337635] mt7601u 1-1.2:1.0: Error: BBP read 2f failed:-110!!
[  180.589873] mt7601u 1-1.2:1.0: Error: BBP read 31 failed:-110!!
[  193.183838] mt7601u 1-1.2:1.0: Error: BBP read 42 failed:-110!!
[  197.456808] mt7601u 1-1.2:1.0: Error: BBP read 42 failed:-110!!
[  201.746887] mt7601u 1-1.2:1.0: Error: BBP read 31 failed:-110!!
[  210.236258] mt7601u 1-1.2:1.0: Error: BBP read 42 failed:-110!!
[  239.433631] mt7601u 1-1.2:1.0: Error: BBP read 2f failed:-110!!
[  247.921059] mt7601u 1-1.2:1.0: Error: BBP read b2 failed:-110!!
[  260.310126] mt7601u 1-1.2:1.0: Error: BBP read 42 failed:-110!!
[  264.656379] mt7601u 1-1.2:1.0: Error: BBP read 42 failed:-110!!
[  273.006451] mt7601u 1-1.2:1.0: Error: BBP read 31 failed:-110!!
[  277.236589] mt7601u 1-1.2:1.0: Error: BBP read 31 failed:-110!!
[  277.348404] mt7601u 1-1.2:1.0: Error: BBP read 01 failed:-110!!
[  302.577326] mt7601u 1-1.2:1.0: Error: BBP read 31 failed:-110!!
[  310.919364] mt7601u 1-1.2:1.0: Error: BBP read 31 failed:-110!!
[  326.488685] wget (310) used greatest stack depth: 2112 bytes left

Traffic was getting lower when each of error message was showing up.

rethil commented 9 years ago

I can also see a traffic loss between 2223538 and e5dc358. 2223538 - ~1mb/s constant traffic e5dc358 - ~0.5-0.8mb/s and usually dropping to 10kb/s

kuba-moo commented 9 years ago

I was able to reproduce the timeouts (-110 errors). I will try to reproduce the traffic drop between 2223538 and e5dc358 in my setup as well... but tomorrow. I have enough debugging for one day ;)

rethil commented 9 years ago

Looks like rtl8188 were caused by hostapd. I was using https://github.com/jenssegers/RTL8188-hostapd. Now I switched to hostapd 2.4 patched with https://github.com/pritambaral/hostapd-rtl871xdrv and it's more stable. Kernel still shows error messages not related to mt7601u. You can get that issue of the list. 2nd issue: I'm also not able to reproduce it now.

kuba-moo commented 9 years ago

OK @Rethil! Big thanks for report and help, I think doing proper locking fixed the -110 errors. I've been running mt7601u in NAT configuration for almost 20 hours with no hickups/errors.

Moreover I don't see any performance issues. master and testing run at approximately the same speed for me. Let me know if speed regression persists in your configuration. Beware that speed depends on many factors like noise (other users of 2.4GHz band), temperature of the device, placement of objects around the devices, weather you are standing next to it or not...

kuba-moo commented 9 years ago

I pushed the fixes to master branch. Feel free to open a new issue if you have more problems.

rethil commented 9 years ago

Great, thanks a lot ;). Everything seems to be working fine with dirver from testing branch. If anything I spot something bad I'll let you know :+1: