kuba-moo / mt7601u

Linux mac80211-based driver for Mediatek MT7601U USB bgn WiFi dongle

Device EEPROM causing kernel error within mt7601u_set_power_rate function #94

Open BioDiode opened 1 year ago

BioDiode commented 1 year ago

I recently purchased a few different mt7601u-based WiFi modules from Amazon and a couple of them are causing repeated kernel errors on USB insertion with this driver:

[   43.380628] usb 1-1: new high-speed USB device number 2 using xhci-hcd
[   43.828741] usb 1-1: reset high-speed USB device number 2 using xhci-hcd
[   43.980338] mt7601u 1-1:1.0: ASIC revision: 76010001 MAC revision: 76010500
[   43.985582] mt7601u 1-1:1.0: Firmware Version: 0.1.00 Build: 7640 Build time: 201302052146____
[   44.406664] mt7601u 1-1:1.0: EEPROM ver:0c fae:00
[   44.407057] ------------[ cut here ]------------
[   44.407083] WARNING: CPU: 0 PID: 94 at drivers/net/wireless/mediatek/mt7601u/eeprom.h:126 mt7601u_set_power_rate+0x3c/0x50
[   44.407089] Modules linked in: apex(C) wlan(O) crc32_ce gasket(C) crct10dif_ce galcore(O) ina2xx imx_sdma ip_tables x_tables
[   44.407145] CPU: 0 PID: 94 Comm: kworker/0:1 Tainted: G         C O    4.14.98-imx #1
[   44.407151] Hardware name: Freescale i.MX8MQ Phanbell (DT)
[   44.407164] Workqueue: usb_hub_wq hub_event
[   44.407175] task: ffff80007659de80 task.stack: ffff000009d00000
[   44.407184] PC is at mt7601u_set_power_rate+0x3c/0x50
[   44.407193] LR is at mt7601u_eeprom_init+0x540/0x6a8
[   44.407201] pc : [<ffff000008864034>] lr : [<ffff000008864688>] pstate: 00000145
[   44.407206] sp : ffff000009d036b0
[   44.407212] x29: ffff000009d036b0 x28: ffff8000766fd800 
[   44.407226] x27: 0000000000000000 x26: 0000000000001314 
[   44.407239] x25: ffff800075bb20de x24: 0000000000000000 
[   44.407252] x23: ffff800075bb2000 x22: 0000000000000002 
[   44.407265] x21: 0000000000000000 x20: 0000000000000000 
[   44.407277] x19: ffff800075ad3560 x18: ffff0000094f3000 
[   44.407290] x17: 0000000000000000 x16: 0000000000000000 
[   44.407303] x15: 00000000fffffff0 x14: 0000000000000004 
[   44.407315] x13: 071c71c71c71c71c x12: 0000000a2e12dc24 
[   44.407328] x11: 0000000000000000 x10: 0000000000000980 
[   44.407340] x9 : ffff000009d03370 x8 : ffff80007659e860 
[   44.407353] x7 : 0000000000000000 x6 : 0000000000000000 
[   44.407365] x5 : ffff80007529f798 x4 : 00000000f4f40000 
[   44.407378] x3 : 0000000000000034 x2 : 00000000000000f4 
[   44.407390] x1 : 0000000000000002 x0 : ffff80007529f7b2 
[   44.407402] Call trace:
[   44.407412] Exception stack(0xffff000009d03570 to 0xffff000009d036b0)
[   44.407421] 3560:                                   ffff80007529f7b2 0000000000000002
[   44.407432] 3580: 00000000000000f4 0000000000000034 00000000f4f40000 ffff80007529f798
[   44.407442] 35a0: 0000000000000000 0000000000000000 ffff80007659e860 ffff000009d03370
[   44.407453] 35c0: 0000000000000980 0000000000000000 0000000a2e12dc24 071c71c71c71c71c
[   44.407463] 35e0: 0000000000000004 00000000fffffff0 0000000000000000 0000000000000000
[   44.407474] 3600: ffff0000094f3000 ffff800075ad3560 0000000000000000 0000000000000000
[   44.407485] 3620: 0000000000000002 ffff800075bb2000 0000000000000000 ffff800075bb20de
[   44.407495] 3640: 0000000000001314 0000000000000000 ffff8000766fd800 ffff000009d036b0
[   44.407506] 3660: ffff000008864688 ffff000009d036b0 ffff000008864034 0000000000000145
[   44.407517] 3680: ffff000009d036b0 ffff000008864364 ffffffffffffffff 000000000000ff11
[   44.407526] 36a0: ffff000009d036b0 ffff000008864034
[   44.407537] [<ffff000008864034>] mt7601u_set_power_rate+0x3c/0x50
[   44.407551] [<ffff000008861410>] mt7601u_init_hardware+0x3e0/0x468
[   44.407563] [<ffff0000088608e8>] mt7601u_probe+0x1e0/0x238
[   44.407576] [<ffff00000887e5e8>] usb_probe_interface+0xe8/0x288
[   44.407589] [<ffff0000086b0efc>] driver_probe_device+0x204/0x2c0
[   44.407600] [<ffff0000086b1130>] __device_attach_driver+0xb8/0xe8
[   44.407610] [<ffff0000086af220>] bus_for_each_drv+0x68/0xa8
[   44.407620] [<ffff0000086b0bd8>] __device_attach+0xc0/0x130
[   44.407630] [<ffff0000086b11b0>] device_initial_probe+0x10/0x18
[   44.407640] [<ffff0000086b00e8>] bus_probe_device+0x90/0x98
[   44.407650] [<ffff0000086ae130>] device_add+0x328/0x5b0
[   44.407661] [<ffff00000887c500>] usb_set_configuration+0x410/0x7c8
[   44.407672] [<ffff0000088883f0>] generic_probe+0x58/0x80
[   44.407683] [<ffff00000887e4d8>] usb_probe_device+0x28/0x50
[   44.407694] [<ffff0000086b0efc>] driver_probe_device+0x204/0x2c0
[   44.407704] [<ffff0000086b1130>] __device_attach_driver+0xb8/0xe8
[   44.407714] [<ffff0000086af220>] bus_for_each_drv+0x68/0xa8
[   44.407724] [<ffff0000086b0bd8>] __device_attach+0xc0/0x130
[   44.407734] [<ffff0000086b11b0>] device_initial_probe+0x10/0x18
[   44.407744] [<ffff0000086b00e8>] bus_probe_device+0x90/0x98
[   44.407753] [<ffff0000086ae130>] device_add+0x328/0x5b0
[   44.407762] [<ffff000008872f84>] usb_new_device+0x2f4/0x700
[   44.407772] [<ffff0000088741b4>] hub_event+0x814/0x1038
[   44.407784] [<ffff0000080e4a60>] process_one_work+0x1c8/0x328
[   44.407794] [<ffff0000080e4c04>] worker_thread+0x44/0x450
[   44.407807] [<ffff0000080eab58>] kthread+0x128/0x130
[   44.407819] [<ffff000008084e08>] ret_from_fork+0x10/0x18
[   44.407826] ---[ end trace 78b6dcdaf9891842 ]---

This stack trace usually repeats itself a dozen times before the device eventually works:

[   44.669963] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
[   44.889263] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
[   44.925808] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
[   45.149637] IPv6: ADDRCONF(NETDEV_UP): wlan1: link is not ready
[   46.515589] wlan1: authenticate with b4:b0:24:ea:4c:ae
[   46.536970] wlan1: send auth to b4:b0:24:ea:4c:ae (try 1/3)
[   46.539683] wlan1: authenticated
[   46.548127] wlan1: associate with b4:b0:24:ea:4c:ae (try 1/3)
[   46.559053] wlan1: RX AssocResp from b4:b0:24:ea:4c:ae (capab=0x411 status=0 aid=4)
[   46.603599] wlan1: associated
[   46.674525] IPv6: ADDRCONF(NETDEV_CHANGE): wlan1: link becomes ready

I believe the issue is related to the s6_validate macro called from within the mt7601u_set_power_rate function. Here is the eeprom_param of a device (with reported EEPROM version 0c) that produces the above error:

RF freq offset: 5d
RSSI offset: 0 0
Reference temp: f9
LNA gain: 0
Reg channels: 1-14
Per rate power:
         raw:00 bw20:00 bw40:02
         raw:00 bw20:00 bw40:02
         raw:34 bw20:f4 bw40:f6
         raw:34 bw20:f4 bw40:f6
         raw:34 bw20:f4 bw40:f6
         raw:34 bw20:f4 bw40:f6
         raw:34 bw20:f4 bw40:f6
         raw:34 bw20:f4 bw40:f6
         raw:34 bw20:f4 bw40:f6
         raw:34 bw20:f4 bw40:f6
Per channel power:
         tx_power  ch1:04 ch2:04
         tx_power  ch3:04 ch4:04
         tx_power  ch5:04 ch6:04
         tx_power  ch7:04 ch8:04
         tx_power  ch9:05 ch10:05
         tx_power  ch11:05 ch12:05
         tx_power  ch13:05 ch14:05
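The failing values can be reproduced by hand. The following is a minimal userspace sketch, assuming the driver masks each per-rate EEPROM byte to its low 6 bits, sign-extends it, and sets bw40 to bw20 plus a delta. The input byte 0xf4 and the delta 2 are read from x2 and x1 in the register dump above, which is an inference. decode_rate, S6_MASK and the names ending in _model are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Userspace model of signed 6-bit ("s6") handling.
 * Illustration only; this is not the kernel code. */
#define S6_MASK 0x3fu /* same bits as GENMASK(5, 0) */

struct power_per_rate_model {
	uint8_t raw;  /* low 6 bits of the EEPROM byte */
	int8_t bw20;  /* sign-extended 6-bit value */
	int8_t bw40;  /* bw20 plus the assumed delta */
};

/* Sign-extend the low 6 bits of an EEPROM byte. */
static int s6_to_int_model(uint32_t reg)
{
	int s6 = (int)(reg & S6_MASK);

	if (s6 & 0x20)
		s6 -= 0x40;
	return s6;
}

/* Build one table entry from a raw EEPROM byte and a delta. */
static struct power_per_rate_model decode_rate(uint8_t value, int8_t delta)
{
	struct power_per_rate_model r;

	r.raw = value & S6_MASK;
	r.bw20 = (int8_t)s6_to_int_model(value);
	r.bw40 = (int8_t)(r.bw20 + delta);
	return r;
}
```

Under those assumptions, decode_rate(0xf4, 2) gives raw 0x34, bw20 0xf4 (-12) and bw40 0xf6 (-10), which matches the dump. The upper two bits of 0xf4 are what a 6-bit check would object to.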

Are these values unexpected? For comparison, this is the eeprom_param of a device (with EEPROM version 0d) that doesn't produce the kernel error:

RF freq offset: 60
RSSI offset: 0 0
Reference temp: f9
LNA gain: 8
Reg channels: 1-14
Per rate power:
         raw:07 bw20:07 bw40:07
         raw:07 bw20:07 bw40:07
         raw:03 bw20:03 bw40:03
         raw:03 bw20:03 bw40:03
         raw:04 bw20:04 bw40:04
         raw:00 bw20:00 bw40:00
         raw:00 bw20:00 bw40:00
         raw:00 bw20:00 bw40:00
         raw:02 bw20:02 bw40:02
         raw:00 bw20:00 bw40:00
Per channel power:
         tx_power  ch1:0c ch2:0b
         tx_power  ch3:0b ch4:0d
         tx_power  ch5:0c ch6:0d
         tx_power  ch7:0f ch8:0f
         tx_power  ch9:0f ch10:0f
         tx_power  ch11:0f ch12:0f
         tx_power  ch13:0f ch14:0f

These errors are consistent across both kernel 4.14.98 and 5.19.0, so they appear to be hardware-related.

The two different devices that reliably produce this kernel error are here (similar eeprom_param and EEPROM version 0c): https://www.amazon.com/dp/B008Z9IZSW https://www.amazon.com/dp/B0BNFKJPXS

The device that works without issue is here (different eeprom_param and EEPROM version 0d): https://www.amazon.com/dp/B00RBBUQLE

Any thoughts on a fix or workaround for the problematic devices? Is there any harm in simply commenting out the WARN_ON(reg & ~GENMASK(5, 0)) line in the s6_validate macro or is there a better solution?
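On the question of whether dropping the WARN_ON is harmful, here is a rough userspace sketch of the check, assuming s6_validate warns on any bit outside GENMASK(5, 0) and then returns the value masked to those six bits (worth confirming against eeprom.h in your tree). s6_would_warn, s6_masked and S6_MASK are made-up names.

```c
#include <stdbool.h>
#include <stdint.h>

#define S6_MASK 0x3fu /* same bits as GENMASK(5, 0) */

/* True when the assumed check (reg & ~GENMASK(5, 0)) is nonzero. */
static bool s6_would_warn(uint32_t reg)
{
	return (reg & ~S6_MASK) != 0;
}

/* Value kept after masking; it does not depend on the warning. */
static uint32_t s6_masked(uint32_t reg)
{
	return reg & S6_MASK;
}
```

If that reading is right, 0xf4 trips the check while 0x07 does not, and the masked value (0x34) is the same whether or not the warning fires, so removing the WARN_ON would only silence the message. Whether the resulting power values are sensible for this hardware is a separate question.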

roqueeee commented 11 months ago

I have the same issue with an mt7601u-based WiFi USB stick that reports EEPROM ver:0c. After going through these error messages, the stick eventually works just fine. I would still like to get rid of the messages, though.

Did you find a solution for this problem by any chance?

FYI, this repo appears to be dead: "Please report any issues upstream to the Linux Wireless community. This repo is no longer used." Maybe that is the right place to file a bug report.

BioDiode commented 11 months ago

Yes, I fixed the issue by recompiling the mt7601u kernel driver with the following line of eeprom.c commented out: // mt7601u_config_tx_power_per_rate(dev, eeprom);

In 4.14.98, this is line 412: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/wireless/mediatek/mt7601u/eeprom.c?h=v4.14.98#n412

In 5.19.0, this is line 384: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/wireless/mediatek/mt7601u/eeprom.c?h=v5.19#n384

This prevents the kernel error and worked in my case, but there is undoubtedly a more elegant way to handle the issue without completely disabling the call to the config_tx_power_per_rate function. Instead, you could try commenting out the following line in eeprom.h to disable the kernel warning as mentioned in my original post: // WARN_ON(reg & ~GENMASK(5, 0));

The issue was submitted to the linux-wireless mailing list, but I didn't have time to submit or test a proper patch: https://marc.info/?l=linux-wireless&m=169049931212694