openwrt / mt76

mac80211 driver for MediaTek MT76x0e, MT76x2e, MT7603, MT7615, MT7628 and MT7688

mt76x0u stops working after a while when running under heavy load #425

Closed ZerBea closed 4 years ago

ZerBea commented 4 years ago

The mt76x0u driver stops working after a while when running under heavy load. Discovered on kernel 5.4.51 and kernel 5.7.9, on AMD, Intel and Raspberry Pi systems.

Test devices:
Bus 003 Device 003: ID 0b05:17d1 ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]
Bus 005 Device 006: ID 148f:761a Ralink Technology, Corp. MT7610U ("Archer T2U" 2.4G+5G WLAN Adapter

I have absolutely no idea what exactly happened, because there are no warnings or error messages. But I can rule out a hardware error, because the same issue happened on different systems, different devices and different USB ports.

I can also rule out the xhci issue reported here: https://bugzilla.kernel.org/show_bug.cgi?id=202541 because the issue happened on a Raspberry Pi, too.

$ iw list
Wiphy phy0
    max # scan SSIDs: 4
    max scan IEs length: 2243 bytes
    max # sched scan SSIDs: 0
    max # match sets: 0
    max # scan plans: 1
    max scan plan interval: -1
    max scan plan iterations: 0
    Retry short limit: 7
    Retry long limit: 4
    Coverage class: 0 (up to 0m)
    Device supports RSN-IBSS.
    Supported Ciphers:
        * WEP40 (00-0f-ac:1)
        * WEP104 (00-0f-ac:5)
        * TKIP (00-0f-ac:2)
        * CCMP-128 (00-0f-ac:4)
        * CCMP-256 (00-0f-ac:10)
        * GCMP-128 (00-0f-ac:8)
        * GCMP-256 (00-0f-ac:9)
        * CMAC (00-0f-ac:6)
        * CMAC-256 (00-0f-ac:13)
        * GMAC-128 (00-0f-ac:11)
        * GMAC-256 (00-0f-ac:12)
    Available Antennas: TX 0x1 RX 0x1
    Configured Antennas: TX 0x1 RX 0x1
    Supported interface modes:
         * IBSS
         * managed
         * AP
         * AP/VLAN
         * monitor
         * mesh point
    Band 1:
        Capabilities: 0x17e
            HT20/HT40
            SM Power Save disabled
            RX Greenfield
            RX HT20 SGI
            RX HT40 SGI
            RX STBC 1-stream
            Max AMSDU length: 3839 bytes
            No DSSS/CCK HT40
        Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
        Minimum RX AMPDU time spacing: 4 usec (0x05)
        HT TX/RX MCS rate indexes supported: 0-7
        Bitrates (non-HT):
            * 1.0 Mbps (short preamble supported)
            * 2.0 Mbps (short preamble supported)
            * 5.5 Mbps (short preamble supported)
            * 11.0 Mbps (short preamble supported)
            * 6.0 Mbps
            * 9.0 Mbps
            * 12.0 Mbps
            * 18.0 Mbps
            * 24.0 Mbps
            * 36.0 Mbps
            * 48.0 Mbps
            * 54.0 Mbps
        Frequencies:
            * 2412 MHz [1] (16.0 dBm)
            * 2417 MHz [2] (16.0 dBm)
            * 2422 MHz [3] (16.0 dBm)
            * 2427 MHz [4] (16.0 dBm)
            * 2432 MHz [5] (16.0 dBm)
            * 2437 MHz [6] (16.0 dBm)
            * 2442 MHz [7] (16.0 dBm)
            * 2447 MHz [8] (16.0 dBm)
            * 2452 MHz [9] (16.0 dBm)
            * 2457 MHz [10] (16.0 dBm)
            * 2462 MHz [11] (16.0 dBm)
            * 2467 MHz [12] (16.0 dBm)
            * 2472 MHz [13] (16.0 dBm)
            * 2484 MHz [14] (disabled)
    Band 2:
        Capabilities: 0x17e
            HT20/HT40
            SM Power Save disabled
            RX Greenfield
            RX HT20 SGI
            RX HT40 SGI
            RX STBC 1-stream
            Max AMSDU length: 3839 bytes
            No DSSS/CCK HT40
        Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
        Minimum RX AMPDU time spacing: 4 usec (0x05)
        HT TX/RX MCS rate indexes supported: 0-7
        VHT Capabilities (0x31800120):
            Max MPDU length: 3895
            Supported Channel Width: neither 160 nor 80+80
            short GI (80 MHz)
            RX antenna pattern consistency
            TX antenna pattern consistency
        VHT RX MCS set:
            1 streams: MCS 0-7
            2 streams: not supported
            3 streams: not supported
            4 streams: not supported
            5 streams: not supported
            6 streams: not supported
            7 streams: not supported
            8 streams: not supported
        VHT RX highest supported: 0 Mbps
        VHT TX MCS set:
            1 streams: MCS 0-7
            2 streams: not supported
            3 streams: not supported
            4 streams: not supported
            5 streams: not supported
            6 streams: not supported
            7 streams: not supported
            8 streams: not supported
        VHT TX highest supported: 0 Mbps
        Bitrates (non-HT):
            * 6.0 Mbps
            * 9.0 Mbps
            * 12.0 Mbps
            * 18.0 Mbps
            * 24.0 Mbps
            * 36.0 Mbps
            * 48.0 Mbps
            * 54.0 Mbps
        Frequencies:
            * 5180 MHz [36] (20.0 dBm)
            * 5200 MHz [40] (20.0 dBm)
            * 5220 MHz [44] (20.0 dBm)
            * 5240 MHz [48] (20.0 dBm)
            * 5260 MHz [52] (20.0 dBm) (radar detection)
            * 5280 MHz [56] (20.0 dBm) (radar detection)
            * 5300 MHz [60] (20.0 dBm) (radar detection)
            * 5320 MHz [64] (20.0 dBm) (radar detection)
            * 5500 MHz [100] (20.0 dBm) (radar detection)
            * 5520 MHz [104] (20.0 dBm) (radar detection)
            * 5540 MHz [108] (20.0 dBm) (radar detection)
            * 5560 MHz [112] (20.0 dBm) (radar detection)
            * 5580 MHz [116] (20.0 dBm) (radar detection)
            * 5600 MHz [120] (20.0 dBm) (radar detection)
            * 5620 MHz [124] (20.0 dBm) (radar detection)
            * 5640 MHz [128] (20.0 dBm) (radar detection)
            * 5660 MHz [132] (20.0 dBm) (radar detection)
            * 5680 MHz [136] (20.0 dBm) (radar detection)
            * 5700 MHz [140] (20.0 dBm) (radar detection)
            * 5745 MHz [149] (13.0 dBm)
            * 5765 MHz [153] (13.0 dBm)
            * 5785 MHz [157] (13.0 dBm)
            * 5805 MHz [161] (13.0 dBm)
            * 5825 MHz [165] (13.0 dBm)
    Supported commands:
         * new_interface
         * set_interface
         * new_key
         * start_ap
         * new_station
         * new_mpath
         * set_mesh_config
         * set_bss
         * authenticate
         * associate
         * deauthenticate
         * disassociate
         * join_ibss
         * join_mesh
         * remain_on_channel
         * set_tx_bitrate_mask
         * frame
         * frame_wait_cancel
         * set_wiphy_netns
         * set_channel
         * set_wds_peer
         * probe_client
         * set_noack_map
         * register_beacons
         * start_p2p_device
         * set_mcast_rate
         * connect
         * disconnect
         * channel_switch
         * set_qos_map
         * set_multicast_to_unicast
    Supported TX frame types:
         * IBSS: 0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70 0x80 0x90 0xa0 0xb0 0xc0 0xd0 0xe0 0xf0
         * managed: 0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70 0x80 0x90 0xa0 0xb0 0xc0 0xd0 0xe0 0xf0
         * AP: 0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70 0x80 0x90 0xa0 0xb0 0xc0 0xd0 0xe0 0xf0
         * AP/VLAN: 0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70 0x80 0x90 0xa0 0xb0 0xc0 0xd0 0xe0 0xf0
         * mesh point: 0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70 0x80 0x90 0xa0 0xb0 0xc0 0xd0 0xe0 0xf0
         * P2P-client: 0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70 0x80 0x90 0xa0 0xb0 0xc0 0xd0 0xe0 0xf0
         * P2P-GO: 0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70 0x80 0x90 0xa0 0xb0 0xc0 0xd0 0xe0 0xf0
         * P2P-device: 0x00 0x10 0x20 0x30 0x40 0x50 0x60 0x70 0x80 0x90 0xa0 0xb0 0xc0 0xd0 0xe0 0xf0
    Supported RX frame types:
         * IBSS: 0x40 0xb0 0xc0 0xd0
         * managed: 0x40 0xb0 0xd0
         * AP: 0x00 0x20 0x40 0xa0 0xb0 0xc0 0xd0
         * AP/VLAN: 0x00 0x20 0x40 0xa0 0xb0 0xc0 0xd0
         * mesh point: 0xb0 0xc0 0xd0
         * P2P-client: 0x40 0xd0
         * P2P-GO: 0x00 0x20 0x40 0xa0 0xb0 0xc0 0xd0
         * P2P-device: 0x40 0xd0
    software interface modes (can always be added):
         * AP/VLAN
         * monitor
    valid interface combinations:
         * #{ IBSS } <= 1, #{ managed, AP, mesh point } <= 2,
           total <= 2, #channels <= 1, STA/AP BI must match
    HT Capability overrides:
         * MCS: ff ff ff ff ff ff ff ff ff ff
         * maximum A-MSDU length
         * supported channel width
         * short GI for 40 MHz
         * max A-MPDU length exponent
         * min MPDU start spacing
    Device supports TX status socket option.
    Device supports HT-IBSS.
    Device supports SAE with AUTHENTICATE command
    Device supports low priority scan.
    Device supports scan flush.
    Device supports AP scan.
    Device supports per-vif TX power setting
    Driver supports full state transitions for AP/GO clients
    Driver supports a userspace MPM
    Device supports active monitor (which will ACK incoming frames)
    Device supports configuring vdev MAC-addr on create.
    Supported extended features:
        * [ VHT_IBSS ]: VHT-IBSS
        * [ RRM ]: RRM
        * [ FILS_STA ]: STA FILS (Fast Initial Link Setup)
        * [ CQM_RSSI_LIST ]: multiple CQM_RSSI_THOLD records
        * [ CONTROL_PORT_OVER_NL80211 ]: control port over nl80211
        * [ TXQS ]: FQ-CoDel-enabled intermediate TXQs
        * [ AIRTIME_FAIRNESS ]: airtime fairness scheduling

$ iw dev
phy#0
    Interface wlp5s0f4u2
        ifindex 3
        wdev 0x1
        addr 4e:fe:1c:e5:18:9e
        type managed
        txpower 16.00 dBm
        multicast TXQ:
            qsz-byt qsz-pkt flows   drops   marks   overlmt hashcol tx-bytes    tx-packets
            0   0   0   0   0   0   0   0       0

After running under heavy load, the NetworkScan list is empty, as is the iw scan list:
$ sudo iw dev wlp5s0f4u2 scan
$

dmesg doesn't show an error or warning. The device simply doesn't work:


[   13.204869] usb 3-2: New USB device found, idVendor=0b05, idProduct=17d1, bcdDevice= 1.00
[   13.204874] usb 3-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[   13.204877] usb 3-2: Product: WiFi
[   13.204880] usb 3-2: Manufacturer: MediaTek
[   13.204882] usb 3-2: SerialNumber: 1.0
[   13.258283] cfg80211: Loading compiled-in X.509 certificates for regulatory database
[   13.262585] cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7'
[   13.463984] usb 3-2: reset high-speed USB device number 3 using xhci_hcd
[   13.611314] mt76x0u 3-2:1.0: ASIC revision: 76100002 MAC revision: 76502000
[   14.869080] mt76x0u 3-2:1.0: EEPROM ver:02 fae:01
[   14.916563] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
[   14.917242] usbcore: registered new interface driver mt76x0u
[   14.932333] mt76x0u 3-2:1.0 wlp5s0f4u2: renamed from wlan0

Reloading module doesn't help. System must be rebooted.

After a reboot, everything is working as expected (for a while):

$ sudo iw dev wlp5s0f4u2 scan
BSS 08:96:d7:12:a1:3e(on wlp5s0f4u2)
    TSF: 37645473364 usec (0d, 10:27:25)
    freq: 2472
    beacon interval: 100 TUs
    capability: ESS Privacy ShortPreamble ShortSlotTime (0x0431)
    signal: -85.00 dBm
    last seen: 12940 ms ago
    Information elements from Probe Response frame:
    SSID: Test AP
    Supported rates: 1.0* 2.0* 5.5* 11.0* 6.0 9.0 12.0 18.0 
    DS Parameter set: channel 13
    Country: DE Environment: Indoor/Outdoor
        Channels [1 - 13] @ 20 dBm
    ERP: <no flags>
    Extended supported rates: 24.0 36.0 48.0 54.0 
    HT capabilities:
        Capabilities: 0x11ef
            RX LDPC
            HT20/HT40
            SM Power Save disabled
            RX HT20 SGI
            RX HT40 SGI
            TX STBC
            RX STBC 1-stream
            Max AMSDU length: 3839 bytes
            DSSS/CCK HT40
        Maximum RX AMPDU length 65535 bytes (exponent: 0x003)
        Minimum RX AMPDU time spacing: 8 usec (0x06)
        HT TX/RX MCS rate indexes supported: 0-23
    HT operation:
         * primary channel: 13
         * secondary channel offset: below
         * STA channel width: any
         * RIFS: 1
         * HT protection: 20 MHz
         * non-GF present: 1
         * OBSS non-GF present: 0
         * dual beacon: 0
         * dual CTS protection: 0
         * STBC beacon: 0
         * L-SIG TXOP Prot: 0
         * PCO active: 0
         * PCO phase: 0
    Extended capabilities:
         * Operating Mode Notification
    WMM:     * Parameter version 1
         * BE: CW 15-1023, AIFSN 3
         * BK: CW 15-1023, AIFSN 7
         * VI: CW 7-15, AIFSN 2, TXOP 3008 usec
         * VO: CW 3-7, AIFSN 2, TXOP 1504 usec
    RSN:     * Version: 1
         * Group cipher: CCMP
         * Pairwise ciphers: CCMP
         * Authentication suites: PSK
         * Capabilities: 1-PTKSA-RC 1-GTKSA-RC (0x0000)
    WPS:     * Version: 1.0
         * Wi-Fi Protected Setup State: 2 (Configured)
         * Response Type: 3 (AP)
         * UUID: 44ddaee5-e3bd-750d-42a8-0896d712a13e
         * Manufacturer: AVM
         * Model: FBox
         * Model Number: 0000
         * Serial Number: 0000
         * Primary Device Type: 6-0050f204-1
         * Device name: FBox
         * Config methods: Display, PBC, Keypad
         * RF Bands: 0x1
         * Unknown TLV (0x1049, 6 bytes): 00 37 2a 00 01 20
RealEnder commented 4 years ago

I can confirm the issue with 5.4

ZerBea commented 4 years ago

Some additional information: Running hcxdumptool on a Raspberry Pi with kernel 5.4.51 caused the same issue. After a while, no packets arrived via the RAW_SOCKET. Wireshark/tshark show only outgoing packets from hcxdumptool (with the 14-byte hcxdumptool radiotap header). No error message or warning appears in the dmesg log. iw list and iw dev still work, but iw scan shows an empty scan list.

ZerBea commented 4 years ago

A simple loop will cause the driver to stop:

counter=1
while [ $counter -le 20 ]
do
sudo iw dev wlp5s0f4u2 scan
((counter++))
done

Running the script a few times, we get the first error message:
command failed: Device or resource busy (-16)

Running the script more times, the error messages increase:
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)

until we get the message 20 times:
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)
command failed: Device or resource busy (-16)

Unfortunately, the dmesg log shows no error or warning.
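
To put a number on how often the scan fails, the loop above could be wrapped so that it counts the busy errors instead of printing them. This is only a sketch: `count_busy` and the `SCAN_CMD` variable are names made up for this example, and it assumes the "command failed" line is written to stderr.

```shell
# Hypothetical helper: run a scan command repeatedly and count EBUSY failures.
# SCAN_CMD is an assumption of this sketch; the default is the command from the report.
SCAN_CMD=${SCAN_CMD:-"sudo iw dev wlp5s0f4u2 scan"}

count_busy() {
    rounds=${1:-20}
    busy=0
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Capture stderr (where the error text is expected), discard the scan results.
        err=$($SCAN_CMD 2>&1 >/dev/null) || true
        case $err in
            *"Device or resource busy"*) busy=$((busy + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "$busy/$rounds scans returned EBUSY"
}
```

Running `count_busy 20` several times in a row and noting how the first number grows gives a rough measure of how quickly the device degrades.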

LorenzoBianconi commented 4 years ago

I ran the same test on 5.7.8-200 (the latest Fedora kernel) using the same adapter (ASUSTek Computer, Inc. AC51 802.11a/b/g/n/ac Wireless Adapter [Mediatek MT7610U]) and it works fine for me. Moreover, yesterday I ran iperf traffic for 1h with both 5.7.8-200 and the mt76 wireless tree and it worked fine. Can you please check whether you are able to trigger the issue with the mt76 wireless tree as well? Are you running the device in sta mode or are you doing something different (e.g. injecting traffic)?

ZerBea commented 4 years ago

@LorenzoBianconi thanks for your fast reply. I'll check the mt76 wireless tree instead of 5.7.9-arch1-1 to make sure it isn't Arch related. I opened the issue here and not on bugzilla.kernel.org because I have no idea what went wrong. Additionally, I can't trust the USB 3.0 ports due to the xhci issue. Running hcxdumptool (injecting traffic) on kernel 5.4 in monitor mode (Raspberry Pi - @RealEnder) caused the driver to stop receiving packets very soon. Switching often between monitor mode and managed mode caused the driver to stop, too. Is there an option to activate an enhanced debug mode on the mt76 wireless tree, or is it better to add some additional printk() calls?
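
The mode switching mentioned here could be scripted roughly as follows. This is a sketch, not the procedure actually used in this report: `set_mode`, `toggle_modes` and the `RUN` prefix are hypothetical names, and the interface is taken down before each type change because many drivers refuse to change the type while it is up.

```shell
# Hypothetical stress helper: flip an interface between monitor and managed mode.
# RUN is a prefix made up for this sketch: leave it empty to execute the
# commands (as root), or set RUN=echo to only print them.
set_mode() {
    iface=$1
    mode=$2
    ${RUN:-} ip link set dev "$iface" down &&
    ${RUN:-} iw dev "$iface" set type "$mode" &&
    ${RUN:-} ip link set dev "$iface" up
}

toggle_modes() {
    iface=$1
    rounds=${2:-10}
    i=0
    while [ "$i" -lt "$rounds" ]; do
        # Stop at the first command that fails so the failing step is visible.
        set_mode "$iface" monitor || return 1
        set_mode "$iface" managed || return 1
        i=$((i + 1))
    done
}
```

`RUN=echo toggle_modes wlp5s0f4u2 2` prints the command sequence without touching the device, which is a cheap way to review it before running it for real.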

ZerBea commented 4 years ago

@RealEnder, Alex, would it be possible for you to run hcxdumptool in parallel with tshark to monitor the traffic from/to the device until it stops receiving packets?

LorenzoBianconi commented 4 years ago

> @LorenzoBianconi thanks for your fast reply. I'll check the mt76 wireless tree instead of 5.7.9-arch1-1 to make sure it isn't Arch related. I opened the issue here and not on bugzilla.kernel.org because I have no idea what went wrong. Additionally, I can't trust the USB 3.0 ports due to the xhci issue. Running hcxdumptool (injecting traffic) on kernel 5.4 in monitor mode (Raspberry Pi - @RealEnder) caused the driver to stop receiving packets very soon.

I can try to run hcxdumptool. Can you please provide me a reproducer? Moreover, is the dongle connected to a USB 3.0 port or 2.0?

> Switching often between monitor mode and managed mode caused the driver to stop, too. Is there an option to activate an enhanced debug mode on the mt76 wireless tree, or is it better to add some additional printk() calls?

ZerBea commented 4 years ago

That will be great. The device is connected to USB 2.0.

hcxdumptool test command:
$ hcxdumptool -i interface -o test.pcapng --enable_status=95 --active_beacon

Every minute you'll receive an additional status message like this:
11:28:00 1 ERROR:0 INCOMING:5041 OUTGOING:2125 PMKIDROGUE:2 PMKID:0 M1M2ROGUE:0 M1M2:0 M2M3:0 M3M4:0 M3M4ZEROED:0 GPS:0

You can monitor outgoing and incoming traffic by running tshark in parallel:
$ tshark -i interface -w test2.pcapng

BTW: Running the same command on a mt7601u, everything is working as expected:
$ hcxdumptool -i interface -o test.pcapng --enable_status=95 --active_beacon

Hunting for issues like this one is very tricky, because it can be caused by hcxdumptool (injecting traffic), the USB host (xhci), a poor cable connection between device and hub, device overheating and more. Unfortunately, neither hcxdumptool (ERROR:0) nor dmesg shows an error. Even if I handle SIGPIPE in hcxdumptool, I get no error message.
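
Since nothing reports an error, the per-minute status line is the only signal that reception has stopped. A small filter such as the one below could flag that moment. It is a sketch under one loud assumption: that INCOMING is a running total, which this thread does not confirm. `detect_stall` is a hypothetical name.

```shell
# Hypothetical filter: read hcxdumptool status lines on stdin and report the
# first line whose INCOMING counter did not grow since the previous one.
# Assumes INCOMING is a running total (not confirmed in this thread).
detect_stall() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^INCOMING:[0-9]+$/) {
                n = substr($i, 10) + 0      # digits after "INCOMING:"
                if (seen && n == prev) {
                    print "stalled at " $1 " (INCOMING:" n ")"
                    exit
                }
                prev = n
                seen = 1
            }
        }
    }'
}
```

Saved status lines can be piped through it after a run, for example `grep INCOMING: status.log | detect_stall`, where `status.log` is whatever file the status output was captured to.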

ZerBea commented 4 years ago

Isn't mt76 wireless ready for kernel 5.7?

$ make -C /lib/modules/`uname -r`/build M=$PWD
make: Entering directory '/usr/lib/modules/5.7.9-arch1-1/build'
  CC [M]  /home/zerobeat/temp/mt76/agg-rx.o
/home/zerobeat/temp/mt76/agg-rx.c: In function 'mt76_rx_aggr_stop':
/home/zerobeat/temp/mt76/agg-rx.c:293:2: error: implicit declaration of function 'rcu_swap_protected' [-Werror=implicit-function-declaration]
  293 |  rcu_swap_protected(wcid->aggr[tidno], tid,
      |  ^~~~~~~~~~~~~~~~~~
/home/zerobeat/temp/mt76/agg-rx.c:294:7: error: implicit declaration of function 'lockdep_is_held'; did you mean 'lockdep_rtnl_is_held'? [-Werror=implicit-function-declaration]
  294 |       lockdep_is_held(&dev->mutex));
      |       ^~~~~~~~~~~~~~~
      |       lockdep_rtnl_is_held
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:267: /home/zerobeat/temp/mt76/agg-rx.o] Error 1
make: *** [Makefile:1732: /home/zerobeat/temp/mt76] Error 2
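
Both errors are "implicit declaration" errors, meaning the headers this build sees do not declare what agg-rx.c expects. One quick way to check whether the installed kernel headers still mention a given helper is sketched below. `header_has_symbol` is a hypothetical name, and the default header path is an assumption about where the helper would live.

```shell
# Hypothetical check: does a kernel header still mention a given symbol?
# The default path assumes the helper lives in include/linux/rcupdate.h of the
# build tree; adjust it if your distribution lays out headers differently.
header_has_symbol() {
    symbol=$1
    header=${2:-/lib/modules/$(uname -r)/build/include/linux/rcupdate.h}
    if grep -q "$symbol" "$header" 2>/dev/null; then
        echo "$symbol: declared in $header"
    else
        echo "$symbol: not found in $header"
        return 1
    fi
}
```

For example, `header_has_symbol rcu_swap_protected` reports whether the running kernel's build tree still carries that macro.
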
LorenzoBianconi commented 4 years ago

> That will be great. The device is connected to USB 2.0. hcxdumptool test command: $ hcxdumptool -i interface -o test.pcapng --enable_status=95 --active_beacon Every minute you'll receive an additional status message like this: 11:28:00 1 ERROR:0 INCOMING:5041 OUTGOING:2125 PMKIDROGUE:2 PMKID:0 M1M2ROGUE:0 M1M2:0 M2M3:0 M3M4:0 M3M4ZEROED:0 GPS:0 You can monitor outgoing and incoming traffic running tshark in parallel $ tshark -i interface -w test2.pcapng

I ran hcxdumptool for ~20min and it works as expected; I am able to sniff traffic with tcpdump. How often does the issue occur?

> BTW: Running the same command on a mt7601u, everything is working as expected. $ hcxdumptool -i interface -o test.pcapng --enable_status=95 --active_beacon
>
> Hunting for issues like this one is very tricky, because it can be caused by hcxdumptool (injecting traffic), the USB host (xhci), a poor cable connection between device and hub, device overheating and more. Unfortunately, neither hcxdumptool (ERROR:0) nor dmesg shows an error. Even if I handle SIGPIPE in hcxdumptool, I get no error message.

LorenzoBianconi commented 4 years ago

> Isn't mt76 wireless ready for kernel 5.7?

What I meant is the full wireless-drivers-next tree.

ZerBea commented 4 years ago

Ok, my fault. Now cloning the full wireless-drivers-next tree. This issue occurs mostly randomly. Sometimes it takes up to one hour. How is the temperature of your AC51? Maybe it is a heat failure and the device stops.

ZerBea commented 4 years ago

Power consumption of the device is ok, too. Measured: 4.83 V and 0.12 A. That should be ok for a USB hub.
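
As a quick sanity check on those figures: 0.12 A is well below the 0.5 A that a USB 2.0 port may supply to a configured device, and the power drawn works out to roughly 0.58 W. The arithmetic can be reproduced with a one-line helper (`usb_power_w` is a hypothetical name used only here).

```shell
# Hypothetical helper: power drawn in watts from a measured voltage and current.
usb_power_w() {
    awk -v v="$1" -v a="$2" 'BEGIN { printf "%.2f\n", v * a }'
}

# Example: usb_power_w 4.83 0.12
```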

Now running wireless-drivers-next. Still looking fine after 3 minutes.

ZerBea commented 4 years ago

Stopped hcxdumptool (everything is working as expected), went back to managed mode and started the scan script (several times):

counter=1
while [ $counter -le 20 ]
do
sudo iw dev wlp39s0f3u1u1u2 scan
((counter++))
done

No "Device or resource busy (-16)" appeared! wireless-drivers-next is running fine for me.

LorenzoBianconi commented 4 years ago

> for me

Sorry, I did not get what you mean here. Is wireless-drivers-next working? If not, how long does it take to stop?

ZerBea commented 4 years ago

on 5.4: the issue occurs often
on 5.7: the issue occurs seldom
wireless-drivers-next: no error occurred

This should make it less confusing: "for me" means that I can only speak for myself. We have another participant here (@RealEnder) with a similar issue. It would be great if he could confirm it, too. If so, we have a 100% solution.

LorenzoBianconi commented 4 years ago

> on 5.4: the issue occurs often
> on 5.7: the issue occurs seldom
> wireless-drivers-next: no error occurred
>
> This should make it less confusing: "for me" means that I can only speak for myself. We have another participant here (@RealEnder) with a similar issue. It would be great if he could confirm it, too. If so, we have a 100% solution.

ack, let's keep testing a little bit more and if you do not have any issue with wireless-drivers-next tree let's close it

ZerBea commented 4 years ago

I was hoping you would say that. It's a good idea to run more tests. I'll do some more tests on the Raspberry Pi. There I also noticed that the RPI sometimes doesn't start if a mt76x0u device is plugged in before system power-on.

My reference devices are:
EDIMAX EW-7711UAN, ID 7392:7710 Edimax Technology Co., Ltd, mt7601u
https://github.com/ZerBea/hcxdumptool/wiki/Penetration-testing-system-2
ALLNET ALLWA0150, ID 148f:7601 Ralink Technology, Corp. MT7601U Wireless Adapter, mt7601u
https://github.com/ZerBea/hcxdumptool/wiki/Penetration-testing-system-1

Both of them are running perfectly.

BTW: 5 GHz injection on the mt76x0u device is now working fine, too. It took me a while to find the CRDA issue (in combination with udev).

ZerBea commented 4 years ago

I am not even sure that this issue is related to the driver only, because I noticed that on other devices (kernel <= 4.19), too: rt2800usb https://bugzilla.kernel.org/show_bug.cgi?id=202243#c19 or ath9k_htc https://github.com/ZerBea/hcxdumptool/issues/80

The only indication that it is possibly a driver issue is that the mt7601u is working fine under the same circumstances.

ZerBea commented 4 years ago

First test series finished on notebook and desktop running kernel 5.7 with wireless-drivers-next. No issues. That is really good. But I noticed that the devices are getting warm. @LorenzoBianconi do you know something about the thermal design of the chipset? Could it be possible that a thermal watchdog shuts down the device without informing the kernel about it? The next step is to compile the mt76 driver for kernel 5.4 and start 2 tests using a Raspberry Pi Zero. I'll put one of the systems into a bag and observe the temperature of the device. I know the RPI will not shut down, due to the thermal design of the case - so let's see what happens with the WiFi adapter. If everything is working as expected, I'll close the issue report.

ZerBea commented 4 years ago

@LorenzoBianconi I could use some advice: Sometimes, after power-on with the mt76x0 device plugged in, I get this message:
$ journalctl | grep cfg80211
cfg80211: Process '/usr/bin/set-wireless-regdom' failed with exit code 1.

result:
$ cat /sys/module/cfg80211/parameters/ieee80211_regdom
00

I think this caused that:

* 5GHz injection isn't working
* the device sometimes stops working as expected - if setchannel() arrived on a "not allowed channel".

Could be a timing issue during boot, but I'm not sure. Do you have an idea?

I decided to add a task to Arch issue tracker: https://bugs.archlinux.org/task/67371
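
The regdomain check above can be wrapped in a small function so it is easy to run right after boot or from a script. This is a sketch: `regdom_state` is a hypothetical name, and the only thing taken from this report is the sysfs path and the `00` value.

```shell
# Hypothetical check: report whether cfg80211 is still on the world regdomain.
# The sysfs path is the one quoted above; it is a parameter so the function can
# be tried against any file.
regdom_state() {
    file=${1:-/sys/module/cfg80211/parameters/ieee80211_regdom}
    code=$(cat "$file" 2>/dev/null) || { echo "unknown"; return 2; }
    if [ "$code" = "00" ]; then
        echo "world (00)"
        return 1
    fi
    echo "$code"
}
```

If it reports the world domain, `iw reg get` shows what the kernel currently applies, and `sudo iw reg set DE` (DE being the country announced by the test AP above) sets it by hand until the boot-time failure is sorted out.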

ZerBea commented 4 years ago

If nothing speaks against it, I would like to leave this issue report open for a while. Unfortunately, I encountered so many issues (xhci, crda, possibly libnl) that have to be solved before I'm able to do more tests on the driver.

LorenzoBianconi commented 4 years ago

> @LorenzoBianconi I could use some advice: Sometimes, after power-on with the mt76x0 device plugged in, I get this message:
> $ journalctl | grep cfg80211
> cfg80211: Process '/usr/bin/set-wireless-regdom' failed with exit code 1.
>
> result:
> $ cat /sys/module/cfg80211/parameters/ieee80211_regdom
> 00

This issue does not seem to be related to mt76x0u.

> I think this caused that:
>
> * 5GHz injection isn't working

Injection does not work on the World regdomain since active scanning is forbidden on 5GHz, IIRC.

> * the device sometimes stops working as expected - if setchannel() arrived on a "not allowed channel".
>
> Could be a timing issue during boot, but I'm not sure. Do you have an idea?
>
> I decided to add a task to Arch issue tracker: https://bugs.archlinux.org/task/67371

ZerBea commented 4 years ago

@LorenzoBianconi thanks for the information. I'm sure that none of the issues are related to the mt76 driver. Unfortunately, I discovered them after reporting the mt76 issues. I'm sorry for that - but without your ideas I'd still be hunting for them. It looks like the whole crda system is as weak as the xhci system. Over the past years, many issues have been reported.

I'll close this issue report, because the driver is working as expected. Thanks for your help.

Cheers Mike