openwrt / mt76

mac80211 driver for MediaTek MT76x0e, MT76x2e, MT7603, MT7615, MT7628 and MT7688
738 stars 341 forks source link

[7613] mt7613 wlan scan cause mcu timeout #511

Open ptpt52 opened 3 years ago

ptpt52 commented 3 years ago

chip is mt7613

[  660.520319] mt7615e 0000:02:00.0: Message 73 (seq 9) timeout
[  681.000319] mt7615e 0000:02:00.0: Message 73 (seq 10) timeout
[  701.480334] mt7615e 0000:02:00.0: Message 73 (seq 11) timeout

I try to ubus call iwinfo scan {"device": "wlan1"} after Multiple scans this issue happens

ptpt52 commented 3 years ago

here is how to reproduce it

while :; do iwinfo wlan1 scan; echo; uptime; sleep 1; done

any help? @nbd168

ptpt52 commented 3 years ago

it is likely that no issue on mt7615 chip I do the same test one mt7615, no issue

Borromini commented 3 years ago

@ptpt52 I am seeing this message as well, maybe it's unrelated, but I have two MT7613BE radios here (EAP235-Wall) and it eventually happens on both at some point. It's not a question if, it's a question when. Those timeout messages will fill the log for a while and eventually the radio just dies. I've only seen this when the radios idle - I have pushed hundreds of gigabytes over the radio and that works just fine it seems, it's just when it's idling that at some point it seems to go wrong. I had no way to trigger it but just wait (sometimes it's within a few hours, sometimes it's multiple days though).

I ran your iwinfo command, and this is what logread shows (OpenWrt 21.02 with latest mt76 driver from OpenWrt master):

Thu Mar 25 23:18:20 2021 authpriv.notice dropbear[6555]: Pubkey auth succeeded for 'root' with key sha1!! <snip> from 10.0.0.5:35098
Thu Mar 25 23:18:21 2021 kern.info kernel: [123931.918046] ieee80211 phy1: N9: (124009.228:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:21 2021 kern.info kernel: [123931.918046] �������;���]�;���]p�b@�r
Thu Mar 25 23:18:27 2021 kern.info kernel: [123937.941969] ieee80211 phy1: N9: (124015.255:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:27 2021 kern.info kernel: [123937.941969] �������;���]�;���]��AЛr
Thu Mar 25 23:18:33 2021 kern.info kernel: [123943.965938] ieee80211 phy1: N9: (124021.282:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:33 2021 kern.info kernel: [123943.965938] �������;���]�;���]��J`�r
Thu Mar 25 23:18:38 2021 kern.info kernel: [123948.667646] ieee80211 phy1: N9: (124025.986:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:38 2021 kern.info kernel: [123948.667646] �������;���]�;���]Q��u
Thu Mar 25 23:18:39 2021 kern.info kernel: [123949.989899] ieee80211 phy1: N9: (124027.310:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:39 2021 kern.info kernel: [123949.989899] �������;���]�;���]�:PP�w
Thu Mar 25 23:18:42 2021 kern.info kernel: [123952.559046] ieee80211 phy1: N9: (124029.880:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:42 2021 kern.info kernel: [123952.559046] �������;���]�;���]�X�
Thu Mar 25 23:18:43 2021 kern.info kernel: [123953.545245] ieee80211 phy1: N9: (124030.867:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:43 2021 kern.info kernel: [123953.545245] �������;���]�;���]�-\�j
Thu Mar 25 23:18:45 2021 kern.info kernel: [123956.016801] ieee80211 phy1: N9: (124033.340:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:45 2021 kern.info kernel: [123956.016801] �������;���]�;���]ІRp�
Thu Mar 25 23:18:47 2021 kern.info kernel: [123957.463012] ieee80211 phy1: N9: (124034.787:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:47 2021 kern.info kernel: [123957.463012] �������;���]�;���] �C �
Thu Mar 25 23:18:48 2021 kern.info kernel: [123958.449243] ieee80211 phy1: N9: (124035.774:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:48 2021 kern.info kernel: [123958.449243] �������;���]�;���]p�O@�
Thu Mar 25 23:18:51 2021 kern.info kernel: [123962.072360] ieee80211 phy1: N9: (124039.399:CMD-E)EXT_CMD_OPT_BIT_0_ACK
Thu Mar 25 23:18:51 2021 kern.info kernel: [123962.072360] �������;���]�;���]�ZYp�@
Thu Mar 25 23:18:52 2021 kern.info kernel: [123962.364951] ieee80211 phy1: N9: Wifi ASSERT @ ../wifi/custom/feature/pm_ap.c:454
Thu Mar 25 23:18:52 2021 kern.info kernel: [123962.364951] ��]�;���]�rWp{�
Thu Mar 25 23:19:13 2021 kern.err kernel: [123984.168757] mt7615e 0000:02:00.0: Message 73 (seq 15) timeout
Thu Mar 25 23:19:34 2021 kern.err kernel: [124004.648606] mt7615e 0000:02:00.0: Message 73 (seq 1) timeout
Thu Mar 25 23:19:54 2021 kern.err kernel: [124025.128456] mt7615e 0000:02:00.0: Message 38 (seq 2) timeout
Thu Mar 25 23:20:15 2021 kern.err kernel: [124045.608303] mt7615e 0000:02:00.0: Message 37 (seq 3) timeout
Thu Mar 25 23:20:35 2021 kern.err kernel: [124066.088147] mt7615e 0000:02:00.0: Message 73 (seq 4) timeout
Thu Mar 25 23:20:56 2021 kern.err kernel: [124086.567997] mt7615e 0000:02:00.0: Message 8 (seq 5) timeout
Thu Mar 25 23:21:16 2021 kern.err kernel: [124107.047845] mt7615e 0000:02:00.0: Message 8 (seq 6) timeout
Thu Mar 25 23:21:37 2021 kern.err kernel: [124127.527696] mt7615e 0000:02:00.0: Message 8 (seq 7) timeout
Thu Mar 25 23:21:57 2021 kern.err kernel: [124148.007541] mt7615e 0000:02:00.0: Message 8 (seq 8) timeout
Thu Mar 25 23:22:18 2021 kern.err kernel: [124168.487391] mt7615e 0000:02:00.0: Message 8 (seq 9) timeout
Thu Mar 25 23:22:38 2021 kern.err kernel: [124188.967236] mt7615e 0000:02:00.0: Message 8 (seq 10) timeout
Thu Mar 25 23:22:59 2021 kern.err kernel: [124209.447088] mt7615e 0000:02:00.0: Message 8 (seq 11) timeout
Thu Mar 25 23:23:19 2021 kern.err kernel: [124229.926931] mt7615e 0000:02:00.0: Message 8 (seq 12) timeout
Thu Mar 25 23:23:40 2021 kern.err kernel: [124250.406779] mt7615e 0000:02:00.0: Message 8 (seq 13) timeout
Thu Mar 25 23:24:00 2021 kern.err kernel: [124270.886634] mt7615e 0000:02:00.0: Message 8 (seq 14) timeout
Thu Mar 25 23:24:21 2021 kern.err kernel: [124291.366479] mt7615e 0000:02:00.0: Message 8 (seq 15) timeout
Thu Mar 25 23:24:41 2021 kern.err kernel: [124311.846333] mt7615e 0000:02:00.0: Message 8 (seq 1) timeout
Thu Mar 25 23:25:02 2021 kern.err kernel: [124332.326180] mt7615e 0000:02:00.0: Message 8 (seq 2) timeout

I have pinged nbd about this on IRC a while ago and was asked earlier on to switch on firmware debugging, like this:

echo 1 > /sys/kernel/debug/ieee80211/phy1/mt76/fw_debug

I can't say it's the same issue, but at least it's the same error message... I was asked to bisect also, and it looked like with the 2021-01-14 version, it worked fine, but with the 2021-01-27 one it broke. I have attempted multiple bisects but eventually I'd end up with a commit that git marked as bad, only to find out that reverting it didn't help either. So I'm at a loss, but hopefully the above can help.

I have two MT7615 devices here as well, those get multiple days of uptime and no issues like this. They actually run a WDS and form part of the backbone in the home network here.

ptpt52 commented 3 years ago

@Borromini thanks for you reply it is likely that it trigger the issue after you run the iwinfo scan command?

ryderlee1110 commented 3 years ago

can you try this patch https://patchwork.kernel.org/project/linux-wireless/patch/8157df1026bb9ea20fd39ce58b8abef9a1cedd37.1616657048.git.ryder.lee@mediatek.com/ to see if system can be recovered back?

Borromini commented 3 years ago

it is likely that it trigger the issue after you run the iwinfo scan command?

Running your command definitely triggered the timeout message, and the radio died. I ran your command at 23:18, more or less right after I SSH'ed into the device (start of the log excerpt). Log wasn't filtered, so everything you see here is the firmware debugging stuff + the kern.err messages. It looks like it only needed a single 'iwinfo scan' to break.

@ryderlee1110 Thanks I will give that a shot.

ryderlee1110 commented 3 years ago

@Borromini which firmware you're using? please apply https://patchwork.kernel.org/project/linux-wireless/patch/02d52818307a85b66e59ca11e36a9da47e2fe4df.1616657048.git.ryder.lee@mediatek.com/ as well? I fixed a mt7615 scan issue. but maybe it's a different one.

Borromini commented 3 years ago

@ryderlee1110 How can I tell? This is the OpenWrt firmware package: kmod-mt7663-firmware-ap - 5.4.107+2021-03-15-ca22cc22-4.

Boot log shows this, I don't see any firmware info on for the MT7615 radio:

Thu Mar 25 22:03:56 2021 kern.info kernel: [   29.604167] Loading modules backported from Linux version v5.10.16-0-gde53befa79cf
Thu Mar 25 22:03:56 2021 kern.info kernel: [   29.619268] Backport generated by backports.git v5.10.16-1-0-g21d2a1d2
Thu Mar 25 22:03:56 2021 kern.info kernel: [   29.737818] xt_time: kernel timezone is -0000
Thu Mar 25 22:03:56 2021 kern.info kernel: [   29.811746] mt7621-pci 1e140000.pcie: bus=1 slot=0 irq=23
Thu Mar 25 22:03:56 2021 kern.info kernel: [   29.822572] pci 0000:00:00.0: enabling device (0006 -> 0007)
Thu Mar 25 22:03:56 2021 kern.info kernel: [   29.833882] mt7603e 0000:01:00.0: enabling device (0000 -> 0002)
Thu Mar 25 22:03:56 2021 kern.info kernel: [   29.846105] mt7603e 0000:01:00.0: ASIC revision: 76030010
Thu Mar 25 22:03:56 2021 kern.info kernel: [   30.476132] mt7603e 0000:01:00.0: Firmware Version: ap_pcie
Thu Mar 25 22:03:56 2021 kern.info kernel: [   30.487265] mt7603e 0000:01:00.0: Build Time: 20160107100755
Thu Mar 25 22:03:56 2021 kern.info kernel: [   30.522881] mt7603e 0000:01:00.0: firmware init done
Thu Mar 25 22:03:56 2021 kern.debug kernel: [   30.707828] ieee80211 phy0: Selected rate control algorithm 'minstrel_ht'
Thu Mar 25 22:03:56 2021 kern.info kernel: [   30.720534] mt7621-pci 1e140000.pcie: bus=2 slot=1 irq=24
Thu Mar 25 22:03:56 2021 kern.info kernel: [   30.731349] pci 0000:00:01.0: enabling device (0006 -> 0007)
Thu Mar 25 22:03:56 2021 kern.info kernel: [   30.742648] mt7615e 0000:02:00.0: enabling device (0000 -> 0002)
Thu Mar 25 22:03:56 2021 kern.debug kernel: [   30.774070] ieee80211 phy1: Selected rate control algorithm 'minstrel_ht'
ryderlee1110 commented 3 years ago

Firmware looks right. Anyway, we were not deeply involved in mt7613 on mt76. it should work but maybe not 100% perfect.

ptpt52 commented 3 years ago

@Borromini your msg shows the 7603 firmware version, not 7613 @ryderlee1110

here is my msg:

[   16.296292] mt7615e 0000:02:00.0: enabling device (0000 -> 0002)
[   17.753025] mt7615e 0000:02:00.0: mediatek/mt7663pr2h.bin not found, switching to mediatek/mt7663pr2h_rebb.bin
[   18.020618] mt7615e 0000:02:00.0: HW/SW Version: 0x65322d31, Build Time: 19091404338b809
[   18.531464] mt7615e 0000:02:00.0: N9 Firmware Version: 7663mp1827, Build Time: 20190914043434
[   18.540050] mt7615e 0000:02:00.0: Region number: 0x3
[   18.545093] mt7615e 0000:02:00.0: Parsing tailer Region: 0
[   18.554159] mt7615e 0000:02:00.0: Region 0, override_addr = 0x00112c00
[   18.560779] mt7615e 0000:02:00.0: Parsing tailer Region: 1
[   18.567300] mt7615e 0000:02:00.0: Parsing tailer Region: 2
[   18.573323] mt7615e 0000:02:00.0: override_addr = 0x00112c00, option = 3
ptpt52 commented 3 years ago

@ryderlee1110 test all your patch still issue and reset recovery not ok

ptpt52 commented 3 years ago

with @ryderlee1110 's patches @Borromini and I am trying to slow down the iwinfo scan frequency say, scan it in 60s interval

while :; do iwinfo wlan1 scan >/dev/null; echo $((i++)); uptime; sleep 60; done

and the mcu timeout issue not happens

I did scan in 1 second interval on chip mt7615, no issue happens

while :; do iwinfo wlan1 scan >/dev/null; echo $((i++)); uptime; sleep 1; done
ryderlee1110 commented 3 years ago

OK. Looks like a different issue on mt7613.

ptpt52 commented 3 years ago

@ryderlee1110 one thing your patch solve is that the iwinfo scan never kick sta client, and client could keep stream while no scan doing

ryderlee1110 commented 3 years ago

yes, that's one of issues I fixed for mt7615.

ptpt52 commented 3 years ago

@ryderlee1110 hello, may I ask you something? I can get chip temperature via

cat /sys/kernel/debug/ieee80211/phy1/mt76/temperature 
Temperature: 45

on chip mt7615

but cannot get it on chip mt7613, it always return 0:

cat /sys/kernel/debug/ieee80211/phy1/mt76/temperature 
Temperature: 0

what is the difference between them?

ryderlee1110 commented 3 years ago

@ptpt52 I don't know, it could be command id change etc. Although 7613 reuses 7615 code, I still think there might be some flow changes needs to be modified.

nbd168 commented 3 years ago

Today I fixed reset recovery on MT7622 in a way that might also help MT7613. Please test the latest version

ptpt52 commented 3 years ago

@nbd168 thanks, I would test it next Monday

ptpt52 commented 3 years ago

@nbd168 @ryderlee1110 very strange, with the latest code (https://github.com/openwrt/mt76/commit/465dda65ee8415169cc39596f913a50b5ba4f5a9), the mcu timeout never happens I am not sure which commit fix it.

ptpt52 commented 3 years ago

maybe the 0-terminating code ... commits fix it? or the reset recovery changes?

ptpt52 commented 3 years ago

@nbd168 @ryderlee1110 here to report back the mcu timeout issue still exists an cannot recovery

ptpt52 commented 3 years ago

it just become not esay to triger

ptpt52 commented 3 years ago

the mcu timeout issue happened accidentally, but it is hard for me to reproduce it, maybe it is some sort of hardware issue?

Borromini commented 3 years ago

Here it ranges from under an hour to multiple days for it to pop up. Happens on both MT7613 devices. I just got a third back (was on loan for porting OpenWrt) and I'll be testing that one as well.

ptpt52 commented 3 years ago

@Borromini should it be related to chip temperature?

ptpt52 commented 3 years ago

I try echo 1 >fw_debug and got this random code:

[27866.500207] ieee80211 phy1: N9: (27889.441:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27872.131939] ieee80211 phy1: N9: (27895.076:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27872.508168] ieee80211 phy1: N9: (27895.452:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27878.140275] ieee80211 phy1: N9: (27901.088:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27878.516065] ieee80211 phy1: N9: (27901.464:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27884.151745] ieee80211 phy1: N9: (27907.102:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27884.524416] ieee80211 phy1: N9: (27907.475:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27890.159706] ieee80211 phy1: N9: (27913.114:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27890.535863] ieee80211 phy1: N9: (27913.490:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27896.167820] ieee80211 phy1: N9: (27919.125:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27896.544152] ieee80211 phy1: N9: (27919.502:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27902.176016] ieee80211 phy1: N9: (27925.137:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27902.552250] ieee80211 phy1: N9: (27925.513:CMD-E)EXT_CMD_OPT_BIT_0_ACK
[27906.615795] ieee80211 phy1: N9: (27929.579:CMD-E)EXT_CMD_OPT_BIT_0_ACK
?z.N<..a?6.?2#?mY??...???1<????6&w.??.?.???E.?.l)?.???-*.
[28046.811695] h..??G?O\.$???.L.?.??T.. ?N?.T?????.?@_?.?L??._??E?.???.T?\???H<??|.?J?0!;.L^?N???]|S??LU]??SZ-5G..$?H?...
M???2.7.1M???2.7.1[28052.436228] 80211 ??1: N9: (28075.483:CMD-E)EXT CMD OPT BIT 0 ACK
[28052.851760] 80211 ??1: N9: (28075.899:CMD-E)EXT CMD OPT BIT 0 ACK
[28058.444764] 80211 ??1: N9: (28081.495:CMD-E)EXT CMD OPT BIT 0 ACK
[28058.860673] 80211 ??1: N9: (28081.911:CMD-E)EXT CMD OPT BIT 0 ACK
[28064.453809] 80211 ??1: N9: (28087.508:CMD-E)EXT CMD OPT BIT 0 ACK
[28064.872044] 80211 ??1: N9: (28087.926:CMD-E)EXT CMD OPT BIT 0 ACK
ptpt52 commented 3 years ago

I try to debug mcu and get this:

 448 static void
 449 mt7615_mcu_rx_ext_event(struct mt7615_dev *dev, struct sk_buff *skb)
 450 {
 451         struct mt7615_mcu_rxd *rxd = (struct mt7615_mcu_rxd *)skb->data;
 452 
 453         switch (rxd->ext_eid) {
 454         case MCU_EXT_EVENT_RDD_REPORT:
 455                 mt7615_mcu_rx_radar_detected(dev, skb);
 456                 break;
 457         case MCU_EXT_EVENT_CSA_NOTIFY:
 458                 mt7615_mcu_rx_csa_notify(dev, skb);
 459                 break;
 460         case MCU_EXT_EVENT_FW_LOG_2_HOST:
 461                 mt7615_mcu_rx_log_message(dev, skb);
 462                 break;
 463         default:
 464 
 465                 printk("get ext unhandle eid=%d ext_eid=%d seq=%d\n", rxd->eid, rxd->ext_eid, rxd->seq);
 466                 break;
 467         }
 468 }

when iwinfo scan

get ext unhandle eid=237 ext_eid=26 seq=0
porentak commented 3 years ago

If it helps, ext_eid=26 is EXT_EVENT_BEACON_LOSS.

ptpt52 commented 3 years ago

what is I disable beacon_offload on driver? @nbd168 @porentak

ptpt52 commented 3 years ago

I am going to call mt7615_mcu_beacon_loss_event() in ext_eid reach EXT_EVENT_BEACON_LOSS

Borromini commented 3 years ago

@Borromini should it be related to chip temperature?

That's one of the things I found when looking for possible mitigations, but at least for me lowering the transmit power didn't seem to make much of a difference. No idea how big a factor TX power is in the overall power consumption

ptpt52 commented 3 years ago

also I notice that MCU_EXT_EVENT_PS_SYNC ext_event comes up if iwinfo scan and it is correct to just ignore it?

ptpt52 commented 3 years ago

now I am doing handle the MCU_EXT_EVENT_BEACON_LOSS ext_event

@@ -443,7 +480,16 @@ mt7615_mcu_rx_ext_event(struct mt7615_dev *dev, struct sk_buff *skb)
        case MCU_EXT_EVENT_FW_LOG_2_HOST:
                mt7615_mcu_rx_log_message(dev, skb);
                break;
+       case MCU_EXT_EVENT_BEACON_LOSS:
+               //printk("get ext unhandle eid=%d ext_eid=%d seq=%d handle via mt7615_mcu_beacon_loss_event\n", rxd->eid, rxd->ext_eid, rxd->seq);
+               mt7615_mcu_beacon_loss_event(dev, skb);
+               break;
+       case MCU_EXT_EVENT_PS_SYNC:
+               //TODO
+               break;
        default:
+
+               printk("get ext unhandle eid=%d ext_eid=%d seq=%d MCU_EXT_EVENT_FW_LOG_2_HOST=%d\n", rxd->eid, rxd->ext_eid, rxd->seq, MCU_EXT_EVENT_FW_LOG_2_HOST);
                break;
        }
 }
ptpt52 commented 3 years ago

I run killall hostapd then it trigger the issue:

[ 2410.819856] get ext unhandle eid=237 ext_eid=35 seq=147
[ 2410.825158] get ext unhandle eid=237 ext_eid=35 seq=148
[ 2410.830423] get ext unhandle eid=237 ext_eid=35 seq=149
[ 2410.835712] get ext unhandle eid=237 ext_eid=35 seq=150
[ 2410.841018] get ext unhandle eid=237 ext_eid=35 seq=151
[ 2410.846280] get ext unhandle eid=237 ext_eid=35 seq=152
[ 2410.851572] get ext unhandle eid=237 ext_eid=35 seq=153
[ 2410.856889] get ext unhandle eid=237 ext_eid=35 seq=154
[ 2410.862154] get ext unhandle eid=237 ext_eid=35 seq=155
[ 2410.867441] get ext unhandle eid=237 ext_eid=35 seq=156
[ 2410.872748] get ext unhandle eid=237 ext_eid=35 seq=157
[ 2410.878015] get ext unhandle eid=237 ext_eid=35 seq=158
[ 2410.883332] get ext unhandle eid=237 ext_eid=35 seq=159
[ 2410.888629] get ext unhandle eid=237 ext_eid=35 seq=160
[ 2410.893890] get ext unhandle eid=237 ext_eid=35 seq=161
[ 2410.899179] get ext unhandle eid=237 ext_eid=35 seq=162
[ 2410.904455] get ext unhandle eid=237 ext_eid=35 seq=163
[ 2410.909761] get ext unhandle eid=237 ext_eid=35 seq=164
[ 2410.915046] get ext unhandle eid=237 ext_eid=35 seq=165
[ 2410.920319] get ext unhandle eid=237 ext_eid=35 seq=166
[ 2410.925592] get ext unhandle eid=237 ext_eid=35 seq=167

[ 2417.695678] mt7615e 0000:02:00.0: Message 39 (seq 9) timeout
[ 2438.175329] mt7615e 0000:02:00.0: Message 39 (seq 10) timeout
[ 2458.655350] mt7615e 0000:02:00.0: Message 39 (seq 11) timeout
[ 2479.135299] mt7615e 0000:02:00.0: Message 39 (seq 12) timeout
[ 2499.615297] mt7615e 0000:02:00.0: Message 73 (seq 13) timeout
ptpt52 commented 3 years ago

@Borromini use my changes in https://github.com/ptpt52/mt76/commit/36ed7ea7962ca2ed7cfa77de62fb123658bdae6d Test to see if the mcu timeout issue is fixed

ryderlee1110 commented 3 years ago

Just note that the mcu_beacon_loss_event format of mt7615 is different compares to connac one. struct mt7615_beacon_loss_event { u8 bssid[8]; u8 reason; u8 rsv; } __packed;

Also, I think we should check reason to update new beacon to firmware (if error happens ...). As for PS_SYN is useless in mt76 you can ignore it.

ptpt52 commented 3 years ago

@ryderlee1110 thanks for the note and I notice there may be some more difference something like get_temperature from mcu...

ryderlee1110 commented 3 years ago

that's why i don't want to drag every chip into mt7615 :) firmware stuff is always messy.

ptpt52 commented 3 years ago

@ryderlee1110 should I use the firmware file in bin/mt7663/rebb/mt7663_patch_e2_hdr.bin in property driver?

ryderlee1110 commented 3 years ago

I don't know actually ... if it's from property driver and works for you then I think it's fine. But you might need to handle few format changes. I think it's ok to add bcn_loss_event since mt7613 looks the same as mt7615 here.

ptpt52 commented 3 years ago

what is /lib/firmware/mediatek/mt7663_n9_rebb.bin and where to get it? @nbd168 @ryderlee1110

ptpt52 commented 3 years ago

each time mcu timeout happens

[  969.508658] get ext unhandle eid=237 ext_eid=35 seq=244
[  969.513952] get ext unhandle eid=237 ext_eid=35 seq=245
[  969.519235] get ext unhandle eid=237 ext_eid=35 seq=246
[  969.524559] get ext unhandle eid=237 ext_eid=35 seq=247

ext_eid=35 that means EXT_EVENT_ID_ASSERT_DUMP = 0x23 something bad happens in the chip?

ptpt52 commented 3 years ago

now I trigger it and get this dump:

[ 2271.741418] EXT_EVENT_ID_ASSERT_DUMP:[Wifi ASSERT:]
[ 2271.746445] EXT_EVENT_ID_ASSERT_DUMP:[../wifi/custom/feature/pm_ap.c]
[ 2271.752969] EXT_EVENT_ID_ASSERT_DUMP:[line: #454]
[ 2271.757736] EXT_EVENT_ID_ASSERT_DUMP:[
[ 2271.757736] Asserted file: ++3?]
[ 2271.764771] EXT_EVENT_ID_ASSERT_DUMP:[../wifi/custom/feature/pm_ap.c++]
[ 2271.771472] EXT_EVENT_ID_ASSERT_DUMP:[ (line #454)
[ 2271.771472] 
[ 2271.771472] ++]
[ 2271.779582] EXT_EVENT_ID_ASSERT_DUMP:[EMI Wakup Fail (i=101)++]
[ 2271.785656] EXT_EVENT_ID_ASSERT_DUMP:[<ASSERT> ../wifi/custom/feature/pm_ap.c #454 - , 0x0, 0x0, 0x0, rc=*]
[ 2271.795462] EXT_EVENT_ID_ASSERT_DUMP:[, id=0x2 WIFI, isr=0x0, irq=0x12]
[ 2271.802210] EXT_EVENT_ID_ASSERT_DUMP:[, lr=0x1F0C, sp=0x2009728,{isr_t=0}{asr_t=74733823}]
[ 2271.810523] EXT_EVENT_ID_ASSERT_DUMP:[{shed_trace*]
[ 2271.815526] EXT_EVENT_ID_ASSERT_DUMP:[[idx=2,sched_t=0x2,sched_s=74727314]]
[ 2271.822574] EXT_EVENT_ID_ASSERT_DUMP:[[idx=3,sched_t=0x2,sched_s=74727379]]
[ 2271.829634] EXT_EVENT_ID_ASSERT_DUMP:[[idx=4,sched_t=0x2,sched_s=74727444]]
[ 2271.836661] EXT_EVENT_ID_ASSERT_DUMP:[[idx=5,sched_t=0x2,sched_s=74727510]]
[ 2271.843722] EXT_EVENT_ID_ASSERT_DUMP:[[idx=6,sched_t=0x6,sched_s=74727574]]
[ 2271.850749] EXT_EVENT_ID_ASSERT_DUMP:[[idx=7,sched_t=0x2,sched_s=74728741]]
[ 2271.857785] EXT_EVENT_ID_ASSERT_DUMP:[[idx=8,sched_t=0x6,sched_s=74728806]]
[ 2271.864808] EXT_EVENT_ID_ASSERT_DUMP:[[idx=9,sched_t=0x2,sched_s=74730412]]
[ 2271.871852] EXT_EVENT_ID_ASSERT_DUMP:[[idx=10,sched_t=0x2,sched_s=74730427]]
[ 2271.878955] EXT_EVENT_ID_ASSERT_DUMP:[[idx=11,sched_t=0x6,sched_s=74730541]]
[ 2271.886084] EXT_EVENT_ID_ASSERT_DUMP:[[idx=12,sched_t=0x2,sched_s=74732163]0]
[ 2271.893285] EXT_EVENT_ID_ASSERT_DUMP:[[idx=13,sched_t=0x6,sched_s=74732228]]
[ 2271.900431] EXT_EVENT_ID_ASSERT_DUMP:[[idx=14,sched_t=0x2,sched_s=74733718]]
[ 2271.907534] EXT_EVENT_ID_ASSERT_DUMP:[[idx=0,sched_t=0x2,sched_s=74733783]]
[ 2271.914581] EXT_EVENT_ID_ASSERT_DUMP:[[idx=1,sched_t=0x2,sched_s=74733784]}??]
[ 2271.921908] EXT_EVENT_ID_ASSERT_DUMP:[{irq_trace*F]
[ 2271.926919] EXT_EVENT_ID_ASSERT_DUMP:[[idx=0,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.933628] EXT_EVENT_ID_ASSERT_DUMP:[[idx=1,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.940315] EXT_EVENT_ID_ASSERT_DUMP:[[idx=2,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.947054] EXT_EVENT_ID_ASSERT_DUMP:[[idx=3,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.953766] EXT_EVENT_ID_ASSERT_DUMP:[[idx=4,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.960468] EXT_EVENT_ID_ASSERT_DUMP:[[idx=5,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.967154] EXT_EVENT_ID_ASSERT_DUMP:[[idx=6,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.973884] EXT_EVENT_ID_ASSERT_DUMP:[[idx=7,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.980586] EXT_EVENT_ID_ASSERT_DUMP:[[idx=8,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.987302] EXT_EVENT_ID_ASSERT_DUMP:[[idx=9,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2271.993997] EXT_EVENT_ID_ASSERT_DUMP:[[idx=10,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2272.000782] EXT_EVENT_ID_ASSERT_DUMP:[[idx=11,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2272.007571] EXT_EVENT_ID_ASSERT_DUMP:[[idx=12,irq_t=0xFFFFFFFF,irq_s=0]]
[ 2272.014362] EXT_EVENT_ID_ASSERT_DUMP:[[idx=13,irq_t=0xFFFFFFFF,irq_s=0]..]
[ 2272.021403] EXT_EVENT_ID_ASSERT_DUMP:[[idx=14,irq_t=0xFFFFFFFF,irq_s=0]}?.]
[ 2272.028452] EXT_EVENT_ID_ASSERT_DUMP:[;;;;;;;;;;;;;;;;;;;;;;;;;;;;;++]
[ 2272.035039] EXT_EVENT_ID_ASSERT_DUMP:[; exception type: ASSERT, cnt:1++]
[ 2272.041820] EXT_EVENT_ID_ASSERT_DUMP:[{asser_type=5}]
[ 2272.046944] EXT_EVENT_ID_ASSERT_DUMP:[exp_main: jump from cos_assert++]
M[ 2272.053622] EXT_EVENT_ID_ASSERT_DUMP:[EXP_MAIN_ENTRY_CNT =1++.]
[ 2272.059596] EXT_EVENT_ID_ASSERT_DUMP:[;;[CONNSYS]core dump start ..
[ 2272.059596] ?]
[ 2272.067603] EXT_EVENT_ID_ASSERT_DUMP:[Patch: more log added here -->
[ 2272.067603] ++]
[ 2272.075759] EXT_EVENT_ID_ASSERT_DUMP:[exp_mem_dump_delay_count=1000, exp_mem_dump_delay_count=2000
[ 2272.075759] ++2]
[ 2272.086610] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_2)sched_t=2, sched_start=74727314++ ]
[ 2272.094680] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_3)sched_t=2, sched_start=74727379++]
i[ 2272.102684] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_4)sched_t=2, sched_start=74727444++]
[ 2272.110769] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_5)sched_t=2, sched_start=74727510++.]
n[ 2272.118870] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_6)sched_t=6, sched_start=74727574++]
[ 2272.126922] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_7)sched_t=2, sched_start=74728741++]
[ 2272.134969] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_8)sched_t=6, sched_start=74728806++.]
i[ 2272.143041] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_9)sched_t=2, sched_start=74730412++]
[ 2272.151140] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_10)sched_t=2, sched_start=74730427++]
c[ 2272.159216] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_11)sched_t=6, sched_start=74730541++]
[ 2272.167381] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_12)sched_t=2, sched_start=74732163++]
[ 2272.175463] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_13)sched_t=6, sched_start=74732228++]
o[ 2272.183585] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_14)sched_t=2, sched_start=74733718++]
m[ 2272.192524] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_0)sched_t=2, sched_start=74733783++]
[ 2272.200502] EXT_EVENT_ID_ASSERT_DUMP:[(sched_info_1)sched_t=2, sched_start=74733784++]
2[ 2272.208459] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_0)irq_t=FFFFFFFF, sched_start=0++P]
[ 2272.216191] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_1)irq_t=FFFFFFFF, sched_start=0++]
[ 2272.223807] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_2)irq_t=FFFFFFFF, sched_start=0++]
.[ 2272.231462] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_3)irq_t=FFFFFFFF, sched_start=0++Q]
[ 2272.239272] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_4)irq_t=FFFFFFFF, sched_start=0++]
7[ 2272.246917] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_5)irq_t=FFFFFFFF, sched_start=0++0]
[ 2272.254698] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_6)irq_t=FFFFFFFF, sched_start=0++<]
[ 2272.262415] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_7)irq_t=FFFFFFFF, sched_start=0++]
[ 2272.270057] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_8)irq_t=FFFFFFFF, sched_start=0++]
.[ 2272.277682] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_9)irq_t=FFFFFFFF, sched_start=0++]
[ 2272.285340] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_10)irq_t=FFFFFFFF, sched_start=0++]
[ 2272.293074] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_11)irq_t=FFFFFFFF, sched_start=0++]
[ 2272.300791] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_12)irq_t=FFFFFFFF, sched_start=0++]
1[ 2272.308506] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_13)irq_t=FFFFFFFF, sched_start=0++]
[ 2272.316278] EXT_EVENT_ID_ASSERT_DUMP:[(irq_info_14)irq_t=FFFFFFFF, sched_start=0++]
[ 2272.324004] EXT_EVENT_ID_ASSERT_DUMP:[Patch: more log added here <--++]
[ 2272.330720] EXT_EVENT_ID_ASSERT_DUMP:[[full dump] ignore++]
[ 2272.336387] EXT_EVENT_ID_ASSERT_DUMP:[; coredump start
[ 2272.336387] 
[ 2272.336387] ++]
[ 2272.344795] EXT_EVENT_ID_ASSERT_DUMP:[; registers
[ 2272.344795] 
[ 2272.344795] ++]
[ 2272.352828] EXT_EVENT_ID_ASSERT_DUMP:[R.S PC 0x1F18
[ 2272.352828] 
[ 2272.352828] ++]
[ 2272.360971] EXT_EVENT_ID_ASSERT_DUMP:[R.S PSW 0x8
[ 2272.360971] 
[ 2272.360971] ++]
[ 2272.369000] EXT_EVENT_ID_ASSERT_DUMP:[R.S IPC 0x119AE
[ 2272.369000] 
[ 2272.369000] ++N]
[ 2272.377449] EXT_EVENT_ID_ASSERT_DUMP:[R.S IPSW 0x9
[ 2272.377449] 
[ 2272.377449] ++]
[ 2272.385525] EXT_EVENT_ID_ASSERT_DUMP:[R.S R0 0x20096B8
[ 2272.385525] 
[ 2272.385525] ++]
[ 2272.393927] EXT_EVENT_ID_ASSERT_DUMP:[R.S R1 0x201CDB8
[ 2272.393927] 
[ 2272.393927] ++]
[ 2272.402361] EXT_EVENT_ID_ASSERT_DUMP:[R.S R2 0x1C6
[ 2272.402361] 
[ 2272.402361] ++]
[ 2272.410438] EXT_EVENT_ID_ASSERT_DUMP:[R.S R3 0x0
[ 2272.410438] 
[ 2272.410438] ++]
[ 2272.418357] EXT_EVENT_ID_ASSERT_DUMP:[R.S R4 0x0
[ 2272.418357] 
[ 2272.418357] ++`l]
[ 2272.426445] EXT_EVENT_ID_ASSERT_DUMP:[R.S R5 0x0
[ 2272.426445] 
[ 2272.426445] ++]
[ 2272.434385] EXT_EVENT_ID_ASSERT_DUMP:[R.S R6 0x0
[ 2272.434385] 
[ 2272.434385] ++`l]
[ 2272.442461] EXT_EVENT_ID_ASSERT_DUMP:[R.S R7 0x1F18
[ 2272.442461] 
[ 2272.442461] ++.3?]
[ 2272.450882] EXT_EVENT_ID_ASSERT_DUMP:[R.S R8 0x20043E4
[ 2272.450882] 
[ 2272.450882] ++]
[ 2272.459323] EXT_EVENT_ID_ASSERT_DUMP:[R.S R9 0x9
[ 2272.459323] 
[ 2272.459323] ++]
[ 2272.467248] EXT_EVENT_ID_ASSERT_DUMP:[R.S R10 0x1C6
[ 2272.467248] 
[ 2272.467248] ++naN]
[ 2272.475649] EXT_EVENT_ID_ASSERT_DUMP:[R.S R11 0x0
[ 2272.475649] 
[ 2272.475649] ++]
[ 2272.483642] EXT_EVENT_ID_ASSERT_DUMP:[R.S R12 0x0
[ 2272.483642] 
[ 2272.483642] ++?]
[ 2272.491700] EXT_EVENT_ID_ASSERT_DUMP:[R.S R13 0x0
[ 2272.491700] 
[ 2272.491700] ++.]
[ 2272.499765] EXT_EVENT_ID_ASSERT_DUMP:[R.S R14 0x0
[ 2272.499765] 
[ 2272.499765] ++l]
[ 2272.507851] EXT_EVENT_ID_ASSERT_DUMP:[R.S R15 0x0
[ 2272.507851] 
[ 2272.507851] ++]
[ 2272.515835] EXT_EVENT_ID_ASSERT_DUMP:[R.S R16 0x0
[ 2272.515835] 
[ 2272.515835] ++.]
[ 2272.523888] EXT_EVENT_ID_ASSERT_DUMP:[R.S R17 0x0
[ 2272.523888] 
[ 2272.523888] ++.]
[ 2272.531995] EXT_EVENT_ID_ASSERT_DUMP:[R.S R18 0x0
[ 2272.531995] 
[ 2272.531995] ++.]
[ 2272.540070] EXT_EVENT_ID_ASSERT_DUMP:[R.S R19 0x0
[ 2272.540070] 
[ 2272.540070] ++]
[ 2272.548078] EXT_EVENT_ID_ASSERT_DUMP:[R.S R20 0x0
.]
[ 2273.011710] EXT_EVENT_ID_ASSERT_DUMP:[[@.]
n[ 2273.015891] EXT_EVENT_ID_ASSERT_DUMP:[[t.]
[ 2273.020090] EXT_EVENT_ID_ASSERT_DUMP:[[?.]
[ 2273.024246] EXT_EVENT_ID_ASSERT_DUMP:[[?.]
i[ 2273.028406] EXT_EVENT_ID_ASSERT_DUMP:[[..]
c[ 2273.032642] EXT_EVENT_ID_ASSERT_DUMP:[[D.]
[ 2273.036879] EXT_EVENT_ID_ASSERT_DUMP:[[x.]
o[ 2273.041070] EXT_EVENT_ID_ASSERT_DUMP:[[?.]
.]
2[ 2273.280412] EXT_EVENT_ID_ASSERT_DUMP:[[@.]
[ 2273.284646] EXT_EVENT_ID_ASSERT_DUMP:[[t.]
.[ 2273.288789] EXT_EVENT_ID_ASSERT_DUMP:[[?.]
7[ 2273.293053] EXT_EVENT_ID_ASSERT_DUMP:[[?.]
[ 2273.297261] EXT_EVENT_ID_ASSERT_DUMP:[[..]
.[ 2273.301490] EXT_EVENT_ID_ASSERT_DUMP:[[D.]
ptpt52 commented 3 years ago

@ryderlee1110 @nbd168 @Borromini any suggestion?

ptpt52 commented 3 years ago

seems good news, it fixed via

diff --git a/mt7615/mcu.c b/mt7615/mcu.c
index 20890dc..31e3b99 100644
--- a/mt7615/mcu.c
+++ b/mt7615/mcu.c
@@ -790,6 +788,12 @@ mt7615_mcu_ctrl_pm_state(struct mt7615_dev *dev, int band, int state)
                .pm_state = state ? ENTER_PM_STATE : EXIT_PM_STATE,
                .band_idx = band,
        };
+
+       memset(&req, 0, sizeof(req));
+       req.pm_number = 5;
+       req.pm_state = state ? ENTER_PM_STATE : EXIT_PM_STATE;
+       req.band_idx = band;
+
        printk("mt7615_mcu_ctrl_pm_state state=%d\n", state);

        return mt76_mcu_send_msg(&dev->mt76, MCU_EXT_CMD_PM_STATE_CTRL, &req,
ptpt52 commented 3 years ago

oops I got another issue:

[ 8131.624960] mt7615e 0000:02:00.0: Message 73 (seq 7) timeout
[ 8152.104957] mt7615e 0000:02:00.0: Message 73 (seq 8) timeout
[ 8172.584919] mt7615e 0000:02:00.0: Message 73 (seq 9) timeout
[ 8193.064954] mt7615e 0000:02:00.0: Message 73 (seq 10) timeout
[ 8213.544952] mt7615e 0000:02:00.0: Message 73 (seq 11) timeout
[ 8234.024950] mt7615e 0000:02:00.0: Message 73 (seq 12) timeout
[ 8254.504963] mt7615e 0000:02:00.0: Message 73 (seq 13) timeout
[ 8274.984973] mt7615e 0000:02:00.0: Message 73 (seq 14) timeout
[ 8295.464945] mt7615e 0000:02:00.0: Message 73 (seq 15) timeout
[ 8315.945054] mt7615e 0000:02:00.0: Message 73 (seq 1) timeout
[ 8336.424973] mt7615e 0000:02:00.0: Message 73 (seq 2) timeout
[ 8356.904969] mt7615e 0000:02:00.0: Message 73 (seq 3) timeout
[ 8377.384969] mt7615e 0000:02:00.0: Message 73 (seq 4) timeout
[ 8397.864973] mt7615e 0000:02:00.0: Message 73 (seq 5) timeout
[ 8418.344987] mt7615e 0000:02:00.0: Message 73 (seq 6) timeout
[ 8438.824956] mt7615e 0000:02:00.0: Message 73 (seq 7) timeout
[ 8459.304989] mt7615e 0000:02:00.0: Message 73 (seq 8) timeout
[ 8479.784983] mt7615e 0000:02:00.0: Message 73 (seq 9) timeout
[ 8500.265021] mt7615e 0000:02:00.0: Message 73 (seq 10) timeout
[ 8520.744979] mt7615e 0000:02:00.0: Message 73 (seq 11) timeout
[ 8541.225027] mt7615e 0000:02:00.0: Message 37 (seq 12) timeout
[ 8541.230838] wlan1: HW problem - can not stop rx aggregation for e4:67:8e:ab:68:5b tid 0
[ 8561.705024] mt7615e 0000:02:00.0: Message 37 (seq 13) timeout
[ 8561.710821] wlan1: HW problem - can not stop rx aggregation for e4:67:8e:ab:68:5b tid 2
[ 8582.185007] mt7615e 0000:02:00.0: Message 37 (seq 14) timeout
[ 8582.190821] wlan1: HW problem - can not stop rx aggregation for e4:67:8e:ab:68:5b tid 3
[ 8602.664992] mt7615e 0000:02:00.0: Message 37 (seq 15) timeout
[ 8602.670788] ------------[ cut here ]------------
[ 8602.675664] WARNING: CPU: 1 PID: 1821 at backports-5.10.16-1/net/mac80211/agg-tx.c:406 ___ieee80211_stop_tx_ba_session+0x2b8/0x2d8 [mac80211]
[ 8602.689149] Modules linked in: xt_connlimit nf_conncount iptable_nat xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_CT nft_redir nft_ct nf_nat nf_t
[ 8602.689329]  ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnec
[ 8602.810029] CPU: 1 PID: 1821 Comm: hostapd Not tainted 5.4.105 #0
[ 8602.816120] Stack : 8042f578 80072e10 80660000 806623c0 806d0000 80662388 806614ec 8642d9fc
[ 8602.824488]         80810000 87d313b4 806aac43 805f58bc 00000001 00000001 8642d9a0 24a5ca47
[ 8602.832829]         00000000 00000000 80850000 00000000 30232035 0000016a 2e352064 30312e34
[ 8602.841169]         00000000 000001ab 00000000 000c5c2d 80000000 806d0000 00000000 86613784
[ 8602.849509]         00000009 86458c20 87ea9070 868fe480 00000000 80302620 00000004 80810004
[ 8602.857852]         ...
[ 8602.860304] Call Trace:
[ 8602.862793] [<8000b61c>] show_stack+0x30/0x100
[ 8602.867275] [<8053e6bc>] dump_stack+0xa4/0xdc
[ 8602.871657] [<8002bcd0>] __warn+0xc0/0x10c
[ 8602.875782] [<8002bd78>] warn_slowpath_fmt+0x5c/0xac
[ 8602.880947] [<86613784>] ___ieee80211_stop_tx_ba_session+0x2b8/0x2d8 [mac80211]
[ 8602.888337] [<86611c98>] ieee80211_sta_tear_down_BA_sessions+0x88/0x160 [mac80211]
[ 8602.895948] [<8660619c>] ieee80211_sta_set_buffered+0x920/0xe50 [mac80211]
[ 8602.903105] ---[ end trace b3aa443694ada836 ]---
[ 8623.144996] mt7615e 0000:02:00.0: Message 37 (seq 1) timeout
[ 8623.150757] ------------[ cut here ]------------
[ 8623.155958] WARNING: CPU: 0 PID: 1821 at backports-5.10.16-1/net/mac80211/agg-tx.c:406 ___ieee80211_stop_tx_ba_session+0x2b8/0x2d8 [mac80211]
[ 8623.168737] Modules linked in: xt_connlimit nf_conncount iptable_nat xt_state xt_nat xt_helper xt_conntrack xt_connmark xt_connbytes xt_REDIRECT xt_MASQUERADE xt_CT nft_redir nft_ct nf_nat nf_t
[ 8623.169087]  ip_set_hash_mac ip_set_hash_ipportnet ip_set_hash_ipportip ip_set_hash_ipport ip_set_hash_ipmark ip_set_hash_ip ip_set_bitmap_port ip_set_bitmap_ipmac ip_set_bitmap_ip ip_set nfnec
[ 8623.289847] CPU: 0 PID: 1821 Comm: hostapd Tainted: G        W         5.4.105 #0
[ 8623.297310] Stack : 8042f578 80072e10 80660000 806623c0 806d0000 80662388 806614ec 8642d9fc
[ 8623.305646]         80810000 87d313b4 806aac43 805f58bc 00000000 00000001 8642d9a0 24a5ca47
[ 8623.313970]         00000000 00000000 80850000 00000000 30232035 0000017f 2e352020 30312e34
[ 8623.322298]         00000000 000001ac 00000000 00046c37 80000000 806d0000 00000000 86613784
[ 8623.330624]         00000009 86458c20 87ea9070 868fe480 00000000 80302620 00000000 80810000
[ 8623.338950]         ...
[ 8623.341388] Call Trace:
[ 8623.343864] [<8000b61c>] show_stack+0x30/0x100
[ 8623.348332] [<8053e6bc>] dump_stack+0xa4/0xdc
[ 8623.352686] [<8002bcd0>] __warn+0xc0/0x10c
[ 8623.356765] [<8002bd78>] warn_slowpath_fmt+0x5c/0xac
[ 8623.361870] [<86613784>] ___ieee80211_stop_tx_ba_session+0x2b8/0x2d8 [mac80211]
[ 8623.369216] [<86611c98>] ieee80211_sta_tear_down_BA_sessions+0x88/0x160 [mac80211]
[ 8623.376806] [<8660619c>] ieee80211_sta_set_buffered+0x920/0xe50 [mac80211]
[ 8623.383948] ---[ end trace b3aa443694ada837 ]---
[ 8643.625010] mt7615e 0000:02:00.0: Message 37 (seq 2) timeout
[ 8664.105047] mt7615e 0000:02:00.0: Message 73 (seq 3) timeout
[ 8684.585069] mt7615e 0000:02:00.0: Message 73 (seq 4) timeout
[ 8705.065234] mt7615e 0000:02:00.0: Message 73 (seq 5) timeout
[ 8725.545030] mt7615e 0000:02:00.0: Message 73 (seq 6) timeout
[ 8746.025047] mt7615e 0000:02:00.0: Message 73 (seq 7) timeout
[ 8766.505002] mt7615e 0000:02:00.0: Message 73 (seq 8) timeout
[ 8786.985012] mt7615e 0000:02:00.0: Message 73 (seq 9) timeout
ptpt52 commented 3 years ago

hello, @LorenzoBianconi what do you think of this issue? the mcu timeout happens after random hours of time. the code is the latest. kernel 5.4 in openwrt master

ptpt52 commented 3 years ago

@ryderlee1110 so I know something about the mcu timeout if mt7615_mcu_set_fw_ctrl call, the muc timeout issue happens?

ptpt52 commented 3 years ago

I just hack disable any pm ctrl for mcu, and hope this would help with the mcu timeout issue

diff --git a/mt7615/mcu.c b/mt7615/mcu.c
index fcbeb9b8..307d6441 100644
--- a/mt7615/mcu.c
+++ b/mt7615/mcu.c
@@ -821,7 +821,8 @@ mt7615_mcu_ctrl_pm_state(struct mt7615_dev *dev, int band, int state)
        req.pm_state = state ? ENTER_PM_STATE : EXIT_PM_STATE;
        req.band_idx = band;

-       printk("%s:%d send MCU_EXT_CMD_PM_STATE_CTRL\n", __func__, __LINE__);
+       printk("%s:%d send MCU_EXT_CMD_PM_STATE_CTRL ignore\n", __func__, __LINE__);
+       return 0;
        return mt76_mcu_send_msg(&dev->mt76, MCU_EXT_CMD_PM_STATE_CTRL, &req,
                                 sizeof(req), true);
 }