openwrt / mt76

mac80211 driver for MediaTek MT76x0e, MT76x2e, MT7603, MT7615, MT7628 and MT7688

mt798x: Crash with `multicast_to_unicast_all` set #866

Closed Fail-Safe closed 1 month ago

Fail-Safe commented 7 months ago

Several OpenWrt users have reported an issue with having multicast_to_unicast_all set on mt798x hardware. The issue presents as:

Wed Nov 29 16:32:23 2023 kern.err kernel: [ 2802.378439] mt798x-wmac 18000000.wifi: Message 00005aed (seq 5) timeout
Wed Nov 29 16:32:44 2023 kern.err kernel: [ 2822.853228] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout
Wed Nov 29 16:32:44 2023 kern.err kernel: [ 2822.859863] mt798x-wmac 18000000.wifi: Message 000026ed (seq 7) timeout
Wed Nov 29 16:32:55 2023 kern.err kernel: [ 2834.311039] mt798x-wmac 18000000.wifi: Message 00005aed (seq 4) timeout
Wed Nov 29 16:33:14 2023 kern.err kernel: [ 2853.095811] mt798x-wmac 18000000.wifi: Message 00005aed (seq 4) timeout
Wed Nov 29 16:33:32 2023 kern.err kernel: [ 2871.141260] mt798x-wmac 18000000.wifi: Message 00005aed (seq 2) timeout
Wed Nov 29 16:33:44 2023 kern.err kernel: [ 2882.977593] mt798x-wmac 18000000.wifi: Message 00005aed (seq 2) timeout
Wed Nov 29 16:34:20 2023 kern.err kernel: [ 2918.927495] mt798x-wmac 18000000.wifi: Message 00005aed (seq 4) timeout
Wed Nov 29 16:34:21 2023 kern.err kernel: [ 2920.357468] mt798x-wmac 18000000.wifi: Message 00005aed (seq 3) timeout
Wed Nov 29 16:35:27 2023 kern.err kernel: [ 2986.499628] mt798x-wmac 18000000.wifi: Message 00005aed (seq 14) timeout
Wed Nov 29 16:36:12 2023 kern.err kernel: [ 3030.927934] mt798x-wmac 18000000.wifi: Message 00005aed (seq 14) timeout
Wed Nov 29 16:36:27 2023 kern.err kernel: [ 3045.723855] mt798x-wmac 18000000.wifi: Message 00005aed (seq 9) timeout
...

I was able to narrow down the issue to the multicast_to_unicast_all setting here: https://forum.openwrt.org/t/mt798x-wmac-18000000-wifi-message-xxxxxxxx-seq-5-timeout/175163/6?u=_failsafe

Several users have confirmed unsetting (disabling) this option avoids the crash for them as well.

This happens to me on snapshot build r25580-85ad6b9569. Hoping others can chime in here with any other particulars that might help narrow this down further.
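
For anyone comparing logs, a minimal Python sketch like the one below can tally the timeout lines per message ID. It assumes the log format shown above; the `count_timeouts` helper and the `sample` text are illustrative names, not part of mt76 or OpenWrt.

```python
import re
from collections import Counter

# Matches lines such as:
#   mt798x-wmac 18000000.wifi: Message 00005aed (seq 5) timeout
TIMEOUT_RE = re.compile(r"Message ([0-9a-fA-F]{8}) \(seq (\d+)\) timeout")


def count_timeouts(log_text: str) -> Counter:
    """Count MCU timeout lines per message ID (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Four lines copied from the log above.
sample = """\
Wed Nov 29 16:32:23 2023 kern.err kernel: [ 2802.378439] mt798x-wmac 18000000.wifi: Message 00005aed (seq 5) timeout
Wed Nov 29 16:32:44 2023 kern.err kernel: [ 2822.853228] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout
Wed Nov 29 16:32:44 2023 kern.err kernel: [ 2822.859863] mt798x-wmac 18000000.wifi: Message 000026ed (seq 7) timeout
Wed Nov 29 16:32:55 2023 kern.err kernel: [ 2834.311039] mt798x-wmac 18000000.wifi: Message 00005aed (seq 4) timeout
"""

if __name__ == "__main__":
    for msg_id, n in count_timeouts(sample).most_common():
        print(msg_id, n)
```

Feeding it a saved `dmesg` or `logread` dump instead of `sample` should show which message IDs dominate on a given device.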

xize commented 6 months ago

@Fail-Safe I was experiencing something similar with multi-PSK; it seems allmulticast mode was also active.

Then I read this post by nxhack: https://forum.openwrt.org/t/gl-inet-flint-2-gl-mt6000-discussions/173524/1086?u=xize

I have now compiled my own build for the MT6000 on kernel 6.6 with this version of the calibration data, and my setup no longer seems to crash when I use my Ayaneo Geek 1S (Intel AX210). I also added my crash report to the issue you mentioned, #860. For me the crash was mostly triggered by the AX210 device under heavy P2P UDP traffic (GTA Online). I have now played the game for over an hour and the driver does not seem to crash.

Can you check whether this fixes your issue as well? This is the relevant commit of my mt76 fork, showing which files I replaced. (I noticed that replacing the other files, such as the EEPROMs, at runtime via /usr/firmware/mediatek gave me a soft brick.) I'm very interested to see whether it also fixes this issue 👍

Fail-Safe commented 6 months ago

@xize Thank you so much for letting me know about this! I actually had grabbed those newer firmware files (for mt7986) as well and have been running with them on my kernel 6.6 build. For whatever reason, I hadn't thought to try re-enabling multicast_to_unicast_all with the newer firmware, though.

Trying it now... will report back once I give it some time to run. :)

xize commented 6 months ago

Unfortunately I spoke too soon; mine still crashed. Another stack trace:

[   26.388775] br-lan: port 10(phy0-ap0-aqnet) entered forwarding state
[   26.398053] mt798x-wmac 18000000.wifi phy0-ap0-aqnet: left allmulticast mode
[   26.405179] mt798x-wmac 18000000.wifi phy0-ap0-aqnet: left promiscuous mode
[   26.412299] br-lan: port 10(phy0-ap0-aqnet) entered disabled state
[   26.451733] br-lan: port 10(phy0-ap0-aqnet) entered blocking state
[   26.457914] br-lan: port 10(phy0-ap0-aqnet) entered disabled state
[   26.464131] mt798x-wmac 18000000.wifi phy0-ap0-aqnet: entered allmulticast mode
[   26.471792] mt798x-wmac 18000000.wifi phy0-ap0-aqnet: entered promiscuous mode
[   26.480151] br-lan: port 10(phy0-ap0-aqnet) entered blocking state
[   26.486374] br-lan: port 10(phy0-ap0-aqnet) entered forwarding state
[   26.686587] br-lan: port 6(phy0-ap0) entered blocking state
[   26.692187] br-lan: port 6(phy0-ap0) entered forwarding state
[   26.698430] br-lan: port 8(phy0-ap0-zigbee) entered blocking state
[   26.704659] br-lan: port 8(phy0-ap0-zigbee) entered forwarding state
[   31.324517] br-lan: port 11(vx0) entered blocking state
[   31.329752] br-lan: port 11(vx0) entered disabled state
[   31.335099] vx0: entered allmulticast mode
[   31.339398] vx0: entered promiscuous mode
[   31.345514] br-lan: port 11(vx0) entered blocking state
[   31.350772] br-lan: port 11(vx0) entered forwarding state
[   60.608218] br-lan.169: entered allmulticast mode
[   60.613088] br-lan: entered allmulticast mode
[   60.617678] eth1.300: entered allmulticast mode
[   60.622288] mtk_soc_eth 15100000.ethernet eth1: entered allmulticast mode
[  105.683150] br-lan: port 12(phy1-ap0-aya) entered blocking state
[  105.689200] br-lan: port 12(phy1-ap0-aya) entered disabled state
[  105.695243] mt798x-wmac 18000000.wifi phy1-ap0-aya: entered allmulticast mode
[  105.702578] mt798x-wmac 18000000.wifi phy1-ap0-aya: entered promiscuous mode
[  105.712447] mt798x-wmac 18000000.wifi phy1-ap0-aya: left allmulticast mode
[  105.719365] mt798x-wmac 18000000.wifi phy1-ap0-aya: left promiscuous mode
[  105.726189] br-lan: port 12(phy1-ap0-aya) entered disabled state
[  105.780359] br-lan: port 12(phy1-ap0-aya) entered blocking state
[  105.786363] br-lan: port 12(phy1-ap0-aya) entered disabled state
[  105.792409] mt798x-wmac 18000000.wifi phy1-ap0-aya: entered allmulticast mode
[  105.799693] mt798x-wmac 18000000.wifi phy1-ap0-aya: entered promiscuous mode
[  105.995558] br-lan: port 7(phy1-ap0) entered blocking state
[  106.001163] br-lan: port 7(phy1-ap0) entered forwarding state
[  106.007126] br-lan: port 12(phy1-ap0-aya) entered blocking state
[  106.013139] br-lan: port 12(phy1-ap0-aya) entered forwarding state
[12450.669959] mt798x-wmac 18000000.wifi: Message 000026ed (seq 9) timeout
[12471.127256] mt798x-wmac 18000000.wifi: Message 00005aed (seq 10) timeout
[12491.585889] mt798x-wmac 18000000.wifi: Message 000026ed (seq 11) timeout
[12491.592849] mt798x-wmac 18000000.wifi: Message 000025ed (seq 12) timeout
[12491.599607] ------------[ cut here ]------------
[12491.604206] WARNING: CPU: 0 PID: 18242 at ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[12491.613021] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_inet wireguard pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_compat nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) libchacha20poly1305 iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables chacha_neon cfg80211(O) xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY x_tables slhc sch_cake poly1305_neon nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libcrc32c libchacha compat(O) crypto_safexcel sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact
[12491.613176]  ip6_gre ip_gre gre ifb ip6_tunnel tunnel6 ip_tunnel vxlan udp_tunnel ip6_udp_tunnel sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd gpio_button_hotplug(O) usbcore usb_common aquantia
[12491.728150] CPU: 0 PID: 18242 Comm: kworker/u8:2 Tainted: G           O       6.6.27 #0
[12491.736131] Hardware name: GL.iNet GL-MT6000 (DT)
[12491.740818] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[12491.746829] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[12491.753770] pc : ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[12491.760385] lr : ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[12491.767001] sp : ffffffc08a823c80
[12491.770298] x29: ffffffc08a823c80 x28: 0000000000000001 x27: ffffff800d77e6c0
[12491.777415] x26: ffffff800a3c23b8 x25: ffffff80076008a0 x24: ffffff80076008a0
[12491.784530] x23: ffffffc07906bd10 x22: ffffff800a3c40e8 x21: 0000000000000001
[12491.791646] x20: ffffff800d77e6c0 x19: ffffff800a3c2000 x18: 000000000000017c
[12491.798761] x17: 0000000000000000 x16: 0000000000000078 x15: ffffffc080b5a128
[12491.805876] x14: 0000000000000474 x13: 000000000000017c x12: 00000000ffffffea
[12491.812992] x11: 0000000000000040 x10: ffffffc080b57470 x9 : ffffffc080b57468
[12491.820107] x8 : ffffff8000403dc0 x7 : 0000000000000000 x6 : 0000001aa2970ad3
[12491.827222] x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000000000000
[12491.834337] x2 : 0000000000000001 x1 : 0000000000000002 x0 : 00000000ffffff92
[12491.841453] Call trace:
[12491.843885]  ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[12491.850155]  ieee80211_ba_session_work+0x418/0x444 [mac80211]
[12491.855904]  process_one_work+0x154/0x2a0
[12491.859902]  worker_thread+0x2a8/0x484
[12491.863636]  kthread+0xdc/0xe8
[12491.866679]  ret_from_fork+0x10/0x20
[12491.870241] ---[ end trace 0000000000000000 ]---
[12896.914708] mt798x-wmac 18000000.wifi: Message 000026ed (seq 4) timeout
[12917.372441] mt798x-wmac 18000000.wifi: Message 00005aed (seq 5) timeout
[12937.831918] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout
[12937.838606] ------------[ cut here ]------------
[12937.843214] WARNING: CPU: 3 PID: 17543 at ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[12937.852030] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_inet wireguard pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_compat nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) libchacha20poly1305 iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables chacha_neon cfg80211(O) xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY x_tables slhc sch_cake poly1305_neon nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libcrc32c libchacha compat(O) crypto_safexcel sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact
[12937.852192]  ip6_gre ip_gre gre ifb ip6_tunnel tunnel6 ip_tunnel vxlan udp_tunnel ip6_udp_tunnel sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd gpio_button_hotplug(O) usbcore usb_common aquantia
[12937.967165] CPU: 3 PID: 17543 Comm: kworker/u8:4 Tainted: G        W  O       6.6.27 #0
[12937.975147] Hardware name: GL.iNet GL-MT6000 (DT)
[12937.979834] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[12937.985852] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[12937.992793] pc : ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[12937.999409] lr : ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[12938.006024] sp : ffffffc08a7abc80
[12938.009323] x29: ffffffc08a7abc80 x28: 0000000000000001 x27: ffffff800d77e240
[12938.016439] x26: ffffff800aec83b8 x25: ffffff80076008a0 x24: ffffff80076008a0
[12938.023554] x23: ffffffc07906bd10 x22: ffffff800aece0e8 x21: 0000000000000001
[12938.030670] x20: ffffff800d77e240 x19: ffffff800aec8000 x18: ffffff800aece000
[12938.037786] x17: 0000000000000001 x16: 00000000000021c0 x15: ffffff80076008a6
[12938.044902] x14: 0000000000000028 x13: fffffffffffff778 x12: 0000000000000002
[12938.052017] x11: 0000000000000040 x10: ffffffc080b57470 x9 : ffffffc080b57468
[12938.059134] x8 : 0000000000000002 x7 : 000000000000b737 x6 : 0000001aa2970ad3
[12938.066249] x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000000000000
[12938.073364] x2 : 0000000000000001 x1 : 0000000000000002 x0 : 00000000fffffff4
[12938.080481] Call trace:
[12938.082913]  ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[12938.089185]  ieee80211_ba_session_work+0x418/0x444 [mac80211]
[12938.094934]  process_one_work+0x154/0x2a0
[12938.098933]  worker_thread+0x2a8/0x484
[12938.102667]  kthread+0xdc/0xe8
[12938.105708]  ret_from_fork+0x10/0x20
[12938.109270] ---[ end trace 0000000000000000 ]---
Fail-Safe commented 6 months ago

I just started noticing the 000026ed and 00005aed timeouts again myself:

[  743.168242] mt798x-wmac 18000000.wifi: Message 00005aed (seq 2) timeout
[ 1297.661339] mt798x-wmac 18000000.wifi: Message 00005aed (seq 13) timeout
[ 1842.429689] mt798x-wmac 18000000.wifi: Message 000026ed (seq 15) timeout
[ 1862.888429] mt798x-wmac 18000000.wifi: Message 00005aed (seq 1) timeout
[ 1883.346054] mt798x-wmac 18000000.wifi: Message 000800c4 (seq 2) timeout
[ 1903.804285] mt798x-wmac 18000000.wifi: Message 000026ed (seq 3) timeout
[ 1924.263044] mt798x-wmac 18000000.wifi: Message 00005aed (seq 4) timeout
[ 1944.720388] mt798x-wmac 18000000.wifi: Message 000026ed (seq 5) timeout
[ 1965.179029] mt798x-wmac 18000000.wifi: Message 00005aed (seq 6) timeout
[ 1985.637032] mt798x-wmac 18000000.wifi: Message 000026ed (seq 7) timeout
[ 2006.094603] mt798x-wmac 18000000.wifi: Message 00005aed (seq 8) timeout
[ 2026.553350] mt798x-wmac 18000000.wifi: Message 000026ed (seq 9) timeout
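
The later timeouts in this log land roughly 20.46 seconds apart. A small Python sketch, using four lines copied from the log above, makes the spacing easy to check; the `timeout_gaps` helper is a hypothetical name for illustration. The regular spacing would be consistent with each command waiting out a fixed timeout before the next one is sent, though nothing in this thread confirms that.

```python
import re

# Captures the kernel timestamp, e.g. "[ 1842.429689]" -> "1842.429689".
STAMP_RE = re.compile(r"\[\s*(\d+\.\d+)\]")


def timeout_gaps(log_text):
    """Return the gaps in seconds between consecutive timeout lines."""
    stamps = []
    for line in log_text.splitlines():
        match = STAMP_RE.search(line)
        if match and "timeout" in line:
            stamps.append(float(match.group(1)))
    return [later - earlier for earlier, later in zip(stamps, stamps[1:])]


# Four consecutive lines copied from the log above.
sample = """\
[ 1842.429689] mt798x-wmac 18000000.wifi: Message 000026ed (seq 15) timeout
[ 1862.888429] mt798x-wmac 18000000.wifi: Message 00005aed (seq 1) timeout
[ 1883.346054] mt798x-wmac 18000000.wifi: Message 000800c4 (seq 2) timeout
[ 1903.804285] mt798x-wmac 18000000.wifi: Message 000026ed (seq 3) timeout
"""

if __name__ == "__main__":
    for gap in timeout_gaps(sample):
        print(f"{gap:.2f} s")
```
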
xize commented 5 months ago

I can confirm as well, with 100% certainty, that it is multicast. Yesterday I added multicast_to_unicast='0' to the br-lan bridge, and that fixed the strange crash my Ayaneo was causing while connected to my multi-PSK setup.

I'm now also monitoring another device, the Mi Smart Clock. This device does not seem to cause a crash or a timeout in the OpenWrt logs, but its touch screen seems to start artifacting or become unresponsive after it has been up for a while. If it stays responsive with this change, my guesses point to maybe these things:

Any suggestions for commands I can try to check whether these are indeed invalid/corrupt multicast packets or flooding? That would surely be helpful 👍

Fail-Safe commented 5 months ago

@nbd168 Is this multicast_to_unicast_all issue correctly homed in this mt76 project? Or is there another project where I should create this issue to get proper visibility?

Thanks!

xize commented 5 months ago

so I have tried applying this patch from @blocktrron to see if it makes any changes from here.

Interestingly, my stack trace shows a little bit more:

```
[ 184.556455] Enabled tx worker queued=0 ndesc=2048
[ 310.658843] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout
[ 331.118170] mt798x-wmac 18000000.wifi: Message 00005aed (seq 7) timeout
[ 331.124902] Scheduling all pending txq queued=458 ndesc=2048
[ 331.130558] Scheduling all pending txq queued=458 ndesc=2048
[ 331.136211] Disabled tx worker queued=458 ndesc=2048
[ 331.141201] Cleaned tx Queues queued=0 ndesc=2048
[ 351.573291] mt798x-wmac 18000000.wifi: Message 000026ed (seq 8) timeout
[ 351.579922] mt798x-wmac 18000000.wifi: Message 000025ed (seq 9) timeout
[ 351.579926] ------------[ cut here ]------------
[ 351.579928] WARNING: CPU: 0 PID: 3369 at kthread_park+0x9c/0xb0
[ 351.586529] Sent BA update queued=0 ndesc=2048
[ 351.591117] Modules linked in:
[ 351.597027] Enabled tx worker queued=0 ndesc=2048
[ 351.601436] pppoe
[ 351.604528] ------------[ cut here ]------------
[ 351.609155] ppp_async nft_fib_inet nf_flow_table_inet
[ 351.611156] WARNING: CPU: 2 PID: 244 at ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 351.615751] wireguard pppox
[ 351.620868] Modules linked in:
[ 351.629450] ppp_generic
[ 351.632314] pppoe
[ 351.635349] nft_reject_ipv6
[ 351.637866] ppp_async
[ 351.639862] nft_reject_ipv4
[ 351.642725] nft_fib_inet
[ 351.645068] nft_reject_inet
[ 351.647930] nf_flow_table_inet
[ 351.650533] nft_reject
[ 351.653396] wireguard
[ 351.656519] nft_redir
[ 351.658949] pppox
[ 351.661292] nft_quota
[ 351.663635] ppp_generic
[ 351.665631] nft_numgen
[ 351.667974] nft_reject_ipv6
[ 351.670490] nft_nat
[ 351.672920] nft_reject_ipv4
[ 351.675783] nft_masq
[ 351.677952] nft_reject_inet
[ 351.680815] nft_log
[ 351.683071] nft_reject
[ 351.685934] nft_limit
[ 351.688104] nft_redir
[ 351.690534] nft_hash
[ 351.692877] nft_quota
[ 351.695220] nft_flow_offload
[ 351.697476] nft_numgen
[ 351.699818] nft_fib_ipv6
[ 351.702768] nft_nat
[ 351.705198] nft_fib_ipv4
[ 351.707801] nft_masq
[ 351.709971] nft_fib
[ 351.712574] nft_log
[ 351.714830] nft_ct
[ 351.716999] nft_limit
[ 351.719168] nft_compat
[ 351.721251] nft_hash
[ 351.723594] nft_chain_nat
[ 351.726024] nft_flow_offload
[ 351.728281] nf_tables
[ 351.730970] nft_fib_ipv6
[ 351.733920] nf_nat
[ 351.736262] nft_fib_ipv4
[ 351.738865] nf_flow_table
[ 351.740948] nft_fib
[ 351.743551] nf_conntrack_netlink
[ 351.746240] nft_ct
[ 351.748409] nf_conntrack
[ 351.751706] nft_compat
[ 351.753790] mt7915e(O)
[ 351.756392] nft_chain_nat
[ 351.758821] mt76_connac_lib(O)
[ 351.761250] nf_tables
[ 351.763940] mt76(O)
[ 351.767062] nf_nat
[ 351.769405] mac80211(O)
[ 351.771574] nf_flow_table
[ 351.773658] libchacha20poly1305
[ 351.776174] nf_conntrack_netlink
[ 351.778864] iptable_mangle
[ 351.782073] nf_conntrack
[ 351.785369] iptable_filter
[ 351.788146] mt7915e(O)
[ 351.790749] ipt_REJECT
[ 351.793525] mt76_connac_lib(O)
[ 351.795955] ipt_ECN
[ 351.798384] mt76(O)
[ 351.801506] ip_tables
[ 351.803676] mac80211(O)
[ 351.805846] chacha_neon
[ 351.808189] libchacha20poly1305
[ 351.810706] cfg80211(O)
[ 351.813222] iptable_mangle
[ 351.816431] xt_time
[ 351.818947] iptable_filter
[ 351.821723] xt_tcpudp
[ 351.823892] ipt_REJECT
[ 351.826669] xt_tcpmss
[ 351.829011] ipt_ECN
[ 351.831441] xt_statistic
[ 351.833784] ip_tables
[ 351.835954] xt_multiport
[ 351.838556] chacha_neon
[ 351.840899] xt_mark
[ 351.843502] cfg80211(O)
[ 351.846019] xt_mac
[ 351.848187] xt_time
[ 351.850704] xt_limit
[ 351.852787] xt_tcpudp
[ 351.854957] xt_length
[ 351.857213] xt_tcpmss
[ 351.859557] xt_hl
[ 351.861899] xt_statistic
[ 351.864242] xt_ecn
[ 351.866238] xt_multiport
[ 351.868841] xt_dscp
[ 351.870924] xt_mark
[ 351.873527] xt_comment
[ 351.875696] xt_mac
[ 351.877866] xt_TCPMSS
[ 351.880295] xt_limit
[ 351.882379] xt_LOG
[ 351.884721] xt_length
[ 351.886977] xt_HL
[ 351.889060] xt_hl
[ 351.891403] xt_DSCP
[ 351.893398] xt_ecn
[ 351.895395] xt_CLASSIFY
[ 351.897564] xt_dscp
[ 351.899648] x_tables
[ 351.902163] xt_comment
[ 351.904333] slhc
[ 351.906588] xt_TCPMSS
[ 351.909019] sch_cake
[ 351.910928] xt_LOG
[ 351.913271] poly1305_neon
[ 351.915527] xt_HL
[ 351.917610] nfnetlink
[ 351.920300] xt_DSCP
[ 351.922295] nf_reject_ipv6
[ 351.924638] xt_CLASSIFY
[ 351.926808] nf_reject_ipv4
[ 351.929584] x_tables
[ 351.932101] nf_log_syslog
[ 351.934877] slhc
[ 351.937133] nf_defrag_ipv6
[ 351.939822] sch_cake
[ 351.941732] nf_defrag_ipv4
[ 351.944508] poly1305_neon
[ 351.946764] libcurve25519_generic
[ 351.949540] nfnetlink
[ 351.952230] libcrc32c
[ 351.955612] nf_reject_ipv6
[ 351.957955] libchacha
[ 351.960297] nf_reject_ipv4
[ 351.963074] compat(O)
[ 351.965416] nf_log_syslog
[ 351.968193] crypto_safexcel
[ 351.970536] nf_defrag_ipv6
[ 351.973225] sch_tbf
[ 351.976087] nf_defrag_ipv4
[ 351.978863] sch_ingress
[ 351.981032] libcurve25519_generic
[ 351.983808] sch_htb
[ 351.986325] libcrc32c
[ 351.989708] sch_hfsc
[ 351.991877] libchacha
[ 351.994221] em_u32
[ 351.996477] compat(O)
[ 351.998819] cls_u32
[ 352.000902] crypto_safexcel
[ 352.003245] cls_route
[ 352.005414] sch_tbf
[ 352.008277] cls_matchall
[ 352.010619] sch_ingress
[ 352.012789] cls_fw
[ 352.015392] sch_htb
[ 352.017909] cls_flow
[ 352.019991] sch_hfsc
[ 352.022162] cls_basic
[ 352.024417] em_u32
[ 352.026674] act_skbedit
[ 352.029016] cls_u32
[ 352.031100] act_mirred
[ 352.033616] cls_route
[ 352.035785] act_gact
[ 352.038214] cls_matchall
[ 352.040557] ip6_gre
[ 352.042813] cls_fw
[ 352.045416] ip_gre
[ 352.047586] cls_flow
[ 352.049669] gre
[ 352.051751] cls_basic
[ 352.054008] ifb
[ 352.055830] act_skbedit
[ 352.058173] ip6_tunnel
[ 352.059995] act_mirred
[ 352.062511] tunnel6
[ 352.064941] act_gact
[ 352.067370] ip_tunnel
[ 352.069539] ip6_gre
[ 352.071796] vxlan
[ 352.074138] ip_gre
[ 352.076308] udp_tunnel
[ 352.078304] gre
[ 352.080387] ip6_udp_tunnel
[ 352.082817] ifb
[ 352.084640] sha512_arm64
[ 352.087415] ip6_tunnel
[ 352.089238] sha1_ce
[ 352.091841] tunnel6
[ 352.094271] sha1_generic
[ 352.096439] ip_tunnel
[ 352.098609] seqiv
[ 352.101212] vxlan
[ 352.103556] md5
[ 352.105551] udp_tunnel
[ 352.107547] geniv
[ 352.109370] ip6_udp_tunnel
[ 352.111799] des_generic
[ 352.113795] sha512_arm64
[ 352.116572] libdes
[ 352.119088] sha1_ce
[ 352.121691] authencesn
[ 352.123773] sha1_generic
[ 352.125944] authenc
[ 352.128372] seqiv
[ 352.130975] leds_gpio
[ 352.133145] md5
[ 352.135142] xhci_plat_hcd
[ 352.137484] geniv
[ 352.139307] xhci_pci
[ 352.141996] des_generic
[ 352.143993] xhci_mtk_hcd
[ 352.146249] libdes
[ 352.148766] xhci_hcd
[ 352.151369] authencesn
[ 352.153452] gpio_button_hotplug(O)
[ 352.155708] authenc
[ 352.158139] usbcore
[ 352.161608] leds_gpio
[ 352.163778] usb_common
[ 352.165947] xhci_plat_hcd
[ 352.168290] aquantia
[ 352.170718] xhci_pci
[ 352.173408]
[ 352.175664] xhci_mtk_hcd
[ 352.177923] CPU: 0 PID: 3369 Comm: kworker/u8:6 Tainted: G O 6.6.29 #0
[ 352.179397] xhci_hcd
[ 352.182000] Hardware name: GL.iNet GL-MT6000 (DT)
[ 352.189889] gpio_button_hotplug(O)
[ 352.192146] Workqueue: mt76 mt7915_mac_reset_work [mt7915e]
[ 352.196827] usbcore
[ 352.200297]
[ 352.205847] usb_common
[ 352.208017] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 352.209493] aquantia
[ 352.211923] pc : kthread_park+0x9c/0xb0
[ 352.218858]
[ 352.218860] CPU: 2 PID: 244 Comm: kworker/u8:3 Tainted: G O 6.6.29 #0
[ 352.221115] lr : mt7915_mac_reset_work+0x128/0xd20 [mt7915e]
[ 352.224931] Hardware name: GL.iNet GL-MT6000 (DT)
[ 352.226407] sp : ffffffc084f2bca0
[ 352.234211] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 352.239846] x29: ffffffc084f2bca0
[ 352.244529]
[ 352.247824] x28: 0000000000000000
[ 352.253807] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 352.257103] x27: ffffff8006f5fa20
[ 352.258580] pc : ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 352.261963]
[ 352.261963] x26: ffffff8001d5a680
[ 352.268899] lr : ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[ 352.272281] x25: ffffff8001d5a000
[ 352.278871] sp : ffffffc0811abc80
[ 352.280347] x24: ffffff8006f52000
[ 352.283643] x29: ffffffc0811abc80
[ 352.290232]
[ 352.290233] x23: ffffffffffff25e0
[ 352.293615] x28: 0000000000000001
[ 352.296912] x22: ffffff8000876e00
[ 352.300294] x27: ffffff80129f26c0
[ 352.303591] x21: ffffff8000011000
[ 352.305067]
[ 352.305068] x26: ffffff800b0203c0
[ 352.308363]
[ 352.311746] x25: ffffff8001d588a0
[ 352.315129] x20: ffffff8000fa0e80
[ 352.318511] x24: ffffff8001d588a0
[ 352.321894] x19: ffffff8001f03980
[ 352.323371]
[ 352.323372] x23: ffffffc0790b1d10
[ 352.326667] x18: 000000000000023c
[ 352.328143] x22: ffffff80102c21f0
[ 352.331526]
[ 352.331527] x17: 0000000000000000
[ 352.334822] x21: 0000000000000001
[ 352.338206] x16: 0000000000000000
[ 352.341588]
[ 352.341589] x20: ffffff80129f26c0
[ 352.343064] x15: ffffffc080b6a128
[ 352.346360] x19: ffffff800b020000
[ 352.349743]
[ 352.349743] x14: 0000000000000000
[ 352.353125] x18: 0000000000000005
[ 352.354601] x13: 0000000000000020
[ 352.357897]
[ 352.357898] x17: 0000000000000000
[ 352.361280] x12: 0101010101010101
[ 352.364664] x16: 0000000000000078
[ 352.366140]
[ 352.366141] x11: 7f7f7f7f7f7f7f7f
[ 352.369437] x15: 0000000000000006
[ 352.372820] x10: fefefefefefefeff
[ 352.376203]
[ 352.376204] x14: 0000000000000000
[ 352.377679] x9 : 7f7f7f7f7f7f7f7f
[ 352.380976] x13: 3a6e692064656b6e
[ 352.384359]
[ 352.384360] x8 : ffffffffffff6400
[ 352.387742] x12: 696c2073656c7564
[ 352.389219] x7 : 0000000000000800
[ 352.392514]
[ 352.392515] x11: 00000000fffff240
[ 352.395897] x6 : 0000000000000000
[ 352.399279] x10: 000000000000005d
[ 352.400756]
[ 352.400757] x5 : 0000000000000000
[ 352.404052] x9 : 000000000009050d
[ 352.407435] x4 : ffffff803fd87d80
[ 352.410818]
[ 352.410819] x8 : 0000000000000002
[ 352.412295] x3 : ffffff8001d5a2b8
[ 352.415592] x7 : 000000000000b8c9
[ 352.418974]
[ 352.418975] x2 : 0000000000000011
[ 352.422357] x6 : 000000000a099f42
[ 352.423834] x1 : ffffffc080b67488
[ 352.427130]
[ 352.427131] x5 : 0000000001000000
[ 352.430514] x0 : 0000000000000004
[ 352.433897] x4 : 0000000000000000
[ 352.435373]
[ 352.435374] Call trace:
[ 352.438670] x3 : 0000000000000000
[ 352.442053] kthread_park+0x9c/0xb0
[ 352.445435]
[ 352.445436] x2 : 0000000000000001
[ 352.446912] mt7915_mac_reset_work+0x128/0xd20 [mt7915e]
[ 352.450207] x1 : 0000000000000002
[ 352.453591] process_one_work+0x154/0x2a0
[ 352.456974] x0 : 00000000ffffff92
[ 352.458450] worker_thread+0x2a8/0x484
[ 352.461746]
[ 352.461747] Call trace:
[ 352.465128] kthread+0xdc/0xe8
[ 352.468512] ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 352.469988] ret_from_fork+0x10/0x20
[ 352.473284] ieee80211_ba_session_work+0x418/0x444 [mac80211]
[ 352.476667] ---[ end trace 0000000000000000 ]---
[ 352.480049] process_one_work+0x154/0x2a0
[ 352.557950] worker_thread+0x2a8/0x484
[ 352.561688] kthread+0xdc/0xe8
[ 352.564732] ret_from_fork+0x10/0x20
[ 352.568295] ---[ end trace 0000000000000000 ]---
```

Does this perhaps give some more insight into where it could crash?

lukasz1992 commented 5 months ago

Looks like mcu does not like messages 26 and 5A sent together. https://github.com/rany2/openwrt/commit/18cc739263004d4846991c9afbc6ba45c39293a1 should solve the issue.
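
To see where "26" and "5A" come from, here is a small Python sketch that splits a logged message ID into bytes. It assumes, following the reading in the comment above, that the second-lowest byte carries the message number; the field names `base_id`, `ext_id`, and `flags` are hypothetical labels for illustration, not names taken from the mt76 source.

```python
def split_message_id(message_id: str) -> dict:
    """Split an 8-digit hex message ID from the timeout log into byte fields."""
    value = int(message_id, 16)
    return {
        "base_id": value & 0xFF,        # lowest byte, e.g. 0xed
        "ext_id": (value >> 8) & 0xFF,  # second byte, e.g. 0x26 or 0x5a
        "flags": value >> 16,           # remaining upper bits, left undecoded
    }


if __name__ == "__main__":
    # Message IDs that appear in the logs in this thread.
    for msg in ("000026ed", "00005aed", "000025ed", "000800c4"):
        fields = split_message_id(msg)
        print(msg, {name: hex(v) for name, v in fields.items()})
```

Under this reading, `000026ed` and `00005aed` share the low byte `0xed` and differ only in the second byte, while `000800c4` has a different low byte and a bit set in the upper half.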

xize commented 5 months ago

> Looks like mcu does not like messages 26 and 5A sent together. rany2/openwrt@18cc739 should solve the issue.

@rany2

I checked this patch again after a dirclean. It still crashes with the patch, but less often than without it, though I could be mistaken given how randomly it happens. A new timeout message appeared, though: mt798x-wmac 18000000.wifi: Message 000800c4 (seq 7) timeout

new stacktrace:

```
[ 1598.274422] mt798x-wmac 18000000.wifi: Message 000026ed (seq 4) timeout
[ 3983.506731] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout
[ 4003.975742] mt798x-wmac 18000000.wifi: Message 000800c4 (seq 7) timeout
[ 4024.423310] mt798x-wmac 18000000.wifi: Message 000025ed (seq 8) timeout
[ 4024.429978] ------------[ cut here ]------------
[ 4024.434580] WARNING: CPU: 1 PID: 16700 at ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 4024.443394] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_inet wireguard pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_compat nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) libchacha20poly1305 iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables chacha_neon cfg80211(O) xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY x_tables slhc sch_cake poly1305_neon nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libcrc32c libchacha compat(O) crypto_safexcel sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact
[ 4024.443554] ip6_gre ip_gre gre ifb ip6_tunnel tunnel6 ip_tunnel vxlan udp_tunnel ip6_udp_tunnel sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd gpio_button_hotplug(O) usbcore usb_common aquantia
[ 4024.558530] CPU: 1 PID: 16700 Comm: kworker/u8:5 Tainted: G O 6.6.29 #0
[ 4024.566511] Hardware name: GL.iNet GL-MT6000 (DT)
[ 4024.571197] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 4024.577210] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 4024.584150] pc : ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 4024.590766] lr : ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[ 4024.597381] sp : ffffffc08844bc80
[ 4024.600679] x29: ffffffc08844bc80 x28: 0000000000000001 x27: ffffff800e669600
[ 4024.607795] x26: ffffff8006c6a3d0 x25: ffffff80077508a0 x24: ffffff80077508a0
[ 4024.614911] x23: ffffffc0790bed10 x22: ffffff8006c6e400 x21: 0000000000000001
[ 4024.622027] x20: ffffff800e669600 x19: ffffff8006c6a000 x18: 000000000000018d
[ 4024.629143] x17: 0000000000000000 x16: 0000000000000000 x15: ffffffc080b6a128
[ 4024.636258] x14: 00000000000004a7 x13: 000000000000018d x12: 00000000ffffffea
[ 4024.643373] x11: 00000000ffffefff x10: ffffffc080bc2128 x9 : ffffffc080b6a0d0
[ 4024.650490] x8 : 0000000000017fe8 x7 : c0000000ffffefff x6 : 000000000a05d9eb
[ 4024.657605] x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 4024.664720] x2 : 0000000000000001 x1 : 0000000000000002 x0 : 00000000ffffff92
[ 4024.671836] Call trace:
[ 4024.674268] ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 4024.680537] ieee80211_ba_session_work+0x418/0x444 [mac80211]
[ 4024.686285] process_one_work+0x154/0x2a0
[ 4024.690285] worker_thread+0x2a8/0x484
[ 4024.694019] kthread+0xdc/0xe8
[ 4024.697060] ret_from_fork+0x10/0x20
[ 4024.700622] ---[ end trace 0000000000000000 ]---
```
lukasz1992 commented 5 months ago

> > Looks like mcu does not like messages 26 and 5A sent together. rany2/openwrt@18cc739 should solve the issue.
>
> @rany2
>
> checked this patch again with a dirclean, though the patch still crash but less than what I was having without this patch I could be mistaken since the randomness it happens, a new time out message appeared though mt798x-wmac 18000000.wifi: Message 000800c4 (seq 7) timeout
>
> new stacktrace:

Please apply this patch too: https://pastebin.com/raw/cyn8YQ4R

rany2 commented 5 months ago

@xize I had a look at your tree and you didn't apply my patch properly. I'm not sure why you did it like that: https://github.com/xize/openwrt-flint2-testing/commit/1c16923264f59d7dbe5be1d1fb5f609a6431cd52

The file @lukasz1992 linked to is already a patch file, so just download it into the mt76 patches folder in your tree. It should be applied against mt76, not your OpenWrt tree.

Fail-Safe commented 5 months ago

> Please apply this patch too: https://pastebin.com/raw/cyn8YQ4R

@lukasz1992 && @rany2 I have applied this patch and encountered the following error after re-enabling multicast_to_unicast_all:

root@AP:~# cat /sys/fs/pstore/dmesg-ramoops-0
Oops#1 Part1
<6>[   22.439909] mt798x-wmac 18000000.wifi phy0-ap2: entered promiscuous mode
<6>[   22.448851] br-lan: port 10(phy0-ap2) entered blocking state
<6>[   22.454538] br-lan: port 10(phy0-ap2) entered forwarding state
<6>[   23.246394] br-lan: port 7(phy1-ap0) entered blocking state
<6>[   23.251996] br-lan: port 7(phy1-ap0) entered forwarding state
<6>[   23.371730] br-lan: port 11(phy1-ap1) entered blocking state
<6>[   23.377404] br-lan: port 11(phy1-ap1) entered disabled state
<6>[   23.383131] mt798x-wmac 18000000.wifi phy1-ap1: entered allmulticast mode
<6>[   23.390190] mt798x-wmac 18000000.wifi phy1-ap1: entered promiscuous mode
<6>[   23.400229] br-lan: port 11(phy1-ap1) entered blocking state
<6>[   23.405884] br-lan: port 11(phy1-ap1) entered forwarding state
<6>[   23.413659] mt798x-wmac 18000000.wifi phy1-ap1: left allmulticast mode
<6>[   23.420233] mt798x-wmac 18000000.wifi phy1-ap1: left promiscuous mode
<6>[   23.426771] br-lan: port 11(phy1-ap1) entered disabled state
<6>[   23.501383] br-lan: port 11(phy1-ap1) entered blocking state
<6>[   23.507043] br-lan: port 11(phy1-ap1) entered disabled state
<6>[   23.512736] mt798x-wmac 18000000.wifi phy1-ap1: entered allmulticast mode
<6>[   23.519686] mt798x-wmac 18000000.wifi phy1-ap1: entered promiscuous mode
<6>[   23.526484] br-lan: port 11(phy1-ap1) entered blocking state
<6>[   23.532137] br-lan: port 11(phy1-ap1) entered forwarding state
<3>[  105.908186] mt798x-wmac 18000000.wifi phy0-ap2: failed (err=-2) to del object (id=3)
<3>[  105.915931] mt798x-wmac 18000000.wifi phy1-ap1: failed (err=-2) to del object (id=3)
<6>[  217.633996] br-lan: port 11(phy1-ap1) entered disabled state
<1>[  217.702985] Unable to handle kernel paging request at virtual address 9ae1ed4c37e60181
<1>[  217.710908] Mem abort info:
<1>[  217.713727]   ESR = 0x0000000096000004
<1>[  217.717460]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[  217.722762]   SET = 0, FnV = 0
<1>[  217.725802]   EA = 0, S1PTW = 0
<1>[  217.728926]   FSC = 0x04: level 0 translation fault
<1>[  217.733808] Data abort info:
<1>[  217.736754]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
<1>[  217.742223]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[  217.747267]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[  217.752560] [9ae1ed4c37e60181] address between user and kernel address ranges
<0>[  217.759684] Internal error: Oops: 0000000096000004 [#1] SMP
<7>[  217.765238] Modules linked in: nft_fib_inet nf_flow_table_inet iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) iptable_mangle iptable_filter ipt_REJECT ip_tables cfg80211(O) xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG x_tables tcp_bbr nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c compat(O) cls_flower act_vlan crypto_safexcel cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd
<7>[  217.765408]  gpio_button_hotplug(O) usbcore usb_common aquantia
<7>[  217.860492] CPU: 3 PID: 1665 Comm: hostapd Tainted: G           O       6.6.28 #0
<7>[  217.867954] Hardware name: GL.iNet GL-MT6000 (DT)
<7>[  217.872640] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
<7>[  217.879580] pc : mtk_wed_setup_tc_block_cb+0x4/0x38
<7>[  217.884449] lr : tc_setup_cb_reoffload+0x30/0x134
<7>[  217.889140] sp : ffffffc0814c3360
<7>[  217.892438] x29: ffffffc0814c3360 x28: ffffffc080b48000 x27: 0000000000000000
<7>[  217.899554] x26: ffffff800616b000 x25: 0000000000000000 x24: 0000000000000000
<7>[  217.906669] x23: ffffff8006454840 x22: ffffff800616b000 x21: ffffff800a6fb3ec
<7>[  217.913784] x20: 0000000000000000 x19: ffffff800cabb280 x18: 0000000000000028
<7>[  217.920900] x17: 0000000000000000 x16: 0000000000001978 x15: 0000000000000a30
<7>[  217.928014] x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000002
<7>[  217.935130] x11: 0000000000000303 x10: 00000000000008a0 x9 : ffffffc0814c35f0
<7>[  217.942246] x8 : 0000000000000001 x7 : ffffff800a6fb3ec x6 : ffffff8006454840
<7>[  217.949361] x5 : ffffffc0814c3408 x4 : 9ae1ed4c37e60091 x3 : ffffff8001350e40
<7>[  217.956477] x2 : ffffff8006454840 x1 : ffffffc0814c3408 x0 : 0000000000000005
<7>[  217.963593] Call trace:
<7>[  217.966025]  mtk_wed_setup_tc_block_cb+0x4/0x38
<7>[  217.970539]  0xffffffc078e054ac
<7>[  217.973700]  tcf_block_playback_offloads+0x70/0x1e8
<7>[  217.978562]  tcf_block_unbind+0x6c/0xc8
<7>[  217.982384]  tcf_block_setup+0x38/0x1e4
<7>[  217.986205]  tcf_block_offload_cmd.isra.0+0xdc/0x128
<7>[  217.991152]  tcf_block_offload_unbind+0x50/0x8c
<7>[  217.995667]  __tcf_block_put+0x88/0x17c
<7>[  217.999488]  tcf_block_put_ext+0x4c/0x60
<7>[  218.003395]  0xffffffc078de49ac
<7>[  218.006532]  __qdisc_destroy+0x40/0xa0
<7>[  218.010266]  qdisc_put+0x54/0x6c
<7>[  218.013479]  dev_shutdown+0x90/0x108
<7>[  218.017038]  unregister_netdevice_many_notify+0x1cc/0x788
<7>[  218.022422]  unregister_netdevice_queue+0xa4/0xb0
<7>[  218.027111]  cfg80211_shutdown_all_interfaces+0x32c/0x37c [cfg80211]
<7>[  218.033465]  cfg80211_unregister_wdev+0x10/0x18 [cfg80211]
<7>[  218.038946]  ieee80211_if_remove+0x6c/0x110 [mac80211]
<7>[  218.044102]  ieee80211_channel_switch_disconnect+0x1cfc/0x1d08 [mac80211]
<7>[  218.050891]  cfg80211_remove_virtual_intf+0x5c/0x68 [cfg80211]
<7>[  218.056720]  cfg80211_check_station_change+0x31ac/0x32c4 [cfg80211]
<7>[  218.062981]  genl_family_rcv_msg_doit+0xa8/0x108
<7>[  218.067584]  genl_rcv_msg+0x1b0/0x244
<7>[  218.071231]  netlink_rcv_skb+0x54/0x11c
<7>[  218.075051]  genl_rcv+0x34/0x48
<7>[  218.078178]  netlink_unicast+0x1e0/0x2c8
<7>[  218.082085]  netlink_sendmsg+0x198/0x3c4
<7>[  218.085992]  ____sys_sendmsg+0x1bc/0x26c
<7>[  218.089905]  ___sys_sendmsg+0x78/0xb8
<7>[  218.093552]  __sys_sendmsg+0x44/0x98
<7>[  218.097111]  __arm64_sys_sendmsg+0x20/0x28
<7>[  218.101192]  invoke_syscall.constprop.0+0x4c/0xe0
<7>[  218.105882]  do_el0_svc+0x3c/0xbc
<7>[  218.109182]  el0_svc+0x18/0x4c
<7>[  218.112225]  el0t_64_sync_handler+0x118/0x124
<7>[  218.116567]  el0t_64_sync+0x150/0x154
<0>[  218.120218] Code: b9401fe0 a8c27bfd d65f03c0 a9401043 (f9407882)
<4>[  218.126291] ---[ end trace 0000000000000000 ]---

root@AP:~# cat /sys/fs/pstore/dmesg-ramoops-1
Panic#2 Part1
<6>[   23.251996] br-lan: port 7(phy1-ap0) entered forwarding state
<6>[   23.371730] br-lan: port 11(phy1-ap1) entered blocking state
<6>[   23.377404] br-lan: port 11(phy1-ap1) entered disabled state
<6>[   23.383131] mt798x-wmac 18000000.wifi phy1-ap1: entered allmulticast mode
<6>[   23.390190] mt798x-wmac 18000000.wifi phy1-ap1: entered promiscuous mode
<6>[   23.400229] br-lan: port 11(phy1-ap1) entered blocking state
<6>[   23.405884] br-lan: port 11(phy1-ap1) entered forwarding state
<6>[   23.413659] mt798x-wmac 18000000.wifi phy1-ap1: left allmulticast mode
<6>[   23.420233] mt798x-wmac 18000000.wifi phy1-ap1: left promiscuous mode
<6>[   23.426771] br-lan: port 11(phy1-ap1) entered disabled state
<6>[   23.501383] br-lan: port 11(phy1-ap1) entered blocking state
<6>[   23.507043] br-lan: port 11(phy1-ap1) entered disabled state
<6>[   23.512736] mt798x-wmac 18000000.wifi phy1-ap1: entered allmulticast mode
<6>[   23.519686] mt798x-wmac 18000000.wifi phy1-ap1: entered promiscuous mode
<6>[   23.526484] br-lan: port 11(phy1-ap1) entered blocking state
<6>[   23.532137] br-lan: port 11(phy1-ap1) entered forwarding state
<3>[  105.908186] mt798x-wmac 18000000.wifi phy0-ap2: failed (err=-2) to del object (id=3)
<3>[  105.915931] mt798x-wmac 18000000.wifi phy1-ap1: failed (err=-2) to del object (id=3)
<6>[  217.633996] br-lan: port 11(phy1-ap1) entered disabled state
<1>[  217.702985] Unable to handle kernel paging request at virtual address 9ae1ed4c37e60181
<1>[  217.710908] Mem abort info:
<1>[  217.713727]   ESR = 0x0000000096000004
<1>[  217.717460]   EC = 0x25: DABT (current EL), IL = 32 bits
<1>[  217.722762]   SET = 0, FnV = 0
<1>[  217.725802]   EA = 0, S1PTW = 0
<1>[  217.728926]   FSC = 0x04: level 0 translation fault
<1>[  217.733808] Data abort info:
<1>[  217.736754]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
<1>[  217.742223]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
<1>[  217.747267]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
<1>[  217.752560] [9ae1ed4c37e60181] address between user and kernel address ranges
<0>[  217.759684] Internal error: Oops: 0000000096000004 [#1] SMP
<7>[  217.765238] Modules linked in: nft_fib_inet nf_flow_table_inet iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) iptable_mangle iptable_filter ipt_REJECT ip_tables cfg80211(O) xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG x_tables tcp_bbr nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c compat(O) cls_flower act_vlan crypto_safexcel cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd
<7>[  217.765408]  gpio_button_hotplug(O) usbcore usb_common aquantia
<7>[  217.860492] CPU: 3 PID: 1665 Comm: hostapd Tainted: G           O       6.6.28 #0
<7>[  217.867954] Hardware name: GL.iNet GL-MT6000 (DT)
<7>[  217.872640] pstate: 80400005 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
<7>[  217.879580] pc : mtk_wed_setup_tc_block_cb+0x4/0x38
<7>[  217.884449] lr : tc_setup_cb_reoffload+0x30/0x134
<7>[  217.889140] sp : ffffffc0814c3360
<7>[  217.892438] x29: ffffffc0814c3360 x28: ffffffc080b48000 x27: 0000000000000000
<7>[  217.899554] x26: ffffff800616b000 x25: 0000000000000000 x24: 0000000000000000
<7>[  217.906669] x23: ffffff8006454840 x22: ffffff800616b000 x21: ffffff800a6fb3ec
<7>[  217.913784] x20: 0000000000000000 x19: ffffff800cabb280 x18: 0000000000000028
<7>[  217.920900] x17: 0000000000000000 x16: 0000000000001978 x15: 0000000000000a30
<7>[  217.928014] x14: 0000000000000000 x13: 0000000000000030 x12: 0000000000000002
<7>[  217.935130] x11: 0000000000000303 x10: 00000000000008a0 x9 : ffffffc0814c35f0
<7>[  217.942246] x8 : 0000000000000001 x7 : ffffff800a6fb3ec x6 : ffffff8006454840
<7>[  217.949361] x5 : ffffffc0814c3408 x4 : 9ae1ed4c37e60091 x3 : ffffff8001350e40
<7>[  217.956477] x2 : ffffff8006454840 x1 : ffffffc0814c3408 x0 : 0000000000000005
<7>[  217.963593] Call trace:
<7>[  217.966025]  mtk_wed_setup_tc_block_cb+0x4/0x38
<7>[  217.970539]  0xffffffc078e054ac
<7>[  217.973700]  tcf_block_playback_offloads+0x70/0x1e8
<7>[  217.978562]  tcf_block_unbind+0x6c/0xc8
<7>[  217.982384]  tcf_block_setup+0x38/0x1e4
<7>[  217.986205]  tcf_block_offload_cmd.isra.0+0xdc/0x128
<7>[  217.991152]  tcf_block_offload_unbind+0x50/0x8c
<7>[  217.995667]  __tcf_block_put+0x88/0x17c
<7>[  217.999488]  tcf_block_put_ext+0x4c/0x60
<7>[  218.003395]  0xffffffc078de49ac
<7>[  218.006532]  __qdisc_destroy+0x40/0xa0
<7>[  218.010266]  qdisc_put+0x54/0x6c
<7>[  218.013479]  dev_shutdown+0x90/0x108
<7>[  218.017038]  unregister_netdevice_many_notify+0x1cc/0x788
<7>[  218.022422]  unregister_netdevice_queue+0xa4/0xb0
<7>[  218.027111]  cfg80211_shutdown_all_interfaces+0x32c/0x37c [cfg80211]
<7>[  218.033465]  cfg80211_unregister_wdev+0x10/0x18 [cfg80211]
<7>[  218.038946]  ieee80211_if_remove+0x6c/0x110 [mac80211]
<7>[  218.044102]  ieee80211_channel_switch_disconnect+0x1cfc/0x1d08 [mac80211]
<7>[  218.050891]  cfg80211_remove_virtual_intf+0x5c/0x68 [cfg80211]
<7>[  218.056720]  cfg80211_check_station_change+0x31ac/0x32c4 [cfg80211]
<7>[  218.062981]  genl_family_rcv_msg_doit+0xa8/0x108
<7>[  218.067584]  genl_rcv_msg+0x1b0/0x244
<7>[  218.071231]  netlink_rcv_skb+0x54/0x11c
<7>[  218.075051]  genl_rcv+0x34/0x48
<7>[  218.078178]  netlink_unicast+0x1e0/0x2c8
<7>[  218.082085]  netlink_sendmsg+0x198/0x3c4
<7>[  218.085992]  ____sys_sendmsg+0x1bc/0x26c
<7>[  218.089905]  ___sys_sendmsg+0x78/0xb8
<7>[  218.093552]  __sys_sendmsg+0x44/0x98
<7>[  218.097111]  __arm64_sys_sendmsg+0x20/0x28
<7>[  218.101192]  invoke_syscall.constprop.0+0x4c/0xe0
<7>[  218.105882]  do_el0_svc+0x3c/0xbc
<7>[  218.109182]  el0_svc+0x18/0x4c
<7>[  218.112225]  el0t_64_sync_handler+0x118/0x124
<7>[  218.116567]  el0t_64_sync+0x150/0x154
<0>[  218.120218] Code: b9401fe0 a8c27bfd d65f03c0 a9401043 (f9407882)
<4>[  218.126291] ---[ end trace 0000000000000000 ]---
<3>[  218.134291] pstore: backend (ramoops) writing error (-28)
<0>[  218.139676] Kernel panic - not syncing: Oops: Fatal exception
<2>[  218.145402] SMP: stopping secondary CPUs
<0>[  218.149310] Kernel Offset: disabled
<0>[  218.152782] CPU features: 0x0,00000000,00000000,1000400b
<0>[  218.158075] Memory Limit: none

Was there another patch that I missed?


Update: Grabbing this patch to include in my build now... https://github.com/rany2/openwrt/commit/18cc739263004d4846991c9afbc6ba45c39293a1

xize commented 5 months ago

@rany2 Hmm, I'm very new to patching. I decided to use quilt now, but I get this error from the patch:


Applying patch 9004-wifi-mt76-mt7915-do-not-use-event-format-to-get-.patch
patching file mt76_connac_mcu.h
Hunk #1 FAILED at 1216.
1 out of 1 hunk FAILED -- rejects in file mt76_connac_mcu.h
patching file mt7915/init.c
Hunk #1 succeeded at 515 (offset 21 lines).
patching file mt7915/mac.c
Hunk #1 succeeded at 1146 (offset -64 lines).
Hunk #2 succeeded at 1256 (offset -23 lines).
patching file mt7915/mcu.c
Hunk #1 succeeded at 3086 (offset -261 lines).
patching file mt7915/mcu.h
Hunk #1 succeeded at 163 (offset -100 lines).
patching file mt7915/mt7915.h
Hunk #1 succeeded at 495 with fuzz 1 (offset -175 lines).
patching file mt7915/regs.h
Hunk #1 succeeded at 311 (offset -12 lines).
Hunk #2 succeeded at 415 (offset -12 lines).
Hunk #3 succeeded at 564 (offset -19 lines).
Hunk #4 succeeded at 579 (offset -19 lines).
Patch 9004-wifi-mt76-mt7915-do-not-use-event-format-to-get-.patch does not apply (enforce with -f)
make[2]: *** [Makefile:636: /home/xize/openwrt-flint2-testing/build_dir/target-aarch64_cortex-a53_musl/linux-mediatek_filogic/mt76-2024.04.03~1e336a85/.quilt_checked] Error 1                                           

Some guidance would be excellent. Do I need to apply this patch somewhere else, such as in mt76 itself? Or maybe it fails because I use kernel 6.6 in my builds.

rany2 commented 5 months ago

@xize This patch should apply cleanly to the regular mt76 repository. It didn't apply cleanly for you because of other patches in my repo that were built on top of this:

Patch

```diff
From e5b4c0323eb5575a4531d3967d12fb3ba6d835ea Mon Sep 17 00:00:00 2001
From: rany
Date: Fri, 26 May 2023 19:41:12 +0300
Subject: [PATCH] wifi: mt76: mt7915: do not use event format to get survey data

Using the event format to get survey data results in the chip
eventually crashing with no chance of recovery. The only possible
course of action is to restart the system and hope it never happens
again.

Signed-off-by: rany
---
 mt76_connac_mcu.h |  1 -
 mt7915/init.c     |  1 +
 mt7915/mac.c      | 40 ++++++++++++++++++++++----
 mt7915/mcu.c      | 72 -----------------------------------------------
 mt7915/mcu.h      | 21 --------------
 mt7915/mt7915.h   |  1 -
 mt7915/regs.h     | 13 +++++++++
 7 files changed, 49 insertions(+), 100 deletions(-)

diff --git a/mt76_connac_mcu.h b/mt76_connac_mcu.h
index 915eb3a1..3e92f3f3 100644
--- a/mt76_connac_mcu.h
+++ b/mt76_connac_mcu.h
@@ -1218,7 +1218,6 @@ enum {
 	MCU_EXT_CMD_EFUSE_FREE_BLOCK = 0x4f,
 	MCU_EXT_CMD_TX_POWER_FEATURE_CTRL = 0x58,
 	MCU_EXT_CMD_RXDCOC_CAL = 0x59,
-	MCU_EXT_CMD_GET_MIB_INFO = 0x5a,
 	MCU_EXT_CMD_TXDPD_CAL = 0x60,
 	MCU_EXT_CMD_CAL_CACHE = 0x67,
 	MCU_EXT_CMD_RED_ENABLE = 0x68,
diff --git a/mt7915/init.c b/mt7915/init.c
index eee18798..bffc94e3 100644
--- a/mt7915/init.c
+++ b/mt7915/init.c
@@ -515,6 +515,7 @@ mt7915_mac_init_band(struct mt7915_dev *dev, u8 band)
 	mask = MT_WF_RMAC_MIB_OBSS_BACKOFF | MT_WF_RMAC_MIB_ED_OFFSET;
 	set = FIELD_PREP(MT_WF_RMAC_MIB_OBSS_BACKOFF, 0) |
 	      FIELD_PREP(MT_WF_RMAC_MIB_ED_OFFSET, 4);
+	mt76_rmw(dev, MT_WF_RMAC_MIB_TIME0(band), mask, set);
 	mt76_rmw(dev, MT_WF_RMAC_MIB_AIRTIME0(band), mask, set);

 	/* filter out non-resp frames and get instanstaeous signal reporting */
diff --git a/mt7915/mac.c b/mt7915/mac.c
index 8008ce3f..4cda0938 100644
--- a/mt7915/mac.c
+++ b/mt7915/mac.c
@@ -1146,10 +1146,14 @@ void mt7915_mac_reset_counters(struct mt7915_phy *phy)
 	memset(phy->mt76->aggr_stats, 0, sizeof(phy->mt76->aggr_stats));

 	/* reset airtime counters */
+	mt76_rr(dev, MT_MIB_SDR9(phy->mt76->band_idx));
+	mt76_rr(dev, MT_MIB_SDR36(phy->mt76->band_idx));
+	mt76_rr(dev, MT_MIB_SDR37(phy->mt76->band_idx));
+
+	mt76_set(dev, MT_WF_RMAC_MIB_TIME0(phy->mt76->band_idx),
+		 MT_WF_RMAC_MIB_RXTIME_CLR);
 	mt76_set(dev, MT_WF_RMAC_MIB_AIRTIME0(phy->mt76->band_idx),
 		 MT_WF_RMAC_MIB_RXTIME_CLR);
-
-	mt7915_mcu_get_chan_mib_info(phy, true);
 }

 void mt7915_mac_set_timing(struct mt7915_phy *phy)
@@ -1252,23 +1256,49 @@ mt7915_phy_get_nf(struct mt7915_phy *phy, int idx)
 	return sum / n;
 }

-void mt7915_update_channel(struct mt76_phy *mphy)
+static void
+mt7915_phy_update_channel(struct mt76_phy *mphy, u8 idx)
 {
+	struct mt7915_dev *dev = container_of(mphy->dev, struct mt7915_dev, mt76);
 	struct mt7915_phy *phy = mphy->priv;
 	struct mt76_channel_state *state = mphy->chan_state;
+	u64 busy_time, tx_time, rx_time, obss_time;
 	int nf;

-	mt7915_mcu_get_chan_mib_info(phy, false);
+	busy_time = mt76_get_field(dev, MT_MIB_SDR9(idx),
+				   MT_MIB_SDR9_BUSY_MASK);
+	tx_time = mt76_get_field(dev, MT_MIB_SDR36(idx),
+				 MT_MIB_SDR36_TXTIME_MASK);
+	rx_time = mt76_get_field(dev, MT_MIB_SDR37(idx),
+				 MT_MIB_SDR37_RXTIME_MASK);
+	obss_time = mt76_get_field(dev, MT_WF_RMAC_MIB_AIRTIME14(idx),
+				   MT_MIB_OBSSTIME_MASK);

-	nf = mt7915_phy_get_nf(phy, phy->mt76->band_idx);
+	nf = mt7915_phy_get_nf(phy, idx);
 	if (!phy->noise)
 		phy->noise = nf << 4;
 	else if (nf)
 		phy->noise += nf - (phy->noise >> 4);

+	state->cc_busy += busy_time;
+	state->cc_tx += tx_time;
+	state->cc_rx += rx_time + obss_time;
+	state->cc_bss_rx += rx_time;
 	state->noise = -(phy->noise >> 4);
 }

+void mt7915_update_channel(struct mt76_phy *mphy)
+{
+	struct mt7915_phy *phy = (struct mt7915_phy *)mphy->priv;
+	struct mt7915_dev *dev = phy->dev;
+
+	mt7915_phy_update_channel(mphy, phy->mt76->band_idx);
+
+	/* reset obss airtime */
+	mt76_set(dev, MT_WF_RMAC_MIB_TIME0(phy->mt76->band_idx),
+		 MT_WF_RMAC_MIB_RXTIME_CLR);
+}
+
 static bool
 mt7915_wait_reset_state(struct mt7915_dev *dev, u32 state)
 {
diff --git a/mt7915/mcu.c b/mt7915/mcu.c
index 29e9d660..8f0c3f36 100644
--- a/mt7915/mcu.c
+++ b/mt7915/mcu.c
@@ -3086,78 +3086,6 @@ int mt7915_mcu_apply_tx_dpd(struct mt7915_phy *phy)
 	return 0;
 }

-int mt7915_mcu_get_chan_mib_info(struct mt7915_phy *phy, bool chan_switch)
-{
-	struct mt76_channel_state *state = phy->mt76->chan_state;
-	struct mt76_channel_state *state_ts = &phy->state_ts;
-	struct mt7915_dev *dev = phy->dev;
-	struct mt7915_mcu_mib *res, req[5];
-	struct sk_buff *skb;
-	static const u32 *offs;
-	int i, ret, len, offs_cc;
-	u64 cc_tx;
-
-	/* strict order */
-	if (is_mt7915(&dev->mt76)) {
-		static const u32 chip_offs[] = {
-			MIB_NON_WIFI_TIME,
-			MIB_TX_TIME,
-			MIB_RX_TIME,
-			MIB_OBSS_AIRTIME,
-			MIB_TXOP_INIT_COUNT,
-		};
-		len = ARRAY_SIZE(chip_offs);
-		offs = chip_offs;
-		offs_cc = 20;
-	} else {
-		static const u32 chip_offs[] = {
-			MIB_NON_WIFI_TIME_V2,
-			MIB_TX_TIME_V2,
-			MIB_RX_TIME_V2,
-			MIB_OBSS_AIRTIME_V2
-		};
-		len = ARRAY_SIZE(chip_offs);
-		offs = chip_offs;
-		offs_cc = 0;
-	}
-
-	for (i = 0; i < len; i++) {
-		req[i].band = cpu_to_le32(phy->mt76->band_idx);
-		req[i].offs = cpu_to_le32(offs[i]);
-	}
-
-	ret = mt76_mcu_send_and_get_msg(&dev->mt76, MCU_EXT_CMD(GET_MIB_INFO),
-					req, len * sizeof(req[0]), true, &skb);
-	if (ret)
-		return ret;
-
-	res = (struct mt7915_mcu_mib *)(skb->data + offs_cc);
-
-#define __res_u64(s) le64_to_cpu(res[s].data)
-	/* subtract Tx backoff time from Tx duration */
-	cc_tx = is_mt7915(&dev->mt76) ? __res_u64(1) - __res_u64(4) : __res_u64(1);
-
-	if (chan_switch)
-		goto out;
-
-	state->cc_tx += cc_tx - state_ts->cc_tx;
-	state->cc_bss_rx += __res_u64(2) - state_ts->cc_bss_rx;
-	state->cc_rx += __res_u64(2) + __res_u64(3) - state_ts->cc_rx;
-	state->cc_busy += __res_u64(0) + cc_tx + __res_u64(2) + __res_u64(3) -
-			  state_ts->cc_busy;
-
-out:
-	state_ts->cc_tx = cc_tx;
-	state_ts->cc_bss_rx = __res_u64(2);
-	state_ts->cc_rx = __res_u64(2) + __res_u64(3);
-	state_ts->cc_busy = __res_u64(0) + cc_tx + __res_u64(2) + __res_u64(3);
-#undef __res_u64
-
-	dev_kfree_skb(skb);
-
-	return 0;
-}
-
 int mt7915_mcu_get_temperature(struct mt7915_phy *phy)
 {
 	struct mt7915_dev *dev = phy->dev;
diff --git a/mt7915/mcu.h b/mt7915/mcu.h
index b41ac4aa..c8e5fd9d 100644
--- a/mt7915/mcu.h
+++ b/mt7915/mcu.h
@@ -163,27 +163,6 @@ struct mt7915_mcu_phy_rx_info {
 	u8 bw;
 };

-struct mt7915_mcu_mib {
-	__le32 band;
-	__le32 offs;
-	__le64 data;
-} __packed;
-
-enum mt7915_chan_mib_offs {
-	/* mt7915 */
-	MIB_TX_TIME = 81,
-	MIB_RX_TIME,
-	MIB_OBSS_AIRTIME = 86,
-	MIB_NON_WIFI_TIME,
-	MIB_TXOP_INIT_COUNT,
-
-	/* mt7916 */
-	MIB_TX_TIME_V2 = 6,
-	MIB_RX_TIME_V2 = 8,
-	MIB_OBSS_AIRTIME_V2 = 490,
-	MIB_NON_WIFI_TIME_V2
-};
-
 struct mt7915_mcu_txpower_sku {
 	u8 format_id;
 	u8 limit_type;
diff --git a/mt7915/mt7915.h b/mt7915/mt7915.h
index a30d08eb..2e02a528 100644
--- a/mt7915/mt7915.h
+++ b/mt7915/mt7915.h
@@ -495,7 +495,6 @@ int mt7915_mcu_set_radar_th(struct mt7915_dev *dev, int index,
 int mt7915_mcu_set_muru_ctrl(struct mt7915_dev *dev, u32 cmd, u32 val);
 int mt7915_mcu_apply_group_cal(struct mt7915_dev *dev);
 int mt7915_mcu_apply_tx_dpd(struct mt7915_phy *phy);
-int mt7915_mcu_get_chan_mib_info(struct mt7915_phy *phy, bool chan_switch);
 int mt7915_mcu_get_temperature(struct mt7915_phy *phy);
 int mt7915_mcu_set_thermal_throttling(struct mt7915_phy *phy, u8 state);
 int mt7915_mcu_set_thermal_protect(struct mt7915_phy *phy);
diff --git a/mt7915/regs.h b/mt7915/regs.h
index 89ac8e67..8a0a7ff4 100644
--- a/mt7915/regs.h
+++ b/mt7915/regs.h
@@ -311,6 +311,9 @@ enum offs_rev {
 #define MT_MIB_SDR3_FCS_ERR_MASK GENMASK(15, 0)
 #define MT_MIB_SDR3_FCS_ERR_MASK_MT7916 GENMASK(31, 16)

+#define MT_MIB_SDR9(_band) MT_WF_MIB(_band, 0x02c)
+#define MT_MIB_SDR9_BUSY_MASK GENMASK(23, 0)
+
 #define MT_MIB_SDR4(_band) MT_WF_MIB(_band, __OFFS(MIB_SDR4))
 #define MT_MIB_SDR4_RX_FIFO_FULL_MASK GENMASK(15, 0)

@@ -412,6 +415,11 @@ enum offs_rev {
 #define MT_MIB_SDR33(_band) MT_WF_MIB(_band, 0x088)
 #define MT_MIB_SDR33_TX_PKT_IBF_CNT GENMASK(15, 0)

+#define MT_MIB_SDR36(_band) MT_WF_MIB(_band, 0x098)
+#define MT_MIB_SDR36_TXTIME_MASK GENMASK(23, 0)
+#define MT_MIB_SDR37(_band) MT_WF_MIB(_band, 0x09c)
+#define MT_MIB_SDR37_RXTIME_MASK GENMASK(23, 0)
+
 #define MT_MIB_SDRMUBF(_band) MT_WF_MIB(_band, __OFFS(MIB_SDRMUBF))
 #define MT_MIB_MU_BF_TX_CNT GENMASK(15, 0)

@@ -556,6 +564,7 @@ enum offs_rev {
 #define MT_WF_RMAC_RSVD0(_band) MT_WF_RMAC(_band, 0x02e0)
 #define MT_WF_RMAC_RSVD0_EIFS_CLR BIT(21)

+#define MT_WF_RMAC_MIB_TIME0(_band) MT_WF_RMAC(_band, 0x03c4)
 #define MT_WF_RMAC_MIB_AIRTIME0(_band) MT_WF_RMAC(_band, 0x0380)
 #define MT_WF_RMAC_MIB_RXTIME_CLR BIT(31)
 #define MT_WF_RMAC_MIB_OBSS_BACKOFF GENMASK(15, 0)
@@ -570,6 +579,10 @@ enum offs_rev {
 #define MT_WF_RMAC_MIB_AIRTIME4(_band) MT_WF_RMAC(_band, 0x0390)
 #define MT_WF_RMAC_MIB_QOS23_BACKOFF GENMASK(31, 0)

+#define MT_WF_RMAC_MIB_AIRTIME14(_band) MT_WF_RMAC(_band, 0x03b8)
+#define MT_MIB_OBSSTIME_MASK GENMASK(23, 0)
+#define MT_WF_RMAC_MIB_AIRTIME0(_band) MT_WF_RMAC(_band, 0x0380)
+
 /* WFDMA0 */
 #define MT_WFDMA0_BASE __REG(WFDMA0_ADDR)
 #define MT_WFDMA0(ofs) (MT_WFDMA0_BASE + (ofs))
--
2.43.0
```
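Before rebuilding, it can help to check whether a patch like the one above applies to an unpacked mt76 source directory without changing any files. This is only a sketch: `patch_applies` is a hypothetical helper, and it assumes a `patch` utility that supports `--dry-run` (GNU patch does).

```shell
#!/bin/sh
# Report whether a patch applies to a source directory, without modifying it.
# ASSUMPTION: `patch` supports --dry-run; -f suppresses interactive questions.
patch_applies() {
    src_dir=$1
    patch_file=$2
    if patch -d "$src_dir" -p1 -f --dry-run < "$patch_file" >/dev/null 2>&1; then
        echo "applies"
    else
        echo "rejects"
    fi
}
```

A "rejects" result against a given mt76 checkout would point to the same kind of hunk failure quilt reported, for instance because the checkout carries other changes to the same lines.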

Fail-Safe commented 5 months ago

Was there another patch that I missed?

Update: Grabbing this patch to include in my build now... rany2/openwrt@18cc739

I had to rebuild the patch to get it to apply cleanly in my build, but ended up with a clean build and so far things are looking a lot better. I'm not getting any 00005aed or 000026ed timeouts now with multicast_to_unicast_all enabled.

Will continue to let this cook and monitor it for a while to see if things hold up.
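For anyone else monitoring this, a small sketch that tallies the MCU timeout lines per message ID from log text on stdin. `count_mcu_timeouts` is a hypothetical helper; the pattern is based only on the log format shown in this thread.

```shell
#!/bin/sh
# Count "Message XXXXXXXX (seq N) timeout" lines per message ID.
# Reads log text on stdin and prints "<message-id> <count>" per ID.
count_mcu_timeouts() {
    sed -n 's/.*Message \([0-9a-fA-F]\{8\}\) (seq [0-9]*) timeout.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

On the router this could be fed from the system log, for example `logread | count_mcu_timeouts`.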

Fail-Safe commented 5 months ago

Sheesh... spoke too soon. Just hit this crash:

Wed May  1 13:56:36 2024 kern.err kernel: [ 9466.996682] mt798x-wmac 18000000.wifi: Message 000026ed (seq 14) timeout
Wed May  1 13:56:56 2024 kern.err kernel: [ 9487.444333] mt798x-wmac 18000000.wifi: Message 00002ced (seq 15) timeout
Wed May  1 13:57:16 2024 kern.err kernel: [ 9507.912542] mt798x-wmac 18000000.wifi: Message 000800c4 (seq 1) timeout
...
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.337131] ------------[ cut here ]------------
Wed May  1 13:58:03 2024 kern.warn kernel: [ 9554.341756] WARNING: CPU: 0 PID: 9731 at kthread_park+0x9c/0xb0
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.347667] Modules linked in: nft_fib_inet nf_flow_table_inet iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) iptable_mangle iptable_filter ipt_REJECT ip_tables cfg80211(O) xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG x_tables tcp_bbr nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c compat(O) cls_flower act_vlan crypto_safexcel cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.347844]  gpio_button_hotplug(O) usbcore usb_common aquantia
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.442932] CPU: 0 PID: 9731 Comm: kworker/u8:0 Tainted: G           O       6.6.28 #0
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.450826] Hardware name: GL.iNet GL-MT6000 (DT)
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.455513] Workqueue: mt76 mt7915_mac_reset_work [mt7915e]
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.461098] pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.468038] pc : kthread_park+0x9c/0xb0
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.471862] lr : mt7915_mac_reset_work+0x128/0xd28 [mt7915e]
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.477511] sp : ffffffc082d03ca0
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.480809] x29: ffffffc082d03ca0 x28: 0000000000000000 x27: ffffff800640fa20
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.487927] x26: ffffff800707a680 x25: ffffff800707a000 x24: ffffff8006402000
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.495044] x23: ffffffffffff25e0 x22: ffffff8000011000 x21: ffffff80008bc400
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.502161] x20: ffffff8001014f80 x19: ffffff8000d82e00 x18: 0000000000000000
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.509277] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.516392] x14: 0000000000000000 x13: 0000000000000020 x12: 0101010101010101
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.523508] x11: 0000000000000040 x10: ffffffc080b57470 x9 : ffffffc080b57468
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.530623] x8 : ffffffffffff6400 x7 : 0000000000000000 x6 : 0000000000000000
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.537739] x5 : ffffff8000401918 x4 : ffffff8000401980 x3 : 0000000000000000
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.544855] x2 : 0000000000000001 x1 : ffffffc080b57488 x0 : 0000000000000004
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.551970] Call trace:
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.554403]  kthread_park+0x9c/0xb0
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.557881]  mt7915_mac_reset_work+0x128/0xd28 [mt7915e]
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.563189]  process_one_work+0x154/0x2a0
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.567185]  worker_thread+0x2ac/0x48c
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.570919]  kthread+0xdc/0xe8
Wed May  1 13:58:03 2024 kern.debug kernel: [ 9554.573961]  ret_from_fork+0x10/0x20
Wed May  1 13:58:03 2024 kern.warn kernel: [ 9554.577522] ---[ end trace 0000000000000000 ]---

To be clear, I am building with the following:

xize commented 5 months ago

I can confirm as well: mine crashed too, though it took a lot of time.

I use kernel 6.6.29 with the patch from @lukasz1992 and @rany2.

wireless crash:

```
[ 100.652809] mt798x-wmac 18000000.wifi phy1-ap0-aya: entered allmulticast mode
[ 100.660147] mt798x-wmac 18000000.wifi phy1-ap0-aya: entered promiscuous mode
[ 100.858972] br-lan: port 7(phy1-ap0) entered blocking state
[ 100.864565] br-lan: port 7(phy1-ap0) entered forwarding state
[ 100.870558] br-lan: port 13(phy1-ap0-aya) entered blocking state
[ 100.876564] br-lan: port 13(phy1-ap0-aya) entered forwarding state
[ 6157.180037] mt798x-wmac 18000000.wifi: Message 000026ed (seq 12) timeout
[ 6177.637926] mt798x-wmac 18000000.wifi: Message 000800c4 (seq 13) timeout
[ 6198.097317] mt798x-wmac 18000000.wifi: Message 000025ed (seq 14) timeout
[ 6198.104102] ------------[ cut here ]------------
[ 6198.108704] WARNING: CPU: 1 PID: 17561 at ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 6198.117520] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_inet wireguard pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_compat nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) libchacha20poly1305 iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables chacha_neon cfg80211(O) xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY x_tables slhc sch_cake poly1305_neon nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libcrc32c libchacha compat(O) crypto_safexcel sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact
[ 6198.117681] ip6_gre ip_gre gre ifb ip6_tunnel tunnel6 ip_tunnel vxlan udp_tunnel ip6_udp_tunnel sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd gpio_button_hotplug(O) usbcore usb_common aquantia
[ 6198.232654] CPU: 1 PID: 17561 Comm: kworker/u8:2 Tainted: G O 6.6.29 #0
[ 6198.240636] Hardware name: GL.iNet GL-MT6000 (DT)
[ 6198.245322] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 6198.251341] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 6198.258281] pc : ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 6198.264896] lr : ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[ 6198.271512] sp : ffffffc08a3ebc80
[ 6198.274812] x29: ffffffc08a3ebc80 x28: 0000000000000001 x27: ffffff800c087480
[ 6198.281928] x26: ffffff8007d003c0 x25: ffffff80046d88a0 x24: ffffff80046d88a0
[ 6198.289044] x23: ffffffc079071d10 x22: ffffff8007d061f0 x21: 0000000000000001
[ 6198.296160] x20: ffffff800c087480 x19: ffffff8007d00000 x18: ffffffffffffc8f8
[ 6198.303276] x17: ffffffffffffc800 x16: 0000000000006838 x15: 00000000000040f8
[ 6198.310392] x14: 0000000000000495 x13: 0000000000000187 x12: 00000000000080f8
[ 6198.317507] x11: 00000000000080f8 x10: 00000000000080f8 x9 : ffffffc080b6a0d0
[ 6198.324624] x8 : 00000000000080f8 x7 : 0000000000000000 x6 : 0000000d5657a665
[ 6198.331739] x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 6198.338855] x2 : 0000000000000001 x1 : 0000000000000002 x0 : 00000000ffffff92
[ 6198.345970] Call trace:
[ 6198.348402] ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 6198.354672] ieee80211_ba_session_work+0x418/0x444 [mac80211]
[ 6198.360421] process_one_work+0x154/0x2a0
[ 6198.364421] worker_thread+0x2a8/0x484
[ 6198.368155] kthread+0xdc/0xe8
[ 6198.371197] ret_from_fork+0x10/0x20
[ 6198.374759] ---[ end trace 0000000000000000 ]---
[ 6218.554886] mt798x-wmac 18000000.wifi: Message 000026ed (seq 15) timeout
[ 6218.561656] ------------[ cut here ]------------
[ 6218.566258] WARNING: CPU: 1 PID: 17561 at kthread_park+0x9c/0xb0
[ 6218.572254] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_inet wireguard pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_compat nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) libchacha20poly1305 iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables chacha_neon cfg80211(O) xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY x_tables slhc sch_cake poly1305_neon nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libcrc32c libchacha compat(O) crypto_safexcel sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact
[ 6218.572430] ip6_gre ip_gre gre ifb ip6_tunnel tunnel6 ip_tunnel vxlan udp_tunnel ip6_udp_tunnel sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd gpio_button_hotplug(O) usbcore usb_common aquantia
[ 6218.687403] CPU: 1 PID: 17561 Comm: kworker/u8:2 Tainted: G W O 6.6.29 #0
[ 6218.695385] Hardware name: GL.iNet GL-MT6000 (DT)
[ 6218.700071] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 6218.706107] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 6218.713047] pc : kthread_park+0x9c/0xb0
[ 6218.716870] lr : mt7915_mcu_add_ba+0x50/0x128 [mt7915e]
[ 6218.722087] sp : ffffffc08a3ebb80
[ 6218.725385] x29: ffffffc08a3ebb80 x28: 0000000000000001 x27: 0000000000000000
[ 6218.732501] x26: ffffff8007d060e8 x25: ffffff8004662000 x24: 0000000000000001
[ 6218.739618] x23: 0000000000000000 x22: ffffff800a775da0 x21: ffffff80046da000
[ 6218.746733] x20: ffffff800408e800 x19: ffffff8001188000 x18: 0000000000000000
[ 6218.753848] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 6218.760963] x14: 0000bc823922ce00 x13: 0000bc823922ce00 x12: 0000000000000002
[ 6218.768079] x11: 0000000000000000 x10: 00000000000008a0 x9 : ffffffc08a3ebab0
[ 6218.775195] x8 : ffffff8000b17080 x7 : ffffff8000b16850 x6 : 0000000000000000
[ 6218.782311] x5 : ffffff8007d00d98 x4 : ffffff80046d8a08 x3 : 0000000000001388
[ 6218.789426] x2 : ffffff8000b16780 x1 : ffffff8000b16780 x0 : 0000000000000004
[ 6218.796542] Call trace:
[ 6218.798974] kthread_park+0x9c/0xb0
[ 6218.802448] mt7915_mcu_add_ba+0x50/0x128 [mt7915e]
[ 6218.807317] mt7915_mcu_add_tx_ba+0x34/0x3c [mt7915e]
[ 6218.812359] mt7915_eeprom_get_power_delta+0x1148/0x2348 [mt7915e]
[ 6218.818528] drv_ampdu_action+0x6c/0xdc [mac80211]
[ 6218.823323] ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[ 6218.829592] ieee80211_ba_session_work+0x418/0x444 [mac80211]
[ 6218.835341] process_one_work+0x154/0x2a0
[ 6218.839338] worker_thread+0x2a8/0x484
[ 6218.843072] kthread+0xdc/0xe8
[ 6218.846112] ret_from_fork+0x10/0x20
[ 6218.849675] ---[ end trace 0000000000000000 ]---
```

___ieee80211 crash:

```
[ 6198.104102] ------------[ cut here ]------------
[ 6198.108704] WARNING: CPU: 1 PID: 17561 at ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 6198.117520] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_inet wireguard pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_compat nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) libchacha20poly1305 iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables chacha_neon cfg80211(O) xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY x_tables slhc sch_cake poly1305_neon nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libcrc32c libchacha compat(O) crypto_safexcel sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact
[ 6198.117681] ip6_gre ip_gre gre ifb ip6_tunnel tunnel6 ip_tunnel vxlan udp_tunnel ip6_udp_tunnel sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd gpio_button_hotplug(O) usbcore usb_common aquantia
[ 6198.232654] CPU: 1 PID: 17561 Comm: kworker/u8:2 Tainted: G O 6.6.29 #0
[ 6198.240636] Hardware name: GL.iNet GL-MT6000 (DT)
[ 6198.245322] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 6198.251341] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 6198.258281] pc : ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 6198.264896] lr : ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[ 6198.271512] sp : ffffffc08a3ebc80
[ 6198.274812] x29: ffffffc08a3ebc80 x28: 0000000000000001 x27: ffffff800c087480
[ 6198.281928] x26: ffffff8007d003c0 x25: ffffff80046d88a0 x24: ffffff80046d88a0
[ 6198.289044] x23: ffffffc079071d10 x22: ffffff8007d061f0 x21: 0000000000000001
[ 6198.296160] x20: ffffff800c087480 x19: ffffff8007d00000 x18: ffffffffffffc8f8
[ 6198.303276] x17: ffffffffffffc800 x16: 0000000000006838 x15: 00000000000040f8
[ 6198.310392] x14: 0000000000000495 x13: 0000000000000187 x12: 00000000000080f8
[ 6198.317507] x11: 00000000000080f8 x10: 00000000000080f8 x9 : ffffffc080b6a0d0
[ 6198.324624] x8 : 00000000000080f8 x7 : 0000000000000000 x6 : 0000000d5657a665
[ 6198.331739] x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 6198.338855] x2 : 0000000000000001 x1 : 0000000000000002 x0 : 00000000ffffff92
[ 6198.345970] Call trace:
[ 6198.348402] ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 6198.354672] ieee80211_ba_session_work+0x418/0x444 [mac80211]
[ 6198.360421] process_one_work+0x154/0x2a0
[ 6198.364421] worker_thread+0x2a8/0x484
[ 6198.368155] kthread+0xdc/0xe8
[ 6198.371197] ret_from_fork+0x10/0x20
[ 6198.374759] ---[ end trace 0000000000000000 ]---
```

followed by WARNING: CPU: 1 PID: 17561 at kthread_park+0x9c/0xb0:

```
[ 6218.554886] mt798x-wmac 18000000.wifi: Message 000026ed (seq 15) timeout
[ 6218.561656] ------------[ cut here ]------------
[ 6218.566258] WARNING: CPU: 1 PID: 17561 at kthread_park+0x9c/0xb0
[ 6218.572254] Modules linked in: pppoe ppp_async nft_fib_inet nf_flow_table_inet wireguard pppox ppp_generic nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_compat nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack_netlink nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) libchacha20poly1305 iptable_mangle iptable_filter ipt_REJECT ipt_ECN ip_tables chacha_neon cfg80211(O) xt_time xt_tcpudp xt_tcpmss xt_statistic xt_multiport xt_mark xt_mac xt_limit xt_length xt_hl xt_ecn xt_dscp xt_comment xt_TCPMSS xt_LOG xt_HL xt_DSCP xt_CLASSIFY x_tables slhc sch_cake poly1305_neon nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcurve25519_generic libcrc32c libchacha compat(O) crypto_safexcel sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact
[ 6218.572430] ip6_gre ip_gre gre ifb ip6_tunnel tunnel6 ip_tunnel vxlan udp_tunnel ip6_udp_tunnel sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd gpio_button_hotplug(O) usbcore usb_common aquantia
[ 6218.687403] CPU: 1 PID: 17561 Comm: kworker/u8:2 Tainted: G W O 6.6.29 #0
[ 6218.695385] Hardware name: GL.iNet GL-MT6000 (DT)
[ 6218.700071] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 6218.706107] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 6218.713047] pc : kthread_park+0x9c/0xb0
[ 6218.716870] lr : mt7915_mcu_add_ba+0x50/0x128 [mt7915e]
[ 6218.722087] sp : ffffffc08a3ebb80
[ 6218.725385] x29: ffffffc08a3ebb80 x28: 0000000000000001 x27: 0000000000000000
[ 6218.732501] x26: ffffff8007d060e8 x25: ffffff8004662000 x24: 0000000000000001
[ 6218.739618] x23: 0000000000000000 x22: ffffff800a775da0 x21: ffffff80046da000
[ 6218.746733] x20: ffffff800408e800 x19: ffffff8001188000 x18: 0000000000000000
[ 6218.753848] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[ 6218.760963] x14: 0000bc823922ce00 x13: 0000bc823922ce00 x12: 0000000000000002
[ 6218.768079] x11: 0000000000000000 x10: 00000000000008a0 x9 : ffffffc08a3ebab0
[ 6218.775195] x8 : ffffff8000b17080 x7 : ffffff8000b16850 x6 : 0000000000000000
[ 6218.782311] x5 : ffffff8007d00d98 x4 : ffffff80046d8a08 x3 : 0000000000001388
[ 6218.789426] x2 : ffffff8000b16780 x1 : ffffff8000b16780 x0 : 0000000000000004
[ 6218.796542] Call trace:
[ 6218.798974] kthread_park+0x9c/0xb0
[ 6218.802448] mt7915_mcu_add_ba+0x50/0x128 [mt7915e]
[ 6218.807317] mt7915_mcu_add_tx_ba+0x34/0x3c [mt7915e]
[ 6218.812359] mt7915_eeprom_get_power_delta+0x1148/0x2348 [mt7915e]
[ 6218.818528] drv_ampdu_action+0x6c/0xdc [mac80211]
[ 6218.823323] ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[ 6218.829592] ieee80211_ba_session_work+0x418/0x444 [mac80211]
[ 6218.835341] process_one_work+0x154/0x2a0
[ 6218.839338] worker_thread+0x2a8/0x484
[ 6218.843072] kthread+0xdc/0xe8
[ 6218.846112] ret_from_fork+0x10/0x20
[ 6218.849675] ---[ end trace 0000000000000000 ]---
```

After this, my entire wired segment also stopped working; at least LuCI would not load anymore, although I could still SSH in. When I ran reboot, the network became inaccessible and I had to power-cycle the router to get it working again. I think the ___ieee80211 crash might be an interesting one?

It took a fairly long time before these crashes happened, and they occurred in exactly this order while I was playing GTA Online on my Ayaneo Geek 1S (Intel AX210), which relies heavily on UDP streams; I had already been in-game for more than 2 hours.
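
One hedged way to read the first register dump: the `lr` value (`___ieee80211_stop_tx_ba_session+0x1d4`) matches the return address that follows `drv_ampdu_action` in the second trace, so `x0` may still hold the error code returned by the driver callback. That is an assumption, not something confirmed in this thread. Under it, the Python sketch below reinterprets the low 32 bits of `x0` as a signed integer and looks it up in a tiny hand-written table of Linux errno values (`LINUX_ERRNO` and `decode_x0` are illustrative names).

```python
# Hypothetical helper: interpret x0 from a trace as a negative errno.
# Assumption: x0 still holds the int return value of the driver callback.

# Small, incomplete subset of Linux errno numbers.
LINUX_ERRNO = {12: "ENOMEM", 110: "ETIMEDOUT"}


def decode_x0(x0_hex: str) -> str:
    """Treat the low 32 bits of x0 as a signed int and name the errno."""
    low32 = int(x0_hex, 16) & 0xFFFFFFFF
    signed = low32 - (1 << 32) if low32 & 0x80000000 else low32
    if signed >= 0:
        return f"{signed} (not an error code)"
    return f"{signed} ({LINUX_ERRNO.get(-signed, 'unknown errno')})"


if __name__ == "__main__":
    # x0 copied from the ___ieee80211_stop_tx_ba_session warning above.
    print(decode_x0("00000000ffffff92"))
```

If the assumption holds, this would point at a timeout in the driver call, which would fit the `Message ... timeout` lines logged around the warning.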

lukasz1992 commented 5 months ago

Sorry, I have no other ideas than checking older versions (like 23.05 or mediatek-oem)

xize commented 5 months ago

Hi @Fail-Safe,

I've got some news 👍. A few days ago I found an option called 'Transmit Power' in the Windows settings for the AX210 driver. I don't know exactly what it does, since I only know this setting from routers; maybe it is some kind of flag that advertises something different to the AP to get more priority over other devices?

By default this was set to highest; I've changed it to lowest.

As a result the crashes stopped appearing, and it seems to have been stable for 3 days now. I was also crashing with igmpproxy and avahi off, but I'm not entirely sure allmulticast was off too on my multi-PSK phys; I read that it had left that mode, so I assume it was.

This is a screenshot of the setting:

![ayaneo_crash](https://github.com/openwrt/mt76/assets/4119877/262a9012-1401-4ee5-963d-4b8ca56e02db)

I find it interesting that this option changed the crashing behaviour. Sometimes I still seem to get disconnected, but all other wireless devices stay connected.

I'm not sure whether commits 513c131c6309712a51502870b041f45b4bd6a6d4 and 14d5ee9f336923cf693ebf56d75bee41782f8112 also fix it; my testing was done before these 2 commits.

zekica commented 5 months ago

@xize I was experiencing something similar with multi-PSK; it seems the allmulticast mode was also active.

@rany2

I'm also experiencing crashes with multi-psk on MT7981.

When using an OpenWrt snapshot without any patches to the mt76 driver, the chip completely restarts on its own and the wifi network reappears within a couple of seconds. All clients, including the ones connected via the main PSK, get disconnected.

Then I tried the rany2/openwrt@18cc739 patch: the 0x5a messages stop appearing, but the chip still hangs, the driver shows a 0x26 timeout and restarts.

I then tried to compile the rany2/openwrt fork and since it applies a bunch of patches, when the chip hangs, it manages to recover without disconnecting clients, but shows the following:

[  447.275349] mt798x-wmac 18000000.wifi: send message 000130ed timeout, try again(1).
[  447.283349] mt798x-wmac 18000000.wifi: 
[  447.283349] phy0 L1 SER recovery completed.
[  447.821897] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000004
[  447.828811] mt798x-wmac 18000000.wifi: 
[  447.828811] phy0 L1 SER recovery start.
[  447.837695] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000008
[  447.854270] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000010
[  447.861219] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000020
[  447.868360] mt798x-wmac 18000000.wifi: 
[  447.868360] phy0 L1 SER recovery completed.

I'm assuming that 0x000130ed is message type 0x30 MCU_EXT_CMD_GET_TX_STAT.
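
The reading above (the second byte of the logged word is the message type, the low byte is 0xed) can be sketched as a small decoder. The field layout here is an assumption taken from how this thread reads the numbers, not a verified description of the mt76 command format; `McuMessage`, `decode_mcu_message`, and the field names are illustrative.

```python
from typing import NamedTuple


class McuMessage(NamedTuple):
    cmd_id: int   # low byte, 0xed in most timeouts in this thread
    ext_id: int   # second byte, the "message type" discussed above
    flags: int    # remaining upper bits, meaning not decoded here


def decode_mcu_message(word_hex: str) -> McuMessage:
    """Split the hex word from a 'Message xxxxxxxx ... timeout' line."""
    word = int(word_hex, 16)
    return McuMessage(cmd_id=word & 0xFF,
                      ext_id=(word >> 8) & 0xFF,
                      flags=word >> 16)


if __name__ == "__main__":
    # Words copied from log lines in this thread.
    for raw in ("000130ed", "000026ed", "00005aed"):
        msg = decode_mcu_message(raw)
        print(f"{raw}: ext=0x{msg.ext_id:02x} cmd=0x{msg.cmd_id:02x} "
              f"flags=0x{msg.flags:x}")
```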

The same setup works on MT7613, MT7612, MT7615, MT7603, client optimized MT7921k (n, ac, ax), and appears to not hang even on MT7975 (Asus RT-AX53U) even though it uses the same mt7915e module.

I'm therefore assuming that this is a firmware bug. I tried all five firmware versions published on mtk-feeds, and the behaviour is similar with all of them, but the crashes don't happen as often with the latest firmware.

If possible, can someone explain to me what's the difference between stations connected to the main AP interface vs ones connected to AP_VLAN interface? The keys are different, but why would it cause it to crash?

xize commented 5 months ago

I wonder if it might be some type of overheating issue, like the chip first starts timing out as a form of throttling TX(?) and when it gets pushed even more it crashes.

Then my question is: is multicast one of the factors that pushes the chip?

I noticed that when I changed the TX power setting on my Windows device (AX210), the only device in my network that was subject to the crashing (yes, I have 10+ devices), the crashing and the seq messages stopped.

I have been playing for 3 days, in sessions longer than 3 hours, and I have not seen it reappear.

But I have no clue how to check or confirm this. It would be nice if someone could tell me some commands whose output I can share, because this is an interesting heuristic. 👍

zekica commented 5 months ago

> I wonder if it might be some type of overheating issue, like the chip first starts timing out as a form of throttling TX(?) and when it gets pushed even more it crashes.

IMO, this shouldn't be the case, as transferring 100+ GB with a non-WDS station doesn't cause any crashes for me, and there shouldn't be any reason for thermal issues to have anything to do with VLAN stations.

> Then my question is: is multicast one of the factors that pushes the chip?

AFAIK, multicast and broadcast are handled completely differently: these packets are sent at the lowest allowed rate so that every station can receive them. Changing them to unicast should improve compatibility with buggy firmware, but for some reason it doesn't.
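
As a rough illustration of what converting multicast to unicast means, the Python sketch below models it as one copy of the frame per associated station, with the destination address rewritten. It is a conceptual model only: `Frame`, `convert_to_unicast`, and the MAC addresses are made up for the example and do not mirror the mac80211 data structures.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Frame:
    dst: str       # destination MAC address
    src: str       # source MAC address
    payload: bytes


def is_group_address(mac: str) -> bool:
    """Multicast/broadcast MACs have the lowest bit of the first octet set."""
    return bool(int(mac.split(":")[0], 16) & 0x01)


def convert_to_unicast(frame: Frame, stations: List[str]) -> List[Frame]:
    """Return one unicast copy per station for a group-addressed frame."""
    if not is_group_address(frame.dst):
        return [frame]  # already unicast, nothing to convert
    # In this model the sender does not get its own frame back.
    return [replace(frame, dst=sta) for sta in stations if sta != frame.src]


if __name__ == "__main__":
    stations = ["02:00:00:00:00:01", "02:00:00:00:00:02"]
    mcast = Frame(dst="01:00:5e:00:00:fb", src="02:00:00:00:00:01",
                  payload=b"mdns")
    for copy in convert_to_unicast(mcast, stations):
        print(copy.dst, len(copy.payload))
```

The point of the model is that each copy is an ordinary unicast frame, so it can be sent at that station's own rate instead of the lowest common rate.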

> I noticed that when I changed the TX power setting on my Windows device (AX210), the only device in my network that was subject to the crashing (yes, I have 10+ devices), the crashing and the seq messages stopped.

TX power didn't have anything to do with it in my case: setting TX power to 1 dBm or 22 dBm didn't change anything.

lukasz1992 commented 4 months ago

@Fail-Safe @zekica https://github.com/blocktrron/mt76/commit/7447213e9e655f7bab6f45e54053747c2f1104e4 what about applying this patch?

Fail-Safe commented 4 months ago

@lukasz1992 Thanks for making us aware of that patch! I did apply it and re-enabled multicast_to_unicast_all. I'm not seeing a full-on crash as of yet, but I'm seeing timeout messages showing up:

[  318.691399] mt798x-wmac 18000000.wifi: Message 000800c4 (seq 3) timeout
[ 1040.369616] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout
[ 1189.706201] mt798x-wmac 18000000.wifi: Message 000026ed (seq 7) timeout
[ 1210.164932] mt798x-wmac 18000000.wifi: Message 00002ced (seq 8) timeout
[ 1230.622232] mt798x-wmac 18000000.wifi: Message 00005aed (seq 9) timeout
[ 1251.080719] mt798x-wmac 18000000.wifi: Message 000026ed (seq 10) timeout
[ 1271.548743] mt798x-wmac 18000000.wifi: Message 00002ced (seq 11) timeout
[ 1291.996372] mt798x-wmac 18000000.wifi: Message 00005aed (seq 12) timeout
[ 1312.455490] mt798x-wmac 18000000.wifi: Message 000026ed (seq 13) timeout
[ 1332.912846] mt798x-wmac 18000000.wifi: Message 00002ced (seq 14) timeout
[ 1353.370877] mt798x-wmac 18000000.wifi: Message 00005aed (seq 15) timeout
[ 1373.830077] mt798x-wmac 18000000.wifi: Message 000026ed (seq 1) timeout
[ 1394.286904] mt798x-wmac 18000000.wifi: Message 000026ed (seq 10) timeout
[ 1414.755320] mt798x-wmac 18000000.wifi: Message 000800c4 (seq 11) timeout
[ 1435.203704] mt798x-wmac 18000000.wifi: Message 00002ced (seq 12) timeout
[ 1455.661331] mt798x-wmac 18000000.wifi: Message 00005aed (seq 13) timeout
[ 1476.120504] mt798x-wmac 18000000.wifi: Message 000026ed (seq 14) timeout
[ 1563.182051] mt798x-wmac 18000000.wifi: Message 00005aed (seq 14) timeout
[ 1865.864221] mt798x-wmac 18000000.wifi: Message 00005aed (seq 15) timeout

Fail-Safe commented 4 months ago

Oooof. Here we go ☹️

...
[ 2897.289155] mt798x-wmac 18000000.wifi: Message 00005aed (seq 12) timeout
[ 2899.943303] mt798x-wmac 18000000.wifi: Message 00005aed (seq 15) timeout
[ 3154.323249] mt798x-wmac 18000000.wifi: Message 000026ed (seq 1) timeout
[ 4056.053187] mt798x-wmac 18000000.wifi: Message 00005aed (seq 9) timeout
[ 4622.974265] mt798x-wmac 18000000.wifi: Message 00005aed (seq 7) timeout
[ 4836.364011] ------------[ cut here ]------------
[ 4836.368632] WARNING: CPU: 2 PID: 8668 at ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 4836.377372] Modules linked in: nft_fib_inet nf_flow_table_inet iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) iptable_mangle iptable_filter ipt_REJECT ip_tables cfg80211(O) xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG x_tables tcp_bbr nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c compat(O) cls_flower act_vlan crypto_safexcel cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd
[ 4836.377546]  gpio_button_hotplug(O) usbcore usb_common aquantia
[ 4836.472635] CPU: 2 PID: 8668 Comm: kworker/u8:4 Tainted: G           O       6.6.30 #0
[ 4836.480531] Hardware name: GL.iNet GL-MT6000 (DT)
[ 4836.485217] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 4836.491257] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 4836.498198] pc : ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 4836.504813] lr : ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[ 4836.511428] sp : ffffffc081273c80
[ 4836.514725] x29: ffffffc081273c80 x28: 0000000000000001 x27: ffffff800a623f00
[ 4836.521842] x26: ffffff800b6dc3b8 x25: ffffff80091e08a0 x24: ffffff80091e08a0
[ 4836.528959] x23: ffffffc078fbecc8 x22: ffffff80060080e8 x21: 0000000000000001
[ 4836.536074] x20: ffffff800a623f00 x19: ffffff800b6dc000 x18: 0000000000000000
[ 4836.543190] x17: 0000000000000100 x16: 001c000800000000 x15: 000102050028000d
[ 4836.550306] x14: 0000000000000000 x13: 0000000000000028 x12: 0000000000000002
[ 4836.557422] x11: 0000000000000040 x10: ffffffc080b67470 x9 : ffffffc080b67468
[ 4836.564537] x8 : ffffff8000401020 x7 : 0000000000000000 x6 : 0000000d573ff195
[ 4836.571652] x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 4836.578767] x2 : 0000000000000001 x1 : 0000000000000002 x0 : 00000000fffffff4
[ 4836.585883] Call trace:
[ 4836.588314]  ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 4836.594583]  ieee80211_ba_session_work+0x418/0x444 [mac80211]
[ 4836.600333]  process_one_work+0x154/0x2a0
[ 4836.604333]  worker_thread+0x2ac/0x48c
[ 4836.608067]  kthread+0xdc/0xe8
[ 4836.611110]  ret_from_fork+0x10/0x20
[ 4836.614678] ---[ end trace 0000000000000000 ]---
[ 5044.308637] mt798x-wmac 18000000.wifi: Message 00002ced (seq 10) timeout
[ 5048.806140] mt798x-wmac 18000000.wifi: Message 00005aed (seq 7) timeout
[ 5201.019771] mt798x-wmac 18000000.wifi: Message 00005aed (seq 12) timeout
[ 5217.597460] mt798x-wmac 18000000.wifi: Message 00005aed (seq 14) timeout
[ 5320.095313] mt798x-wmac 18000000.wifi: Message 00005aed (seq 13) timeout
[ 5433.228725] mt798x-wmac 18000000.wifi: Message 000025ed (seq 4) timeout
[ 5437.221749] mt798x-wmac 18000000.wifi: Message 00005aed (seq 2) timeout
[ 5624.001329] ------------[ cut here ]------------
[ 5624.005946] WARNING: CPU: 0 PID: 9449 at ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 5624.014675] Modules linked in: nft_fib_inet nf_flow_table_inet iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) iptable_mangle iptable_filter ipt_REJECT ip_tables cfg80211(O) xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG x_tables tcp_bbr nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c compat(O) cls_flower act_vlan crypto_safexcel cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd
[ 5624.014840]  gpio_button_hotplug(O) usbcore usb_common aquantia
[ 5624.109924] CPU: 0 PID: 9449 Comm: kworker/u8:6 Tainted: G        W  O       6.6.30 #0
[ 5624.117817] Hardware name: GL.iNet GL-MT6000 (DT)
[ 5624.122504] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 5624.128517] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 5624.135458] pc : ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 5624.142073] lr : ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[ 5624.148688] sp : ffffffc0811dbc80
[ 5624.151986] x29: ffffffc0811dbc80 x28: 0000000000000001 x27: ffffff8009901cc0
[ 5624.159102] x26: ffffff80009803b8 x25: ffffff80091e08a0 x24: ffffff80091e08a0
[ 5624.166218] x23: ffffffc078fbecc8 x22: ffffff80009840e8 x21: 0000000000000001
[ 5624.173335] x20: ffffff8009901cc0 x19: ffffff8000980000 x18: 0000000000000070
[ 5624.180450] x17: ffffffbfbf247000 x16: ffffffc080000000 x15: 00005aa8af65510d
[ 5624.187565] x14: 00005aa8af65510d x13: 0000000000000001 x12: 0000000000000002
[ 5624.194681] x11: 0000000000000040 x10: ffffffc080b67470 x9 : ffffffc080b67468
[ 5624.201796] x8 : ffffff8000401020 x7 : 0000000000000000 x6 : 0000000d573ff195
[ 5624.208911] x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 5624.216027] x2 : 0000000000000001 x1 : 0000000000000002 x0 : 00000000fffffff4
[ 5624.223142] Call trace:
[ 5624.225575]  ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 5624.231843]  ieee80211_ba_session_work+0x418/0x444 [mac80211]
[ 5624.237592]  process_one_work+0x154/0x2a0
[ 5624.241590]  worker_thread+0x2ac/0x48c
[ 5624.245325]  kthread+0xdc/0xe8
[ 5624.248367]  ret_from_fork+0x10/0x20
[ 5624.251928] ---[ end trace 0000000000000000 ]---
[ 5680.260541] phy1-ap1: HW problem - can not stop rx aggregation for 20:69:80:xx:xx:xx tid 6
[ 5985.397401] mt798x-wmac 18000000.wifi: Message 00005aed (seq 8) timeout
[ 6852.354358] mt798x-wmac 18000000.wifi: Message 00005aed (seq 8) timeout
[ 7939.418743] mt798x-wmac 18000000.wifi: Message 00005aed (seq 10) timeout
[ 8721.653210] ------------[ cut here ]------------
[ 8721.657829] WARNING: CPU: 1 PID: 10406 at ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 8721.666644] Modules linked in: nft_fib_inet nf_flow_table_inet iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE nft_reject_ipv6 nft_reject_ipv4 nft_reject_inet nft_reject nft_redir nft_quota nft_numgen nft_nat nft_masq nft_log nft_limit nft_hash nft_flow_offload nft_fib_ipv6 nft_fib_ipv4 nft_fib nft_ct nft_chain_nat nf_tables nf_nat nf_flow_table nf_conntrack mt7915e(O) mt76_connac_lib(O) mt76(O) mac80211(O) iptable_mangle iptable_filter ipt_REJECT ip_tables cfg80211(O) xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG x_tables tcp_bbr nfnetlink nf_reject_ipv6 nf_reject_ipv4 nf_log_syslog nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c compat(O) cls_flower act_vlan crypto_safexcel cls_bpf act_bpf sch_tbf sch_ingress sch_htb sch_hfsc em_u32 cls_u32 cls_route cls_matchall cls_fw cls_flow cls_basic act_skbedit act_mirred act_gact sha512_arm64 sha1_ce sha1_generic seqiv md5 geniv des_generic libdes authencesn authenc leds_gpio xhci_plat_hcd xhci_pci xhci_mtk_hcd xhci_hcd
[ 8721.666817]  gpio_button_hotplug(O) usbcore usb_common aquantia
[ 8721.761905] CPU: 1 PID: 10406 Comm: kworker/u8:4 Tainted: G        W  O       6.6.30 #0
[ 8721.769888] Hardware name: GL.iNet GL-MT6000 (DT)
[ 8721.774575] Workqueue: phy1 ieee80211_ba_session_work [mac80211]
[ 8721.780620] pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 8721.787561] pc : ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 8721.794177] lr : ___ieee80211_stop_tx_ba_session+0x1d4/0x2f4 [mac80211]
[ 8721.800791] sp : ffffffc0811f3c80
[ 8721.804089] x29: ffffffc0811f3c80 x28: 0000000000000001 x27: ffffff8010154c00
[ 8721.811206] x26: ffffff80011743b8 x25: ffffff80091e08a0 x24: ffffff80091e08a0
[ 8721.818322] x23: ffffffc078fbecc8 x22: ffffff80011700e8 x21: 0000000000000001
[ 8721.825438] x20: ffffff8010154c00 x19: ffffff8001174000 x18: ffffffffffffc8f8
[ 8721.832553] x17: ffffffffffffc800 x16: 0000000000006838 x15: 00000000000040f8
[ 8721.839669] x14: 0000000100010400 x13: 0000000000000000 x12: 0000000000000002
[ 8721.846784] x11: 0000000000000040 x10: ffffffc080b67470 x9 : ffffffc080b67468
[ 8721.853900] x8 : ffffff8000401020 x7 : 0000000000000000 x6 : 0000000d573ff195
[ 8721.861015] x5 : 0000000001000000 x4 : 0000000000000000 x3 : 0000000000000000
[ 8721.868131] x2 : 0000000000000001 x1 : 0000000000000002 x0 : 00000000fffffff4
[ 8721.875247] Call trace:
[ 8721.877680]  ___ieee80211_stop_tx_ba_session+0x2b4/0x2f4 [mac80211]
[ 8721.883949]  ieee80211_ba_session_work+0x418/0x444 [mac80211]
[ 8721.889698]  process_one_work+0x154/0x2a0
[ 8721.893697]  worker_thread+0x2ac/0x48c
[ 8721.897431]  kthread+0xdc/0xe8
[ 8721.900474]  ret_from_fork+0x10/0x20
[ 8721.904036] ---[ end trace 0000000000000000 ]---

Headcrabed commented 4 months ago

Maybe this issue is a duplicate of https://github.com/openwrt/mt76/issues/690 ?

zekica commented 4 months ago

Maybe related, but I wouldn't say it's a duplicate: the message shown is the same, but the underlying cause is probably not. #690 happens on the MT7915 (MT7905+MT7975) PCIe device, while this issue is about the MT7981/MT7986 SoC.

The firmware running on the MCU is not the same. It's just that messages 0x26 and 0x5a are most often being exchanged with both.
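
To check which message types dominate in a given log, a short Python sketch like the following can tally the timeout lines. It assumes the log format shown in this thread and, as read earlier in the discussion, that the second byte of the logged word is the message type; `count_timeouts` is an illustrative name.

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "mt798x-wmac 18000000.wifi: Message 00005aed (seq 5) timeout"
TIMEOUT_RE = re.compile(r"Message ([0-9a-fA-F]{8}) \(seq \d+\) timeout")


def count_timeouts(lines: Iterable[str]) -> Counter:
    """Count timeout lines per message type (second byte of the word)."""
    counts: Counter = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            ext_id = (int(match.group(1), 16) >> 8) & 0xFF
            counts[f"0x{ext_id:02x}"] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from a log posted earlier in this thread.
    sample = [
        "[ 1040.369616] mt798x-wmac 18000000.wifi: Message 000026ed (seq 6) timeout",
        "[ 1210.164932] mt798x-wmac 18000000.wifi: Message 00002ced (seq 8) timeout",
        "[ 1230.622232] mt798x-wmac 18000000.wifi: Message 00005aed (seq 9) timeout",
        "[ 1291.996372] mt798x-wmac 18000000.wifi: Message 00005aed (seq 12) timeout",
    ]
    for msg_type, n in count_timeouts(sample).most_common():
        print(msg_type, n)
```

On a router, the same function could be fed from the kernel log, for example by piping `dmesg` output into a script that passes `sys.stdin` to `count_timeouts`.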

Fail-Safe commented 4 months ago

[   54.874089] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000004
[   54.880981] mt798x-wmac 18000000.wifi:
[   54.880981] phy0 L1 SER recovery start.
[   54.888622] mt798x-wmac 18000000.wifi: Message 000025ed (seq 11) timeout
[   54.889422] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000008
[   55.131284] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000010
[   55.138214] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000020
[   55.395327] mt798x-wmac 18000000.wifi: send message 000025ed timeout, try again(1).
[   55.404744] mt798x-wmac 18000000.wifi:
[   55.404744] phy0 L1 SER recovery completed.

evelyn3648 commented 4 months ago

If your scenario is about IGMP/MLD multicast, you should use the following checks:

Check the IGMP snooping status with: cat /sys/class/net/br-lan/bridge/multicast_snooping

Check IGMP snooping multicast-to-unicast with: cat /sys/class/net/br-lan/brif/phyx-apx/multicast_to_unicast
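
The two checks can also be scripted. The Python sketch below reads the same sysfs files; `br-lan` comes from the commands above, while the port name `phy0-ap0` is only a placeholder for the `phyx-apx` pattern, and the `sysfs_root` parameter is an illustrative addition so the function can be pointed at a test directory.

```python
from pathlib import Path
from typing import Optional


def read_flag(path: Path) -> Optional[bool]:
    """Return True/False for a 0/1 sysfs file, or None if it is unreadable."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None


def multicast_status(bridge: str, port: str,
                     sysfs_root: Path = Path("/sys/class/net")) -> dict:
    """Collect the snooping and per-port multicast_to_unicast settings."""
    base = sysfs_root / bridge
    return {
        "multicast_snooping": read_flag(base / "bridge" / "multicast_snooping"),
        "multicast_to_unicast": read_flag(
            base / "brif" / port / "multicast_to_unicast"),
    }


if __name__ == "__main__":
    # "phy0-ap0" is a placeholder; substitute the actual AP interface name.
    print(multicast_status("br-lan", "phy0-ap0"))
```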

Linux Upstream Commit: https://github.com/torvalds/linux/commit/6db6f0eae6052b70885562e1733896647ec1d807

As for the mac80211 multicast-to-unicast feature, we haven't tested it. If it leads to a firmware hang or to system error recovery logs showing up, that might be due to an unexpected 802.11 unicast frame after the conversion.

zekica commented 4 months ago

@evelyn3648 Hi, can you also take a look at the ap_vlan issue #881? I have narrowed it down to the firmware crashing when sending software GTK-encrypted packets (multicast/broadcast packets sent to an ap_vlan interface) while the receive queue is full.

Fail-Safe commented 4 months ago

[ 2490.845304] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000004
[ 2490.852186] mt798x-wmac 18000000.wifi:
[ 2490.852186] phy0 L1 SER recovery start.
[ 2490.852186] mt798x-wmac 18000000.wifi: Message 0000aded (seq 14) timeout
[ 2490.852192] mt798x-wmac 18000000.wifi: send message 0000aded timeout, try again(1).
[ 2490.860618] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000008
[ 2490.866527] mt798x-wmac 18000000.wifi: Message 0000aded (seq 15) timeout
[ 2490.882935] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000010
[ 2490.894714] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000020
[ 2491.325319] mt798x-wmac 18000000.wifi: send message 0000aded timeout, try again(2).
[ 2491.334460] mt798x-wmac 18000000.wifi:
[ 2491.334460] phy0 L1 SER recovery completed.
[73644.244424] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000004
[73644.251314] mt798x-wmac 18000000.wifi:
[73644.251314] phy0 L1 SER recovery start.
[73644.258995] mt798x-wmac 18000000.wifi: Message 00005aed (seq 3) timeout
[73644.259748] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000008
[73644.281249] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000010
[73644.288185] mt798x-wmac 18000000.wifi: phy0 SER recovery state: 0x00000020
[73644.761290] mt798x-wmac 18000000.wifi: send message 00005aed timeout, try again(1).
[73644.771618] mt798x-wmac 18000000.wifi:
[73644.771618] phy0 L1 SER recovery completed.

zekica commented 4 months ago

@Fail-Safe @Headcrabed @lukasz1992 I have an interesting observation to share regarding this:

The crash is caused by unicast AP-to-station (TX) packets with an IP packet length of 482 bytes or less. Packets of 483 bytes or more never cause a crash.
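
For anyone trying to reproduce this with UDP traffic, the threshold can be translated into payload sizes. The sketch below assumes that "IP packet length" means the IPv4 total length and that packets carry a plain 20-byte IPv4 header plus an 8-byte UDP header; both are assumptions, not statements from this thread. The constant and function names are illustrative.

```python
IPV4_HEADER_LEN = 20    # assumption: IPv4 without options
UDP_HEADER_LEN = 8
CRASH_MAX_IP_LEN = 482  # threshold reported in this thread


def udp_payload_len(ip_total_len: int) -> int:
    """UDP payload size that yields the given IPv4 total length."""
    payload = ip_total_len - IPV4_HEADER_LEN - UDP_HEADER_LEN
    if payload < 0:
        raise ValueError("IP length too small for an IPv4/UDP packet")
    return payload


def in_reported_crash_range(ip_total_len: int) -> bool:
    """True if the length falls in the range reported to trigger the crash."""
    return ip_total_len <= CRASH_MAX_IP_LEN


if __name__ == "__main__":
    for ip_len in (482, 483):
        print(ip_len, udp_payload_len(ip_len), in_reported_crash_range(ip_len))
```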

This only happens for packets sent via ieee80211_subif_start_xmit or ieee80211_convert_to_unicast. Packets for stations in the main AP are sent via ieee80211_8023_xmit, not sure why.

Disabling multicast-to-unicast disables this path in mac80211 and works around the problem, but this path is the only one available for ap_vlan (my issue #881).

zekica commented 4 months ago

I found a workaround for my issue and wrote it up in #881.

What exposes the underlying issue there is that mac80211 doesn't replace the default ieee80211_dataif_ops with the offloaded ieee80211_dataif_8023_ops on ap_vlan interfaces.

The issue here is the underlying one and still has to be investigated.

A workaround here may be to force those converted unicast packets to be transmitted via ieee80211_8023_xmit somehow.

lukasz1992 commented 3 months ago

Does the crash also happen on my version with some patches: https://github.com/lukasz1992/openwrt/tree/v23.05.3-lukasz1992 ?

LuisMitaHL commented 1 month ago

Mediatek just dropped new firmware: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/0fdbc0e6d84bbc0216da2842a494bdf01f745c6c

The release notes claim "Fix MAC80211 multicast-to-unicast issue"

Headcrabed commented 1 month ago

Glad to see that the newest firmware has already been added to OpenWrt's mt76 repo.

> Mediatek just dropped new firmware: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/0fdbc0e6d84bbc0216da2842a494bdf01f745c6c
>
> The release notes claim "Fix MAC80211 multicast-to-unicast issue"

Fail-Safe commented 1 month ago

I've been testing the new firmware for a few hours now and no crashes! This is looking good!

More verbose updates here: https://forum.openwrt.org/t/mt6000-custom-build-with-luci-and-some-optimization-kernel-6-6-x/185241/947?u=_failsafe

Fail-Safe commented 1 month ago

Over 1 day and 4 hours of uptime with no crashes to be seen. (!!!) I'd say we can finally put this issue to bed with the fix being the updated firmware as released here: https://git01.mediatek.com/plugins/gitiles/openwrt/feeds/mtk-openwrt-feeds/+/0fdbc0e6d84bbc0216da2842a494bdf01f745c6c

Thanks to the Mediatek devs who figured this one out! 🍻🎉