Closed ivanich closed 3 years ago
Does this only happen on 5G phy?
Does this only happen on 5G phy?
Yes, I tried to do the same(speedtest from phone) with 2.4 and it didn't crash
Just for reference my config, had to run wifi
command in the rc.local in order to have working both radios after reboot
config wifi-device 'radio0'
option type 'mac80211'
option hwmode '11g'
option htmode 'HT40'
option noscan '1'
option path '1e140000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0'
option legacy_rates '0'
option channel '6'
option country 'US'
config wifi-iface 'default_radio0'
option device 'radio0'
option network 'lan'
option mode 'ap'
option ssid 'wifi2'
option encryption 'psk2+ccmp'
option key 'mypassword'
config wifi-device 'radio1'
option type 'mac80211'
option channel '149'
option txpower '23'
option hwmode '11a'
option path '1e140000.pcie/pci0000:00/0000:00:00.0/0000:01:00.0+1'
option htmode 'VHT80'
option country 'US'
option noscan '1'
config wifi-iface 'default_radio1'
option device 'radio1'
option network 'lan'
option mode 'ap'
option ssid 'wifi5'
option encryption 'psk2+ccmp'
option key 'mypassword'
option ieee80211r '1'
option mobility_domain '4f11'
option reassociation_deadline '20000'
option ft_over_ds '1'
option ft_psk_generate_local '1'
option pmk_r1_push '1'
Can you try to revert this? https://github.com/openwrt/mt76/commit/bfa8d5a6a9a1cd43d4ce4090f53fdc52c6247d66
If you tell me how to do it. I mean mt76 is a package with Makefile and not a repo which i can easily revert commits
Reverted https://github.com/openwrt/mt76/commit/bfa8d5a6a9a1cd43d4ce4090f53fdc52c6247d66 and unfortunatelly same crash.
Hold on, it makes no sense, Makefile rewriting mt76-2021-03-15-ca22cc22 directory content EDIT: Was able to build with reverted commit by forking mt76 repo, changed url and last commit hash in the makefile, but router still crashes
<1>[ 147.273909] CPU 1 Unable to handle kernel paging request at virtual address 00000000, epc == 8e4f288c, ra == 8e58a968
<4>[ 147.284570] Oops[#1]:
<4>[ 147.286854] CPU: 1 PID: 3724 Comm: kworker/u9:3 Not tainted 5.4.105 #0
<4>[ 147.293403] Workqueue: napi_workq napi_workfn
<4>[ 147.297742] $ 0 : 00000000 00000001 8e595ec0 00000000
<4>[ 147.302950] $ 4 : 8e595ec0 00000000 8e5ca600 00000000
<4>[ 147.308161] $ 8 : 00000000 000004c1 80808080 fefefeff
<4>[ 147.313370] $12 : 00000000 00000000 80704f1c 00000040
<4>[ 147.323781] $20 : 00000050 00000044 00000011 01ce9502
<4>[ 147.334195] $28 : 8e004000 8e005d18 8ddf2020 8e58a968
<4>[ 147.342266] Lo : 00000000
<4>[ 147.350397] ra : 8e58a968 mt7615_queue_rx_skb+0x94c/0xd0c [mt7615_common]
<4>[ 147.361580] Cause : 4080000c (ExcCode 03)
<4>[ 147.368434] PrId : 0001992f (MIPS 1004Kc)
<4>[ 147.456965] Process kworker/u9:3 (pid: 3724, threadinfo=9ece68d6, task=b16d155f, tls=00000000)
<4>[ 147.473857] 0001ce95 00004188 00000000 00000080 00004c10 00000002 8074a2b8 80043dcc
<4>[ 147.490505] 00000000 8e4f0110 8069ddf4 806ab148 8dfc0000 00001000 00000000 00000050
<4>[ 147.507150] ...
<4>[ 147.512033] [<8e4f288c>] mt76_rx+0x124/0x320 [mt76]
<4>[ 147.526633]
<4>[ 147.528512] ---[ end trace 3d137e75bb109149 ]---
This should be a problem with DBDC. MT7615+MT7615 does not have this problem.
@ivanich My bad. Can you try this out ? https://github.com/ryderlee1110/mt76/commit/49cc3d6fc79587cb5e90f8e22a3e1d4e5486ffb8
My bad. Can you try this out ? ryderlee1110@49cc3d6
7621+7615.During the iperf3 test, the RX and TX speed of the client can hit 630M with udp . But the client download is only 550M and the upload is still 630M with tcp. Is it because the upload has some offload like rx decap offload while router tx doesn't have ?
My bad. Can you try this out ? ryderlee1110@49cc3d6
7621+7615.During the iperf3 test, the RX and TX speed of the client can hit 630M with udp . But the client download is only 550M and the upload is still 630M with tcp. Is it because the upload has some offload like rx decap offload while router tx doesn't have ?
Not sure. How much you can get without Rx decap series? HW can support Tx enacp but I don't have time to add that.
My bad. Can you try this out ? ryderlee1110@49cc3d6
7621+7615.During the iperf3 test, the RX and TX speed of the client can hit 630M with udp . But the client download is only 550M and the upload is still 630M with tcp. Is it because the upload has some offload like rx decap offload while router tx doesn't have ?
Not sure. How much you can get without Rx decap series? HW can support Tx enacp but I don't have time to add that.
I dropped this https://github.com/MeIsReallyBa/mt76/commit/86ce24f182ec7ea2ee70648c0cf5e3478b89d173 and it seems that rx decap didn't make a big diffrence as the upload speed can still hit 800+Mb/S(TCP) in 160MHZ. But the client download speed is even slower in 160mhz than 80mhz and the wireless info shows that tx runs in 80MHZ while rx runs in 160MHZ
I don't think rc_minstrel can support 160Mhz, and it's better to keep your test w/ w/o entire rx series in BW80.
I don't think rc_minstrel can support 160Mhz, and it's better to keep your test w/ w/o entire rx series in BW80.
No diffrence .
@MeIsReallyBa there should be 3 patches. did you drop rx-amsdu as well?
@MeIsReallyBa there should be 3 patches. did you drop rx-amsdu as well?
revert 3 patches like this https://github.com/MeIsReallyBa/mt76/commits/master Now udp upload is still 600+ and tcp upload decreased to 450Mb.
@MeIsReallyBa there should be 3 patches. did you drop rx-amsdu as well?
revert 3 patches like this https://github.com/MeIsReallyBa/mt76/commits/master Now udp upload is still 600+ and tcp upload decreased to 450Mb.
It looks like the series improves Rx performance. BTW. this is off-topic .. :)
@ivanich My bad. Can you try this out ? ryderlee1110@49cc3d6
Tried, the same crash, or should I try it in combination of some reverts?
@ivanich can you do git bisect to find out the offending commit?
@ivanich can you do git bisect to find out the offending commit?
sorry i can't spend whole day doing this
I checked out to commit https://github.com/ivanich/mt76/commit/760adce291007b44322d233caf9fc1dffce8015d and it fails, looks like one of the these commits causing it
mt76: mt7615: fix memory leak in mt7615_coredump_work
mt76: mt7615: add support for rx decapsulation offload
mt76: mt7615: add rx checksum offload support
mt76: mt7615: enable hw rx-amsdu de-aggregation
Also going to check this one https://github.com/ivanich/mt76/commit/1706fb6c48e8cea6feb9ab5c6d60454aeb5db267
I checked out to commit ivanich@760adce and it fails, looks like one of the these commits causing it
mt76: mt7615: add support for rx decapsulation offload mt76: mt7615: add rx checksum offload support mt76: mt7615: enable hw rx-amsdu de-aggregation
Does dbdc work after reverting these commits?
@ryderlee1110 It doesn't crash with those 3 commits reverted, as far as I can say dbdc works because I can see both radios up and speed is more or less fine, getting ~500Mbits with 5Ghz with speedtest(don't want to bother with iperf3 right now)
@ivanich Thanks. Can you try to just keep mt76: mt7615: enable hw rx-amsdu de-aggregation?
@ryderlee1110
Just tried with mt76: mt7615: enable hw rx-amsdu de-aggregation
- crash
@ivanich So, the latest mt76 works w/o this commit, but I don't see how this affects second phy at the moment. Does this crash happen on sta association or speedtest?
@ryderlee1110 speedtest and only on upload test, download passes fine
@ivanich have to add some logs in mt76_rx(), but unfortunately I don't have dbdc to debug :(
Drop me a commit and I'll test it
As it looks it crashes in both directions. Just in download (AP->Client) takes more time. Tested with iperf3.
As it looks it crashes in both directions. Just in download (AP->Client) takes more time. Tested with iperf3.
Does Tx work fine without that commit in you case? Can you test udp tx?
@ryderlee1110
I can confirm using ca22cc221ae7c87535d6f24b12c0280f6b560b31 as a base and manually revert only changes from c5aca6692c41546084e3f1c920fdde6d5f9b0f36 (mt76: mt7615: enable hw rx-amsdu de-aggregation) it does not crash anymore.
This is test with iperf3 (TCP). Will do UDP tests asap.
@ryderlee1110 Managed to make few tests. Here are the results:
mt76: ca22cc221ae7c87535d6f24b12c0280f6b560b31 TCP client->AP: CRASH AP->client: CRASH UDP client->AP: CRASH AP->client: no crash
mt76: ca22cc221ae7c87535d6f24b12c0280f6b560b31 with reverted commit c5aca6692c41546084e3f1c920fdde6d5f9b0f36 TCP client->AP: OK AP->client: OK UDP client->AP: OK AP->client: OK
@ryderlee1110, it is failing at *dev->rx_amsdu[q].tail = skb, function mt76_rx_release_burst.
dev->rx_amsdu[q].tail is NULL.
@porentak can you add log to check if it goes into if (status->first_amsdu) in mt76_rx_release_burst(), and goes into if (status->amsdu). Dump first phy as well.
@ryderlee1110 I hope this is what you asked for. I took a little longer with this patch to crash. But it crashed at the end.
+++ b/mac80211.c
if (status->first_amsdu) {
+ printk("l2\n");
dev->rx_amsdu[q].tail = &skb_shinfo(skb)->frag_list;
+ if (dev->rx_amsdu[q].tail == NULL)
+ printk("l3\n");
`
/ trailing amsdu subframes /
+ if (dev->rx_amsdu[q].tail == NULL)
+ printk("l4\n");
+
*dev->rx_amsdu[q].tail = skb;
diff --git a/mt7615/mac.c b/mt7615/mac.c
index 59aa8f8..ce494aa 100644
+++ b/mt7615/mac.c
if (status->amsdu) {
+ printk("l1: 0x%x\n", phy);`
Logs are:
[ 94.704932] l1: 0x8e6c60b0
[ 94.710340] l2
[ 94.713667] l1: 0x8e6c60b0
[ 94.722628] l1: 0x8e6c60b0
[ 94.728037] l2
[ 94.731365] l1: 0x8e6c60b0
[ 94.743749] l1: 0x8e6c60b0
[ 94.749148] l2
....
[ 105.095552] l2
[ 105.098853] l1: 0x8e6c60b0
[ 105.104265] l1: 0x8e6c60b0
[ 105.109665] l2
[ 105.112995] l1: 0x8e6c60b0
[ 105.118399] l1: 0x8e6c60b0
[ 105.123799] l2
[ 105.128422] l1: 0x8e6c60b0
[ 105.133863] l4
@porentak Thanks. can you modify patch a bit? you can remove printk("l3\n");
if (status->amsdu) {
printk("l1: amsdu_info:%d\n", amsdu_info);
......
if (dev->rx_amsdu[q].tail == NULL)
printk("l4: q%d, is_amsdu:%d, out-of-order%d\n", q, status->amsdu, status->seqno != dev->rx_amsdu[q].seqno);
Suspect it's out of order here.
diff --git a/mt7615/mac.c b/mt7615/mac.c
index 59aa8f8..561d573 100644
--- a/mt7615/mac.c
+++ b/mt7615/mac.c
@@ -469,6 +469,7 @@ static int mt7615_mac_fill_rx(struct mt7615_dev *dev, struct sk_buff *skb)
status->first_amsdu = amsdu_info == MT_RXD1_FIRST_AMSDU_FRAME;
status->last_amsdu = amsdu_info == MT_RXD1_LAST_AMSDU_FRAME;
if (!hdr_trans) {
+ BUG_ON(skb_tail_pointer(skb) - skb->data < ieee80211_get_hdrlen_from_skb(skb));
memmove(skb->data + 2, skb->data,
ieee80211_get_hdrlen_from_skb(skb));
skb_pull(skb, 2);
@porentak could you try this patch?
very curious
if it is intend to remove the last 2 bytes in skb, why not just do it simple skb->len -= 2;
there is no need to memove and pull
@ivanich @porentak could you try this fixup out https://pastebin.com/GdCCiGtU ?
btw are you guys running both phy at the same time?
@porentak Thanks. can you modify patch a bit? you can remove printk("l3\n");
if (status->amsdu) { printk("l1: amsdu_info:%d\n", amsdu_info); ...... if (dev->rx_amsdu[q].tail == NULL) printk("l4: q%d, is_amsdu:%d, out-of-order%d\n", q, status->amsdu, status->seqno != dev->rx_amsdu[q].seqno);
Suspect it's out of order here.
[ 119.799863] l1: amsdu_info:3
[ 119.805630] l2
[ 119.808984] l1: amsdu_info:1
[ 119.816920] l1: amsdu_info:3
[ 119.822659] l2
[ 119.825995] l1: amsdu_info:1
[ 119.838827] l1: amsdu_info:3
[ 119.844570] l2
....
[ 175.855029] l1: amsdu_info:1
[ 175.860781] l1: amsdu_info:3
[ 175.866515] l2
[ 175.869835] l1: amsdu_info:1
[ 175.875575] l1: amsdu_info:3
[ 175.881313] l2
[ 175.884654] l1: amsdu_info:1
[ 175.891744] l1: amsdu_info:3
[ 175.897526] l2
[ 175.900862] l1: amsdu_info:1
[ 175.906602] l1: amsdu_info:3
[ 175.912335] l2
[ 175.915655] l1: amsdu_info:1
[ 175.921400] l1: amsdu_info:3
[ 175.927132] l2
[ 175.930458] l1: amsdu_info:1
[ 175.936195] l4: q0, is_amsdu:1, out-of-order0
@ivanich @porentak could you try this fixup out https://pastebin.com/GdCCiGtU ?
@ryderlee1110 with this patch I cannot reproduce crash.
On the other hand, I have a feeling throughput is lower than without (mt76: mt7615: enable hw rx-amsdu de-aggregation). But that is just a feeling at the moment. Need to make more tests.
@ivanich @porentak could you try this fixup out https://pastebin.com/GdCCiGtU ?
@ryderlee1110 with this patch I cannot reproduce crash.
On the other hand, I have a feeling throughput is lower than without (mt76: mt7615: enable hw rx-amsdu de-aggregation). But that is just a feeling at the moment. Need to make more tests.
it is likely a wrong work around, it broken the rx-amsdu
@ivanich @porentak could you try this fixup out https://pastebin.com/GdCCiGtU ?
@ryderlee1110 with this patch I cannot reproduce crash. On the other hand, I have a feeling throughput is lower than without (mt76: mt7615: enable hw rx-amsdu de-aggregation). But that is just a feeling at the moment. Need to make more tests.
it is likely a wrong work around, it broken the rx-amsdu
we cannot always guarantee first_amsdu arrives first since out of order reported by the fw, I think. maybe this is not right. will do more tests.
@ivanich @porentak could you try this fixup out https://pastebin.com/GdCCiGtU ?
@ryderlee1110 with this patch I cannot reproduce crash.
On the other hand, I have a feeling throughput is lower than without (mt76: mt7615: enable hw rx-amsdu de-aggregation). But that is just a feeling at the moment. Need to make more tests.
Have to combine with the further patches.
memmove(skb->data + 2, skb->data,
ieee80211_get_hdrlen_from_skb(skb));
it seems the memmove may crash the memory please try my BUG_ON patch
memmove(skb->data + 2, skb->data, ieee80211_get_hdrlen_from_skb(skb));
it seems the memmove may crash the memory please try my BUG_ON patch
Did you run into this?
memmove(skb->data + 2, skb->data, ieee80211_get_hdrlen_from_skb(skb));
it seems the memmove may crash the memory please try my BUG_ON patch
Did you run into this?
not really, for now I have no device at hand. but for my understanding, if that case happen, it crash
@porentak @ptpt52 can you please try this patch?
@porentak @ptpt52 can you please try this patch?
With this patch it does not crash, but throughput is 3-10Mb/s in client -> AP direction.
btw are you guys running both phy at the same time?
@ryderlee1110 yes. Issue is only if both phys are enabled.
diff --git a/mt7615/mac.c b/mt7615/mac.c index 59aa8f8..561d573 100644 --- a/mt7615/mac.c +++ b/mt7615/mac.c @@ -469,6 +469,7 @@ static int mt7615_mac_fill_rx(struct mt7615_dev *dev, struct sk_buff *skb) status->first_amsdu = amsdu_info == MT_RXD1_FIRST_AMSDU_FRAME; status->last_amsdu = amsdu_info == MT_RXD1_LAST_AMSDU_FRAME; if (!hdr_trans) { + BUG_ON(skb_tail_pointer(skb) - skb->data < ieee80211_get_hdrlen_from_skb(skb)); memmove(skb->data + 2, skb->data, ieee80211_get_hdrlen_from_skb(skb)); skb_pull(skb, 2);
@porentak could you try this patch?
It crashes.
Device GL.Inet mt-1300 openwrt master with commit https://git.openwrt.org/?p=openwrt/openwrt.git;a=commit;h=12b5f898f9222d35b80058fee5bd2ae935226070 Except not resolved issue with radio ordering as stated in the https://github.com/openwrt/mt76/issues/474 now it crashes when connected via 5Ghz
Can be reproduced when running speedtest android app on phone connected via 5Ghz network to mt-1300 router