openwrt / mt76

mac80211 driver for MediaTek MT76x0e, MT76x2e, MT7603, MT7615, MT7628 and MT7688

Huge ping value increase after running 5g wifi for some time #152

Closed: updatede closed this issue 6 years ago

updatede commented 6 years ago

Device: Xiaomi Router 3G
Target: ramips/mt7621
System: OpenWrt SNAPSHOT r6150-dc7a1e8
Signal strength: -60 dBm

I use only one client connected to the router, an iPad Air 2. Ping right after the WLAN starts: [screenshot img_1880]

After playing a game for some time, the network gets stuck. Even at the same location, with all programs on the iPad closed, the ping value increases hugely and does not come back down until I restart the WLAN on the router. After the restart, ping returns to normal: [screenshot img_1879]

jedi7 commented 6 years ago

I also want to help, because @nbd168 does great work (and it is still open source).

nbd168 commented 6 years ago

I think I may have found a possible cause for this. Apparently the beacon timer drifts by one microsecond every time it fires. The minimum configurable interval is 64 µs, so the drift has to be corrected every 64 beacons (64 × 1 µs). This would also explain why changing the beacon interval affects how long it takes for this issue to manifest.

When the client is in powersave mode, it regularly wakes up shortly before the beacon transmission time and waits for a short period before going back to sleep. After some time, the drift grows so large that the client goes back to sleep before the AP sends its beacon. The client only notices once beacon loss detection kicks in and forces it out of powersave.

I will let you know when I've implemented a fix for this and verified the timing. In the meantime, can anybody here besides @updatede reliably reproduce the issue?
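To make the drift argument concrete, here is a toy Python model (not mt76 driver code, and not how the actual fix is implemented). It assumes each beacon fires 1 µs later than the previous one, as described above, and treats the client's post-TBTT listen window as a hypothetical parameter (`listen_after_tbtt_us`); the 64 µs correction step mirrors the minimum configurable interval mentioned above. The beacon interval is given in 802.11 time units (1 TU = 1024 µs).

```python
"""Toy model of the beacon timer drift described above; all parameters are illustrative."""
from typing import Optional

TU_US = 1024  # one 802.11 time unit (TU) in microseconds


def first_missed_beacon(
    drift_per_beacon_us: int,
    listen_after_tbtt_us: int,
    correction_step_us: Optional[int] = None,
    max_beacons: int = 100_000,
) -> Optional[int]:
    """Return the 1-based index of the first beacon a dozing client sleeps through.

    drift_per_beacon_us: extra lateness added by every timer firing (1 us in the report).
    listen_after_tbtt_us: hypothetical time the client stays awake after the expected TBTT.
    correction_step_us: if set, pull the timer back by this amount whenever the
        accumulated lateness reaches it (64 us mirrors the minimum interval above).
    Returns None if no beacon is missed within max_beacons.
    """
    offset_us = 0  # how late the current beacon is relative to its nominal TBTT
    for n in range(1, max_beacons + 1):
        offset_us += drift_per_beacon_us
        if correction_step_us is not None and offset_us >= correction_step_us:
            offset_us -= correction_step_us
        if offset_us > listen_after_tbtt_us:
            return n  # the client already went back to sleep before this beacon
    return None


def seconds_until_first_miss(beacon_interval_tu: int, **model) -> Optional[float]:
    """Convert the index of the first missed beacon into wall-clock seconds."""
    n = first_missed_beacon(**model)
    return None if n is None else n * beacon_interval_tu * TU_US / 1_000_000


if __name__ == "__main__":
    for interval_tu in (100, 200):
        print(interval_tu, seconds_until_first_miss(
            interval_tu, drift_per_beacon_us=1, listen_after_tbtt_us=200))
    print(first_missed_beacon(1, 200, correction_step_us=64))
```

With a hypothetical 200 µs listen window, the uncorrected model first misses beacon 201, which is about 20.6 s at a 100 TU beacon interval and about 41.2 s at 200 TU, consistent with the beacon interval changing how long the issue takes to appear. With the 64 µs correction applied, the modeled offset never exceeds 63 µs, so no beacon is missed.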

Mushoz commented 6 years ago

I was seeing the same issue 2 months ago when I tried the master branch. I went back to 17.01.4, where this issue does not exist for me.

I haven't tried the master branch in the meantime, but I plan to flash the 18.06 branch this Monday, when I'm back from my holiday, to help with testing. I'll let you know whether I can reliably reproduce this.

nbd168 commented 6 years ago

I've made a fix and verified the timings. It's in the master branch now, and I intend to push it to the 18.06 branch soon.

slthomason commented 6 years ago

@nbd168 - I am not trying to cross issues, but the description here in the email, as well as in the commit, perfectly describes behavior that we are also seeing very badly on the MT7603 (2.4 GHz) chipset, which we reported in this issue: https://github.com/openwrt/mt76/issues/167

Any chance there would be a way to implement similar checks and behavior for the MT7603 chips as well? Could they both be suffering from the same problem? We are only able to replicate the behavior when devices are coming out of power save mode.

On Fri, May 18, 2018 at 9:16 AM, nbd168 <notifications@github.com> wrote:

I've made a fix and verified the timings. It's in the master branch now, and I intend to push it to the 18.06 branch soon.


nbd168 commented 6 years ago

I ran some tests; MT7603 does not have the same issue.

araujorm commented 6 years ago

Hello.

As I mentioned in https://github.com/openwrt/mt76/issues/139 during tests of mt76x2u (the USB variant, which shares some of the code), this exact issue happens every time one scans for networks. TX and RX are simply blocked for about 15 seconds, which is not normal. I've just had the chance to confirm that the same thing happens with mt76x2 on latest master.

Just try it: start a speed test, iperf or ping, then run iw dev wlan1 scan in a shell on your OpenWrt router, or trigger a scan on the 5 GHz radio in LuCI.

As I also mentioned there, if you go to tx.c and comment out both of the if (...) return -EBUSY lines, the issue no longer happens, although performance during the scan is subpar.
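One way to quantify the stall while reproducing it this way is to capture the ping output and look for gaps in the sequence numbers. The Python sketch below is an illustrative helper, not part of mt76 or OpenWrt; it assumes the default 1-second ping interval (so each missing sequence number is roughly one second of blocked traffic), matches both the iputils (`icmp_seq=`) and BusyBox (`seq=`) reply formats, and ignores sequence-number wraparound.

```python
"""Summarize ping output captured while triggering a scan (illustrative helper)."""
import re
from typing import Tuple

# Reply lines look like "64 bytes from 192.168.1.1: icmp_seq=3 ttl=64 time=1.23 ms"
# (iputils) or "64 bytes from 192.168.1.1: seq=3 ttl=64 time=1.23 ms" (BusyBox).
REPLY_RE = re.compile(r"bytes from .*?\b(?:icmp_)?seq=(\d+)\b.*?\btime=([\d.]+) ms")


def summarize_ping(output: str) -> Tuple[int, float]:
    """Return (longest run of missing sequence numbers, highest RTT in ms)."""
    seqs, rtts = set(), []
    for line in output.splitlines():
        m = REPLY_RE.search(line)
        if m:
            seqs.add(int(m.group(1)))  # a set drops "(DUP!)" duplicate replies
            rtts.append(float(m.group(2)))
    ordered = sorted(seqs)
    # Gap between consecutive received sequence numbers = replies that never arrived.
    longest_gap = max((b - a - 1 for a, b in zip(ordered, ordered[1:])), default=0)
    return longest_gap, max(rtts, default=0.0)
```

For example, replies jumping from icmp_seq=3 to icmp_seq=19 report a gap of 15, i.e. roughly 15 seconds without replies at the default interval, which is the kind of stall described above.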

My guess is that the original issue reporter has something in his configuration that triggers a network scan once in a while (in my case with the USB variant, it was happening very frequently since I was in a PC with Fedora/MATE and NetworkManager triggers a scan every other minute if the signal isn't what they would consider perfect).

Hope this helps @nbd168 and @LorenzoBianconi to find out what is going on.

nbd168 commented 6 years ago

I've fixed the issue with latency during scanning in current mt76 git. Please test.

araujorm commented 6 years ago

@nbd168 Looking good: I no longer get total RX/TX loss when scanning, and ping responses are now normal while doing so. iperf3 performance still isn't what I'd call optimal in that situation, but that may be specific to my environment, so it would be nice if other people also tested and gave feedback :)

Thank you.

Mushoz commented 6 years ago

I promised to get back to you, @nbd168, on whether I could still reproduce this issue. I have been running the 18.06 branch since Tuesday (which does not contain the beacon timer drift fix) and I am unable to reproduce the issue, so in my case the flaky 5 GHz connection must have been caused by something else. Therefore I can't test whether the beacon timer drift fix actually resolves this reported issue. It would be best if @updatede could confirm or deny whether the fix works for him.