greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.
Burst/lag issues between 2 Macs on the same AP #132

Open timkgh opened 4 years ago

timkgh commented 4 years ago

OpenWRT (hnyman build r13134-f57230c4e6) with firmware-5-ct-*-community-12.bin-lede.018 from https://www.candelatech.com/downloads/ath10k-9984-10-4b/?C=M;O=D

2 x Macbook Pro 2018 connected to the same AP on the 5GHz radio

If I ssh between them, typing in the shell feels laggy and bursty. If I ping between them, round-trip times jump to anywhere from 200 ms to 1.5 s, and some packets get lost.

The flavor of the .018 firmware does not seem to make a difference; I tried all 3 builds.

If I switch the AP to the "old" firmware (e.g. firmware-5.bin_10.4-3.10-00047) the above goes away.

greearb commented 4 years ago

The other tricky bugs were found by users spending time to bisect the issue, and that is probably the way forward with something like this as well since I don't have the hardware to reproduce this setup. If you would like to bisect, I'll build a series for you.
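For readers who have not bisected a firmware series before, the procedure is a binary search over builds ordered oldest to newest. The sketch below is only an illustration: `bisect_builds` and `is_bad` are hypothetical names, the build labels are placeholders, and it assumes the first build in the list is known good and the last is known bad. Installing each build and running the ping test is left to the person at the keyboard.

```shell
#!/bin/sh
# Hypothetical hook: ask the tester whether a given build shows the lag.
# Returns 0 (true) when the build is bad.
is_bad() {
    printf 'Install %s and re-run the ping test. Laggy? [y/n] ' "$1" >&2
    read -r answer
    [ "$answer" = "y" ]
}

# Binary search over builds passed as arguments, oldest first.
# Assumption: the first build is good and the last build is bad.
# Prints the first bad build.
bisect_builds() {
    lo=1      # index of a known-good build
    hi=$#     # index of a known-bad build
    while [ $((hi - lo)) -gt 1 ]; do
        mid=$(( (lo + hi) / 2 ))
        eval "candidate=\${$mid}"      # fetch the mid-th argument
        if is_bad "$candidate"; then
            hi=$mid                    # regression is at or before mid
        else
            lo=$mid                    # regression is after mid
        fi
    done
    eval "printf '%s\n' \"\${$hi}\""
}

# Example call with placeholder build names:
#   bisect_builds build-01 build-02 build-03 build-04 build-05
```

Each round halves the remaining range, so a series of N builds needs roughly log2(N) install-and-test cycles.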

timkgh commented 4 years ago

I know you said to bisect. Unfortunately, in the past I had no luck finding anything with bisect.

FWIW, I tried firmware-5-ct-htt-mgt-community-12.bin-lede.019 and it is worse than .018.

Here's what I'm seeing in terms of pings between the 2 Macs:

64 bytes from 192.168.x.x: icmp_seq=0 ttl=63 time=4.786 ms
Request timeout for icmp_seq 1
64 bytes from 192.168.x.x: icmp_seq=1 ttl=63 time=1327.736 ms
64 bytes from 192.168.x.x: icmp_seq=2 ttl=63 time=327.432 ms
64 bytes from 192.168.x.x: icmp_seq=3 ttl=63 time=14.336 ms
64 bytes from 192.168.x.x: icmp_seq=4 ttl=63 time=6.274 ms
64 bytes from 192.168.x.x: icmp_seq=5 ttl=63 time=780.346 ms
64 bytes from 192.168.x.x: icmp_seq=6 ttl=63 time=594.021 ms
64 bytes from 192.168.x.x: icmp_seq=7 ttl=63 time=15.676 ms
64 bytes from 192.168.x.x: icmp_seq=8 ttl=63 time=1583.478 ms
64 bytes from 192.168.x.x: icmp_seq=9 ttl=63 time=583.196 ms
64 bytes from 192.168.x.x: icmp_seq=10 ttl=63 time=6.155 ms
64 bytes from 192.168.x.x: icmp_seq=11 ttl=63 time=501.331 ms
Request timeout for icmp_seq 13
64 bytes from 192.168.x.x: icmp_seq=12 ttl=63 time=2772.456 ms
64 bytes from 192.168.x.x: icmp_seq=13 ttl=63 time=1772.326 ms
64 bytes from 192.168.x.x: icmp_seq=14 ttl=63 time=771.900 ms
64 bytes from 192.168.x.x: icmp_seq=15 ttl=63 time=591.271 ms
64 bytes from 192.168.x.x: icmp_seq=16 ttl=63 time=8.635 ms
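When comparing firmware builds, it can help to reduce a capture like the one above to a few numbers. The following is a minimal sketch, not part of the original report: `summarize_ping` is a hypothetical helper, and the 100 ms default for a "slow" reply is an arbitrary assumption. It reads ping output in the format shown above from stdin.

```shell
#!/bin/sh
# Summarize ping output read from stdin.
# $1: threshold in ms above which a reply counts as slow
#     (assumed default: 100).
summarize_ping() {
    threshold_ms="${1:-100}"
    awk -v limit="$threshold_ms" '
        /Request timeout/ { timeouts++ }
        /time=/ {
            # Keep only the number that follows "time=".
            rtt = $0
            sub(/.*time=/, "", rtt)
            sub(/ ms.*/, "", rtt)
            rtt += 0
            replies++
            if (rtt > max) max = rtt
            if (rtt > limit) slow++
        }
        END {
            printf "replies=%d timeouts=%d max_ms=%.3f slow=%d\n",
                   replies, timeouts, max, slow
        }
    '
}

# Example (ping.log is a placeholder file holding a saved capture):
#   summarize_ping 100 < ping.log
```

Fed the capture above, this should report 17 replies, 2 timeout lines, a maximum of 2772.456 ms, and 11 replies over 100 ms. Note that the timeout lines here are followed by late replies for the same sequence numbers, so they indicate delay beyond the ping timeout rather than confirmed loss.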

Not sure if this is the same issue as https://github.com/greearb/ath10k-ct/issues/139

graysky2 commented 4 years ago

I think @greearb needs you to do the bisect. It's not that difficult, but it can be time-consuming, particularly if the bug is not triggered instantly. Fortunately for me, in #139 it is really straightforward to see the bug. Before you dig into bisecting, have you tried the non-ct ath10k driver and firmware? For me, there is no bug when running them.

On your OpenWrt device:

opkg update
opkg remove kmod-ath10k-ct ath10k-ct-firmware
opkg install kmod-ath10k ath10k-firmware
reboot

timkgh commented 4 years ago

I'm all too familiar with the "old" driver/firmware; that's what I normally run (hnyman's builds), as it works best for me with my variety of clients. It also works to replace just the firmware file with the "old" one and keep using the "ct" driver, so for me the problems are clearly in the firmware.

Once in a while I try "ct" to check whether the issues persist. Over time some got fixed (e.g. issues with my Nest E thermostat), while new ones popped up.