greearb / ath10k-ct

Stand-alone ath10k driver based on Candela Technologies Linux kernel.

160MHz does not work under latest firmware with Intel AC 9560 #94

Closed swg0101 closed 4 years ago

swg0101 commented 5 years ago

Please provide this info. See this link for more info on how to gather debug info: http://www.candelatech.com/ath10k-bugs.php

Description of the problem (how to configure, how to reproduce, how often it happens):

After updating to the latest ath10k-firmware-qca9984-ct firmware (bd926fd), my Intel AC 9560 no longer connects properly to my 160MHz network. The connection reports a link rate of 1.7Gbps, but no data is received, so the connection is unusable. Downgrading to 7c640c2 appears to have fixed this issue.

Software (OS, Firmware version, kernel, driver, etc): Windows 10

Hardware (NIC chipset, platform, etc): Nighthawk R7800 - QCA9984

Logs (dmesg, maybe supplicant and/or hostap)

chunkeey commented 5 years ago

https://github.com/greearb/ath10k-ct/issues/85#issuecomment-520218785

That comment says the driver does not have the 80+80 or 160MHz mode bits enabled for the "-ct" firmware to access the DFS channels. Is this related?

swg0101 commented 5 years ago

I don't think this is related, since it was working in an earlier firmware version... That being said, it looks like I am only getting about 200Mbps on 5GHz AC regardless of whether I choose a 40, 80, or 160MHz channel width.

greearb commented 5 years ago

Are those commit-ids from the version of the ath10k-ct firmware? I think they are not since they are not in my tree. Please look at the dmesg output on bootup and let me know the firmware version in the good and failing cases. It should be something like "10.4b-ct-9984-xtH-012-edf123888". You can also get the info from debugfs:

[root@lf0313-63e7 ~]# cat /debug/ieee80211/wiphy1/ath10k/firmware_info
directory: ath10k/QCA9984/hw1.0
firmware: firmware-5-htt-mgt-b.bin
fwcfg: fwcfg-pci-0000:04:00.0.txt
bus: 0000:04:00.0
features: mfp,peer-flow-ctrl,txstatus-noack,wmi-10.x-CT,rxswcrypt-CT,ratemask-CT,regdump-CT,txrate-CT,flush-all-CT,pingpong-CT,ch-regs-CT,nop-CT,htt-mgt-CT,set-special-CT,no-bmiss-CT,tx-rc-CT,cust-stats-CT,CT-STA,txrate2-CT,beacon-cb-CT,wmi-block-ack-CT,wmi-bcn-rc-CT
version: 10.4b-ct-9984-xtH-012-edf123888
hw_rev: 9984
board: board-2.bin
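For comparing good and failing builds, the version line can be pulled out of that file with a few lines of script. The following is a minimal sketch, assuming the file lists one `key: value` pair per line and that the trailing hex string of the version is the build id; the helper names `parse_firmware_info` and `build_id` are hypothetical and not part of ath10k-ct.

```python
import re

# Sample text in the shape of the firmware_info output quoted above
# (shortened; one "key: value" pair per line is an assumption).
SAMPLE = """\
directory: ath10k/QCA9984/hw1.0
firmware: firmware-5-htt-mgt-b.bin
bus: 0000:04:00.0
version: 10.4b-ct-9984-xtH-012-edf123888
hw_rev: 9984
board: board-2.bin
"""

def parse_firmware_info(text):
    """Parse 'key: value' lines into a dict (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only, so values such as PCI bus
        # addresses keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def build_id(version):
    """Return the trailing hex build id of a version string, or None."""
    match = re.search(r"-([0-9a-f]{7,})$", version)
    return match.group(1) if match else None

info = parse_firmware_info(SAMPLE)
print(info["version"], build_id(info["version"]))
```

On a live system the text would be read from the debugfs path shown above instead of `SAMPLE`.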

swg0101 commented 5 years ago

Not working: 10.4b-ct-9984-fH-012-54863dff2
Not working: 10.4b-ct-9984-fH-012-c7b0f3f98
Working: 10.4b-ct-9984-fH-012-e80202737

swg0101 commented 4 years ago

Let me know once you have a set of images bisected from those builds that I can test... thanks :)

greearb commented 4 years ago

Please try this one (for 9984). Rename .png to .bin. If this works, then I will start making changes on top of it to bisect. I have to do this manually because a rebased set of patches caused the regression.

firmware-5-full-htt-mgt-community bin

swg0101 commented 4 years ago

This one works

greearb commented 4 years ago

Please try this one: firmware-5-full-htt-mgt-community bin

swg0101 commented 4 years ago

The interface for 5GHz fails to be brought up:

Wed Sep 25 09:56:21 2019 daemon.notice hostapd: wlan4: INTERFACE-ENABLED
Wed Sep 25 09:56:21 2019 daemon.notice hostapd: wlan4: INTERFACE-DISABLED
Wed Sep 25 09:56:21 2019 daemon.info : 11[KNL] interface wlan4 deactivated
Wed Sep 25 09:56:21 2019 daemon.err hostapd: nl80211: Could not configure driver mode
Wed Sep 25 09:56:21 2019 daemon.notice hostapd: nl80211: deinit ifname=wlan4 disabled_11b_rates=0
Wed Sep 25 09:56:21 2019 daemon.err hostapd: nl80211 driver initialization failed.
Wed Sep 25 09:56:21 2019 daemon.notice hostapd: wlan4: interface state UNINITIALIZED->DISABLED
Wed Sep 25 09:56:21 2019 daemon.notice hostapd: wlan4: AP-DISABLED
Wed Sep 25 09:56:21 2019 daemon.notice hostapd: wlan4: CTRL-EVENT-TERMINATING
Wed Sep 25 09:56:21 2019 daemon.err hostapd: hostapd_free_hapd_data: Interface wlan4 wasn't started
Wed Sep 25 09:56:21 2019 daemon.notice netifd: radio0 (9820): cat: can't open '/var/run/wifi-phy4.pid': No such file or directory
Wed Sep 25 09:56:21 2019 daemon.notice netifd: radio0 (9820): WARNING (wireless_add_process): executable path /usr/sbin/wpad does not match process path (/proc/exe)
Wed Sep 25 09:56:21 2019 daemon.notice netifd: radio0 (9820): Command failed: Invalid argument
Wed Sep 25 09:56:21 2019 daemon.notice netifd: radio0 (9820): Device setup failed: HOSTAPD_START_FAILED
Wed Sep 25 09:56:21 2019 daemon.info : 07[KNL] interface wlan4 deleted

swg0101 commented 4 years ago

Actually it is starting to do that on the old firmware as well - let me reboot this and try again.

swg0101 commented 4 years ago

Just got it to load properly - but this one fails with the same no packet received symptom.

greearb commented 4 years ago

Can you please verify the version? Does it have some part of this in the version? d0dc1f0fd159fbe12dc787ee9420d9510a2183ca

The only change I made in that commit versus the previous one was to set the initial MCS to 0 so that rate-ctrl starts at the lowest rates. I'm finding it hard to believe that is enough to totally break communication. Do you have any way to sniff the air (maybe with another ath10k-ct system in monitor mode configured for the same 160MHz channel)?

swg0101 commented 4 years ago

Ok - I will grab another router from storage and verify. I will let you know when I get an aircap.

swg0101 commented 4 years ago

Ok, I was able to perform the air caps today and here are the results:

Good firmware: firmware-version: 10.4b-ct-9984-fH-012-54863dff2

Bad firmware: firmware-version: 10.4b-ct-9984-fH-012-d0dc1f0fd

With the good firmware, I am seeing that data coming back from the router is transmitted under VHT MCS 3 @ 160MHz, and connectivity was fine without any issues.

image

On the bad firmware, the DHCP responses are transmitted as non-VHT, and those were received by the machine. The ARP replies, however, get transmitted under MCS 7 @ 80MHz, and the machine never sees these (you can see that the gateway keeps retransmitting them since the client never acknowledges them). The whole capture was pretty much the client asking where the gateway is, with the ARP replies never seen by the client itself.

image

I hope that is useful... :)

greearb commented 4 years ago

That is quite interesting. I will see if I can reproduce.

Thanks, Ben

swg0101 commented 4 years ago

If I sort by bandwidth on the bad capture (filtered by the source BSSID), it looks like almost everything was transmitted at 80MHz and was never acknowledged by the client (you can see the bands of blue on the capture):

image

The only frames the client did receive in this case were the ones transmitted as non-VHT @ 6 Mbps. Those are the ARP requests coming from the gateway asking where the client is.

At the top of the capture, sorted by bandwidth, I do not see the router ever transmitting any frames at 160MHz bandwidth:

image

I also could not see any ramping up. The router just uses MCS 7 @ 80MHz to transmit these frames initially and continues to do so throughout the whole capture without any changes.

Also to add, the client in this case is sitting fairly close to the router, approximately 6 ft or so. And of course, this only affected the 160MHz clients. Any clients using 80MHz were still able to connect normally.
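The sort-by-bandwidth check described above can also be scripted once a capture has been exported. This is only a minimal sketch: it assumes each frame is available as a record with the hypothetical field names `bw_mhz`, `mcs` (`None` for non-VHT frames), and `retry` (the 802.11 Retry bit). The sample records are made up for illustration and are not taken from the captures in this thread.

```python
from collections import Counter

def tally_by_rate(frames):
    """Count frames per (bandwidth in MHz, MCS) pair."""
    return Counter((f["bw_mhz"], f["mcs"]) for f in frames)

def retry_fraction(frames, bw_mhz):
    """Fraction of frames at the given bandwidth with the Retry bit set."""
    selected = [f for f in frames if f["bw_mhz"] == bw_mhz]
    if not selected:
        return 0.0
    return sum(1 for f in selected if f["retry"]) / len(selected)

# Made-up records, for illustration only (hypothetical field names).
sample = [
    {"bw_mhz": 20, "mcs": None, "retry": False},  # non-VHT frame
    {"bw_mhz": 80, "mcs": 7, "retry": False},
    {"bw_mhz": 80, "mcs": 7, "retry": True},
    {"bw_mhz": 80, "mcs": 7, "retry": True},
]

counts = tally_by_rate(sample)
for (bw, mcs), n in sorted(counts.items(), key=lambda item: item[0][0]):
    print(f"{bw} MHz, MCS {mcs}: {n} frame(s)")
print("retry fraction at 80 MHz:", retry_fraction(sample, 80))
```

A capture in which no `(160, ...)` key ever appears for the AP's transmissions, combined with a high retry fraction at 80 MHz, would match the symptom described in this comment.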

swg0101 commented 4 years ago

On the good capture, also filtered by source BSSID, everything was transmitted either as non-VHT @ 6 Mbps or at various MCS levels at 160MHz:

image

image

swg0101 commented 4 years ago

Another thing I found interesting, although not entirely unexpected, is that the Block-Ack bitmap for the bad capture is almost entirely zeros (sorted in descending order), i.e. every frame is missing, whereas the good capture has mostly f's in its bitmap, i.e. all frames were received:

image
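In a Block-Ack bitmap, each bit that is set marks one MPDU as received, so the "all zeros" versus "all f's" observation amounts to counting set bits. Here is a minimal sketch, assuming the bitmap has been copied out of the sniffer as a hex string; the helper names are hypothetical. It only counts bits and does not map them back to sequence numbers, so it does not depend on byte or bit ordering.

```python
def _clean(bitmap_hex):
    """Strip common separators from a hex string copied out of a sniffer."""
    return bitmap_hex.replace(":", "").replace(" ", "")

def acked_count(bitmap_hex):
    """Number of MPDUs marked as received in a Block-Ack bitmap."""
    return bin(int(_clean(bitmap_hex), 16)).count("1")

def acked_ratio(bitmap_hex):
    """Fraction of bitmap positions that are marked as received."""
    total_bits = len(_clean(bitmap_hex)) * 4  # 4 bits per hex digit
    return acked_count(bitmap_hex) / total_bits

print(acked_count("ffffffffffffffff"))         # all 64 positions set
print(acked_count("00:00:00:00:00:00:00:00"))  # nothing acknowledged
print(acked_ratio("00000000000000ff"))
```

A ratio near 0 across many Block-Ack frames corresponds to the bad capture described above, and a ratio near 1 to the good one.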

greearb commented 4 years ago

And everything is more normal if you put the AP into 80MHz mode?

Probably there is some bug in how I'm setting up the rate-ctrl in 160MHz mode.

Thanks, Ben

swg0101 commented 4 years ago

Yep, everything works in 80MHz.

greearb commented 4 years ago

I am able to reproduce this using 9984 radios as station, so hopefully I can make some progress.

swg0101 commented 4 years ago

Great to hear! 😃

greearb commented 4 years ago

Please test this one, and if possible verify in both 80 and 160 mode using a sniffer to make sure it uses reasonable rates. firmware-5-full-htt-mgt-community.bin.gz

swg0101 commented 4 years ago

Thanks - I will go ahead and put this on my test router and see what I find.

swg0101 commented 4 years ago

Just tested this now: firmware-version: 10.4b-ct-9984-fH-012-a8d7f58ff

80MHz: Seems to ramp up fine to 866.7Mbps as expected in about 2 seconds.
160MHz: Everything is still transmitted under MCS 7 @ 80MHz (877.5Mbps), and the machine is still having trouble receiving the ARP responses from the router.

I noticed that in Github the changelog says:

*  June 24, 2019: Init rate-ctrl to start at lowest rate instead of in the middle.  Hoping
                    this helps DHCP when station connects from a long distance.

I am not sure if this really helps DHCP per se, since in both the good and the bad capture the DHCP responses were sent on the control channel as non-VHT @ 6Mbps.
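The rates quoted in this thread can be cross-checked against the standard VHT data-rate arithmetic: data subcarriers × bits per subcarrier × coding rate × spatial streams ÷ symbol time. The sketch below is a minimal calculator under that formula; the function name `vht_rate_mbps` is hypothetical, and it does not reject the few MCS, stream, and bandwidth combinations that 802.11ac disallows.

```python
from fractions import Fraction

# Data subcarriers per VHT channel width (MHz).
DATA_SUBCARRIERS = {20: 52, 40: 108, 80: 234, 160: 468}

# (coded bits per subcarrier, coding rate) for VHT MCS 0 through 9.
MCS_TABLE = [
    (1, Fraction(1, 2)),  # BPSK 1/2
    (2, Fraction(1, 2)),  # QPSK 1/2
    (2, Fraction(3, 4)),  # QPSK 3/4
    (4, Fraction(1, 2)),  # 16-QAM 1/2
    (4, Fraction(3, 4)),  # 16-QAM 3/4
    (6, Fraction(2, 3)),  # 64-QAM 2/3
    (6, Fraction(3, 4)),  # 64-QAM 3/4
    (6, Fraction(5, 6)),  # 64-QAM 5/6
    (8, Fraction(3, 4)),  # 256-QAM 3/4
    (8, Fraction(5, 6)),  # 256-QAM 5/6
]

def vht_rate_mbps(bw_mhz, mcs, nss, short_gi):
    """Nominal VHT PHY rate in Mbps, rounded to one decimal place."""
    bits, coding = MCS_TABLE[mcs]
    # Symbol time in microseconds: 3.6 with short GI, 4.0 with long GI.
    symbol_us = Fraction(36, 10) if short_gi else Fraction(4)
    rate = DATA_SUBCARRIERS[bw_mhz] * bits * coding * nss / symbol_us
    return round(float(rate), 1)

print(vht_rate_mbps(80, 9, 2, True))    # 866.7
print(vht_rate_mbps(160, 9, 2, True))   # 1733.3
print(vht_rate_mbps(80, 7, 3, False))   # 877.5
```

The first two values line up with the 866.7Mbps ramp-up at 80MHz and the roughly 1.7Gbps link rate reported at 160MHz for a two-stream client. Under this formula, one combination that yields 877.5Mbps is MCS 7 at 80MHz with three spatial streams and a long guard interval; the thread does not state the stream count seen in the capture, so treat that as an observation about the arithmetic only.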

greearb commented 4 years ago

I guess you are right about DHCP, but other early protocols, such as IPv6 multicast, do appear to use the initial MCS, and it still seems right (to me) to start at a low MCS.

Here is a new binary. It may assert and crash; if so, please send me the dmesg output, as it should have the debugging I need to better understand the issue.

firmware-5-full-htt-mgt-community.bin.gz

swg0101 commented 4 years ago

firmware-version: 10.4b-ct-9984-fH-012-e747272a4

Same symptoms and MCS on 160MHz - no crash or asserts.

greearb commented 4 years ago

I guess your NIC is 1x1 at 160MHz? I'll see if I can reproduce by forcing my station to act as 1x1 160MHz (instead of its normal 2x2).

swg0101 commented 4 years ago

It looks like from the frames sent earlier that it is 2x2 - but then the problematic frames seem to be the ones sent by the router's side.

swg0101 commented 4 years ago

Any updates? :)

greearb commented 4 years ago

I have not had time to try to reproduce this yet. Hope to get to it soon.

simonsmh commented 4 years ago

I noticed exactly the same problem on an R7800 at 160MHz with the default ct driver on 19.07-rc1. My card is the Intel AX200 at 160MHz. My other 80MHz device doesn't have any problem. So any progress?

greearb commented 4 years ago

I think I have this fixed now. Please try the attached firmware. firmware-5-full-htt-mgt-community.bin.gz

jerrytouille commented 4 years ago

> I think I have this fixed now. Please try the attached firmware. firmware-5-full-htt-mgt-community.bin.gz

This worked!! 10.4b-ct-9984-fH-012-c77420752, using it in openwrt 19.07 branch. r7800

swg0101 commented 4 years ago

I too have an R7800, so I assume that will work there too. Won't really have a chance to look until tomorrow since I am out of town. What seemed to be the issue?

greearb commented 4 years ago

In the end, the issue was bad math when trying to find the proper rate index for 160MHz bandwidth. This was a firmware bug, and something I caused when I was previously trying to fix other problems in this area. The bug had been around for a long time. I'm going to close the bug; we can re-open it or open a new one if more issues are found.

swg0101 commented 4 years ago

Awesome, thanks Ben. Hopefully you can push this to OpenWrt master soon so I can pull it directly from there. Thanks!