linux4wilc / driver

DEPRECATED: Updated Linux drivers for the ATWILC1000/ATWILC3000 products are located at https://github.com/linux4microchip/linux/tree/master/drivers/net/wireless/microchip/wilc1000. To simplify development, the legacy Linux4WILC was merged into the Linux4Microchip repo, where driver development continues (please refer to the latest ATWILC1000/ATWILC3000 Wi-Fi Link Controller Linux User Guide). Driver code for Microchip ATWILC Wireless Devices (ATWILC1000 & ATWILC3000)
https://www.microchip.com/wwwproducts/en/ATWILC1000

suspicious rx behavior on low link quality #108

Open MTaulin opened 3 years ago

MTaulin commented 3 years ago

Hi, I'm currently investigating a problem with wilc1000 RX at low link quality. I am using the wilc1000 on a custom i.MX6 device via SDIO, running kernel 5.8. The device is currently running driver version 15.3 and firmware version 15.4.1 (but this also happened with the 15.3 firmware). With good Wi-Fi link quality everything works fine; I get a bitrate of around 20 Mbps, which is okay for my usage. But I'm getting problems on RX when the signal level goes down to e.g. -76 dBm. At this level the data rate of course drops because of some packet loss and so on, but in my case the rate drops to an almost unusable state.

I test with iperf3, running the device as the server and my computer, attached via Ethernet to the router, as the client. Below are two snippets of the iperf3 test. For the first test I positioned the router at a distance where everything worked well. For the second I moved the router about 0.5 meters further away from the device.

Auswahl_006

Auswahl_012

Sometimes there are higher rates at low quality, but then I have the problem that there are big gaps in which no bytes are sent. The main problem here is that the rate either goes very low or there are big gaps between the sent packets. This makes it almost unusable; see the next snippet:

Auswahl_008

On the other hand, the device itself shows a steady data transfer, but with a lower amount transferred each second, so my guess is that the data is being buffered and the processing is too slow. Sometimes this leads to the SSH connection dropping while the Wi-Fi connection is still active. In other tests I waited, and after about 5 minutes I still got earlier output from the device over SSH, but it was still unusable.

Auswahl_013

I have done several tests and want to share some of my thoughts. I guess it's a problem with the lost packets. In my tests I was connected via SSH to start the iperf3 server. When I start the test I can't enter anything via SSH; it freezes completely because of the amount of data. But when the transmission finishes, all my previous SSH input arrives, so I guess it's some kind of buffering problem. On a low-quality link the device has to work harder because of the NACKs and has to handle the retries from the client, and I guess this slows down the processing of all other incoming packets. When I decrease the window size to, for example, 50 kbps, the rate is more constant and I do not see any of those problems (I guess because the number of packets is significantly lower, so there are fewer errors like missing packets and so on).

On normal link quality everything works fine and I don't have this behavior.
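To put numbers on the gaps described above instead of reading them off screenshots, the per-interval results can be checked automatically. This is only a minimal sketch: it assumes the client was run with iperf3's JSON output (`-J`) and that each entry in `intervals` carries a `sum` object with `start`, `end` and `bits_per_second`. The helper name `find_stalls`, the threshold and the sample values are made up for illustration and are not measurements from this issue.

```python
import json


def find_stalls(report, min_bps=1.0):
    """Return (start, end) times of consecutive intervals below min_bps."""
    stalls = []
    run_start = None
    run_end = None
    for interval in report["intervals"]:
        summary = interval["sum"]
        if summary["bits_per_second"] < min_bps:
            # Open a new stall, or extend the one in progress.
            if run_start is None:
                run_start = summary["start"]
            run_end = summary["end"]
        elif run_start is not None:
            # Throughput came back, so close the stall.
            stalls.append((run_start, run_end))
            run_start = None
    if run_start is not None:
        stalls.append((run_start, run_end))
    return stalls


# Synthetic stand-in for json.load() on a real iperf3 report (values are hypothetical).
rates = [2.0e6, 1.5e6, 0.0, 0.0, 0.0, 8.0e5, 0.0, 1.2e6]
sample = {
    "intervals": [
        {"sum": {"start": float(i), "end": float(i + 1), "bits_per_second": bps}}
        for i, bps in enumerate(rates)
    ]
}

stalls = find_stalls(sample)
for start, end in stalls:
    print(f"no data from {start:.0f}s to {end:.0f}s ({end - start:.0f}s)")
```

With a real run, something like `find_stalls(json.load(open("run.json")))` would replace the synthetic sample (`run.json` being whatever file the JSON output was saved to). Comparing the stall lengths early and late in a run would show whether the gaps really grow over time.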

HDC67 commented 3 years ago

Could it be TCP ACKs (or whatever iperf transmits as a server) not getting through optimally due to this issue? Should still work slowly if transmissions eventually get through though. https://github.com/linux4wilc/firmware/issues/7

MTaulin commented 3 years ago

Well, I guess when packets are queued up they will be ACKed, but much later than normal. I think the processing of the erroneous packets slows down the whole process. I also thought it should just slow down but still work, but after a while it slows to one transfer every several seconds. What I can say is that when I start iperf it works correctly for the first few packets, but depending on the link quality it gets bad after a few seconds. At the start, the intervals with no packets sent are quite a bit shorter than after about two minutes: at first they are around 5-7 seconds, later this rises to 20+ seconds. And like I said, when the link quality is good I can't see anything like this and the data rate is really good. So I guess it's something in the error handling that completely drops the data rate and stalls the whole system.

Mateusz-Gwara commented 3 years ago

@MTaulin Your observations pretty much describe our experience with hundreds of WILC3000 units running as APs. When running traffic with lots of small packets, like SIP, there are lots of hiccups and, even worse, the WILC driver causes a rapidly rising system load until the kernel crashes. If the traffic ceases quickly enough the driver recovers from that...

tsifb commented 3 years ago

If you have not done so already, I highly recommend setting up Kali Linux in a VM with a USB Wi-Fi adapter (I use the Alfa Networks AWUS036CH) and using Wireshark to sniff the 802.11 packets on the air.

This method has allowed me to find many many issues with WILC.

As commented by @ShonkyCH, https://github.com/linux4wilc/firmware/issues/7 does seem like it may be related to your issue. That issue highlights problems with how the WILC firmware chooses the initial TX data rate for each packet, which is often too high and shows up as several retries for TX packets before success. This seems to be especially noticeable when the other device is much higher power and is receiving the WILC at mid-to-low RSSI. The result is a much lower throughput than the reduced link capability (rate) alone would explain. In your case, it may be delaying your TCP ACKs enough to trigger TCP retransmits, at which point the whole system has a positive feedback loop and devolves into a flaming death spiral. Microchip has said they will look at this issue only if there is a significant business case. Apparently a decent working product is not business case enough.
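The "delayed ACKs trigger retransmits" step can be made concrete with the standard retransmission timeout (RTO) calculation from RFC 6298. This is a simplified sketch and not the Linux implementation. The smoothing constants and the 1 s initial RTO come from the RFC, while the 200 ms floor (the usual Linux minimum rather than the RFC's 1 s), the 10 ms RTT samples and the 0.5 s ACK delay are assumptions picked purely for illustration.

```python
class RtoEstimator:
    """Retransmission timeout estimator following RFC 6298 (simplified)."""

    ALPHA = 1 / 8  # gain for the smoothed RTT
    BETA = 1 / 4   # gain for the RTT variance
    K = 4          # variance multiplier

    def __init__(self, min_rto=0.2, max_rto=60.0, granularity=0.001):
        # min_rto=0.2 s is an assumption (typical Linux floor); the RFC suggests 1 s.
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.granularity = granularity
        self.srtt = None
        self.rttvar = None
        self.rto = 1.0  # initial RTO before any RTT sample

    def on_rtt_sample(self, rtt):
        """Update SRTT, RTTVAR and RTO from one RTT measurement in seconds."""
        if self.srtt is None:
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the old SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        rto = self.srtt + max(self.granularity, self.K * self.rttvar)
        self.rto = min(max(rto, self.min_rto), self.max_rto)
        return self.rto

    def on_timeout(self):
        """Exponential backoff: the RTO doubles after each expiry."""
        self.rto = min(self.rto * 2, self.max_rto)
        return self.rto


est = RtoEstimator()
for _ in range(50):
    est.on_rtt_sample(0.010)  # hypothetical 10 ms RTT while the link is good

rto_good_link = est.rto
ack_delay = 0.5  # hypothetical ACK held up by Wi-Fi retries
spurious_timeout = ack_delay > rto_good_link
rto_after_backoff = est.on_timeout()

print(f"RTO on good link: {rto_good_link:.3f}s")
print(f"ACK delayed {ack_delay}s fires the timer: {spurious_timeout}")
print(f"RTO after one backoff: {rto_after_backoff:.3f}s")
```

Under these assumptions the sender's timer sits at its floor after a stretch of fast round trips, so an ACK that is merely late (not lost) already causes a retransmit. Each retransmit adds more traffic to a link that is already struggling, which is the feedback loop described above.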

@Mateusz-Gwara this also might cause excessive jitter with your SIP traffic.

Either way, seeing exactly what is happening on the air may be insightful. Good luck.
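Once there is a capture, the share of retried frames per transmitter is a quick first number to look at. This is a rough sketch that assumes the data frames were exported as tab-separated text, for example with `tshark -r capture.pcapng -Y "wlan.fc.type == 2" -T fields -e wlan.ta -e wlan.fc.retry` (`capture.pcapng` is a placeholder name). I'm also assuming the retry column may print as `1`/`0` or `True`/`False` depending on the tshark version, so the parser accepts both. The MAC addresses below are made up.

```python
from collections import defaultdict


def retry_rates(lines):
    """Map each transmitter address to its fraction of frames with the retry bit set."""
    totals = defaultdict(int)
    retries = defaultdict(int)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 2 or not fields[0]:
            continue  # skip frames without a transmitter address
        transmitter, retry_flag = fields
        totals[transmitter] += 1
        if retry_flag.strip().lower() in ("1", "true"):
            retries[transmitter] += 1
    return {ta: retries[ta] / totals[ta] for ta in totals}


# Hypothetical export: first address stands in for the WILC, second for the AP.
sample_lines = [
    "aa:aa:aa:aa:aa:01\t0",
    "aa:aa:aa:aa:aa:01\t1",
    "aa:aa:aa:aa:aa:01\t1",
    "aa:aa:aa:aa:aa:01\tTrue",
    "bb:bb:bb:bb:bb:02\t0",
    "bb:bb:bb:bb:bb:02\tFalse",
    "bb:bb:bb:bb:bb:02\t1",
    "bb:bb:bb:bb:bb:02\t0",
]

rates = retry_rates(sample_lines)
for transmitter, rate in sorted(rates.items()):
    print(f"{transmitter}: {rate:.0%} of data frames are retries")
```

A retry share on the WILC's own address that is much higher than the AP's would fit the picture of an initial TX rate that is too high for the link.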

Mateusz-Gwara commented 3 years ago

Thanks for your insights @tsifb & @ShonkyCH. It's sad to see this product die (and the unsolved Salesforce tickets along with it).

We had to develop a new product revision with a Panasonic Wi-Fi chip, which runs gigabytes of streaming traffic for months without any problem.

Still we have a problem with all legacy systems in the field...

MTaulin commented 3 years ago

Thanks for your answers, guys. I've been following the status of the WILC driver/firmware for more than two years now, and at first I saw that Microchip was really looking into the WILC problems and tried to support and fix them. But over the last year they have really limited their support for those devices and, like you said @tsifb, I don't think they will put further resources into supporting them. I also already ran an earlier test with Wireshark and can confirm that there were delayed ACKs as well; it seems that lower link quality just increases them and kills the transmission. Too bad Microchip won't do anything but keeps selling those chips with these significant problems.

tsifb commented 3 years ago

@Mateusz-Gwara what is the part number of your new Panasonic Wi-Fi chip?

Mateusz-Gwara commented 3 years ago

@tsifb It's the ENWF9202A1EF. I'm running it with the in-kernel driver (it has been included since the 5.x kernels) and the kernel firmware. There is a Yocto-based driver and software package from Panasonic which they recommend, but as we run a Debian-based system I wanted to use as much as possible out of the box with minimal modifications. I can send you the microSDHC/SDIO dev board if you want, as I don't need it anymore (we are currently rolling out our own production hardware).