qca / open-ath9k-htc-firmware

The firmware for QCA AR7010/AR9271 802.11n USB NICs

USB layer deinitialized on Chromebook c201 #124

Open albsod opened 7 years ago

albsod commented 7 years ago

On the Chromebook c201 (ARMv7), running linux-libre 4.9.5 (a merely deblobbed mainline kernel) or Google's linux 3.14 kernel with firmware version 1.4, ath9k-htc wifi dongles are very unreliable. I've tried three different types of USB dongles, turning off the LEDs, and connecting the wifi dongles through a powered USB hub, all without any noticeable difference. The device simply disconnects after a few minutes, and the only way to reconnect to the wireless network is to physically remove and reinsert the dongle.

dmesg output with just the wifi dongle: dmesg-atheros.txt

dmesg output with the wifi dongle connected to a powered USB hub: dmesg-atheros-usbhub.txt

I'm currently connected via an Apple USB-ethernet dongle, and this connection is reliable.

lsusb output with wifi dongle disconnected and ethernet dongle connected:

Bus 002 Device 002: ID 05ac:1402 Apple, Inc. Ethernet Adapter [A1277]
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 002: ID 04f2:b53a Chicony Electronics Co., Ltd 
Bus 003 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

olerem commented 7 years ago

Which Wifi usb adapter do you use?

albsod commented 7 years ago

In the above dmesg I used a TP-Link TL-WN821N (https://tehnoetic.com/tet-n300)

However, I've also tried a Unex DNUA-93F (https://tehnoetic.com/tehnoetic-wireless-adapter-gnu-linux-libre-tet-n150) and a TPE-N150USBL (https://www.thinkpenguin.com/gnu-linux/penguin-wireless-n-usb-adapter-w-external-antenna-gnu-linux-tpe-n150usbl)

olerem commented 7 years ago

Hm... it looks like a USB-related issue, nothing to do with the actual ath9k-htc driver or firmware. For example, something like this: http://www.netnode.de/howto/USB_modem_power_issues.html The USB dwc controller seems to be known for various weird problems. On the other hand, ath9k-htc controllers are picky about USB too: they will fail to work with bad or long USB cables. It is also hard to compare a USB Ethernet adapter with a USB WiFi adapter, since many WiFi adapters consume more power.

SolidHal commented 5 years ago

@olerem @albsod I have found a decent workaround for the dwc2 issues on the c201 with ath9k. This patch applies the workaround https://github.com/SolidHal/PrawnOS/blob/master/resources/BuildResources/patches-tested/kernel/reverse-do-not-use-bulk-on-EP3-and-EP4.patch

I'm not totally sure why applying this patch to revert the ep3, ep4 commit works as I'm not super familiar with usb or how the ath drivers/firmware work so doing this may cause issues with non-dwc2 systems. I plan to investigate this further. If you have any ideas @olerem I'd love to hear them.

@albsod feel free to checkout https://github.com/SolidHal/PrawnOS if you want a system to build a fully libre kernel with a libre os.

olerem commented 5 years ago

@SolidHal, EP3 and EP4 are Int, not Bulk. On the firmware side, they are read out from a 64-byte FIFO, not via DMA. A Bulk transfer may, and in some cases will, be larger than 64 bytes. That means the host will write more data than the adapter can handle, so you will get silent message corruption. The messages transferred on these endpoints do different things, including register read/write operations. You definitely do not want to debug weird problems caused by trash written to configuration registers. Your patch is a workaround that may work for you right now, but it just breaks other things.
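
As a rough illustration of the size argument above, here is a minimal host-side C sketch. It is not taken from the firmware or the driver: `REG_FIFO_SIZE`, `fifo_overrun` and `msg_is_safe` are hypothetical names, and the 64-byte limit is simply the FIFO size quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: FIFO size behind EP3/EP4, as quoted in the comment above. */
#define REG_FIFO_SIZE 64u

/* Number of bytes of a message that would not fit into the FIFO. */
size_t fifo_overrun(size_t msg_len, size_t fifo_size)
{
    assert(fifo_size > 0);
    return msg_len > fifo_size ? msg_len - fifo_size : 0;
}

/* True if a single message fits into the register FIFO without overrun. */
bool msg_is_safe(size_t msg_len)
{
    return fifo_overrun(msg_len, REG_FIFO_SIZE) == 0;
}
```

Under these assumptions a short register command fits, while any transfer longer than 64 bytes would spill past the FIFO, which is the silent-corruption scenario described above.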

SolidHal commented 5 years ago

@olerem Thanks for the detailed response. So, in the original commit, all of these changes switch the "type" of EP3 and EP4 from bulk to int, correct?

@@ -115,10 +115,10 @@ static int hif_usb_send_regout(struct hif_device_usb *hif_dev,
    cmd->skb = skb;
    cmd->hif_dev = hif_dev;

-   usb_fill_bulk_urb(urb, hif_dev->udev,
-            usb_sndbulkpipe(hif_dev->udev, USB_REG_OUT_PIPE),
+   usb_fill_int_urb(urb, hif_dev->udev,
+            usb_sndintpipe(hif_dev->udev, USB_REG_OUT_PIPE),
             skb->data, skb->len,
-            hif_usb_regout_cb, cmd);
+            hif_usb_regout_cb, cmd, 1);

    usb_anchor_urb(urb, &hif_dev->regout_submitted);
    ret = usb_submit_urb(urb, GFP_KERNEL);
@@ -723,11 +723,11 @@ static void ath9k_hif_usb_reg_in_cb(struct urb *urb)
            return;
        }

-       usb_fill_bulk_urb(urb, hif_dev->udev,
-                usb_rcvbulkpipe(hif_dev->udev,
+       usb_fill_int_urb(urb, hif_dev->udev,
+                usb_rcvintpipe(hif_dev->udev,
                         USB_REG_IN_PIPE),
                 nskb->data, MAX_REG_IN_BUF_SIZE,
-                ath9k_hif_usb_reg_in_cb, nskb);
+                ath9k_hif_usb_reg_in_cb, nskb, 1);
    }

 resubmit:
@@ -909,11 +909,11 @@ static int ath9k_hif_usb_alloc_reg_in_urbs(struct hif_device_usb *hif_dev)
            goto err_skb;
        }

-       usb_fill_bulk_urb(urb, hif_dev->udev,
-                 usb_rcvbulkpipe(hif_dev->udev,
+       usb_fill_int_urb(urb, hif_dev->udev,
+                 usb_rcvintpipe(hif_dev->udev,
                          USB_REG_IN_PIPE),
                  skb->data, MAX_REG_IN_BUF_SIZE,
-                 ath9k_hif_usb_reg_in_cb, skb);
+                 ath9k_hif_usb_reg_in_cb, skb, 1);

        /* Anchor URB */
        usb_anchor_urb(urb, &hif_dev->reg_in_submitted);
@@ -1268,7 +1252,7 @@ static void ath9k_hif_usb_reboot(struct usb_device *udev)
    if (!buf)
        return;

-   ret = usb_bulk_msg(udev, usb_sndbulkpipe(udev, USB_REG_OUT_PIPE),
+   ret = usb_interrupt_msg(udev, usb_sndintpipe(udev, USB_REG_OUT_PIPE),
               buf, 4, NULL, HZ);
    if (ret)
        dev_err(&udev->dev, "ath9k_htc: USB reboot failed\n");

I think I understand that part of the commit.

What was the following code, which your commit removed, achieving if EP3 and EP4 were already treated as bulk? I read the original commit here https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4a0e8ecca4ee but if EP3 and EP4 were already treated as bulk before you made the above changes, I don't quite understand what commit 4a0e8ecca4ee did.

@@ -1031,9 +1031,7 @@ static int ath9k_hif_usb_download_fw(struct hif_device_usb *hif_dev)

 static int ath9k_hif_usb_dev_init(struct hif_device_usb *hif_dev)
 {
-   struct usb_host_interface *alt = &hif_dev->interface->altsetting[0];
-   struct usb_endpoint_descriptor *endp;
-   int ret, idx;
+   int ret;

    ret = ath9k_hif_usb_download_fw(hif_dev);
    if (ret) {
@@ -1043,20 +1041,6 @@ static int ath9k_hif_usb_dev_init(struct hif_device_usb *hif_dev)
        return ret;
    }

-   /* On downloading the firmware to the target, the USB descriptor of EP4
-    * is 'patched' to change the type of the endpoint to Bulk. This will
-    * bring down CPU usage during the scan period.
-    */
-   for (idx = 0; idx < alt->desc.bNumEndpoints; idx++) {
-       endp = &alt->endpoint[idx].desc;
-       if ((endp->bmAttributes & USB_ENDPOINT_XFERTYPE_MASK)
-               == USB_ENDPOINT_XFER_INT) {
-           endp->bmAttributes &= ~USB_ENDPOINT_XFERTYPE_MASK;
-           endp->bmAttributes |= USB_ENDPOINT_XFER_BULK;
-           endp->bInterval = 0;
-       }
-   }
-

It seems to me that dwc2 is probably mishandling some aspect of usb_interrupt_msg or usb_sndintpipe. Would that be a decent assumption?
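
For readers following along, the removed loop can be tried out in user space. The sketch below is an illustration under assumptions, not the kernel code itself: `struct ep_desc` is a hypothetical, trimmed-down stand-in for `struct usb_endpoint_descriptor`, `patch_int_to_bulk` is a made-up name, and the constants carry the transfer-type values defined by the USB 2.0 specification.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Transfer-type encoding in bmAttributes (USB 2.0 specification). */
#define USB_ENDPOINT_XFERTYPE_MASK 0x03
#define USB_ENDPOINT_XFER_BULK     2
#define USB_ENDPOINT_XFER_INT      3

/* Hypothetical, reduced endpoint descriptor used only for this sketch. */
struct ep_desc {
    uint8_t bEndpointAddress;
    uint8_t bmAttributes;
    uint8_t bInterval;
};

/* Rewrite every interrupt endpoint as bulk, as the removed loop did.
 * Returns the number of descriptors that were changed. */
size_t patch_int_to_bulk(struct ep_desc *eps, size_t n)
{
    size_t changed = 0;

    assert(eps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((eps[i].bmAttributes & USB_ENDPOINT_XFERTYPE_MASK)
                == USB_ENDPOINT_XFER_INT) {
            eps[i].bmAttributes &= (uint8_t)~USB_ENDPOINT_XFERTYPE_MASK;
            eps[i].bmAttributes |= USB_ENDPOINT_XFER_BULK;
            eps[i].bInterval = 0;
            changed++;
        }
    }
    return changed;
}
```

Calling it on a descriptor whose bmAttributes is 0x03 (interrupt) leaves 0x02 (bulk) and clears bInterval. Only the host-side copy of the descriptor changes; the device itself is not touched.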

olerem commented 5 years ago

@SolidHal

So, the Atheros devs noticed some performance issues and tried to add this workaround. There were several attempts:

  1. Patch the firmware to provide a different descriptor. This does not work, because the adapter has to trigger a reinit of the interface. Unfortunately, some host controllers will power cycle the adapter, so the patched firmware is lost.
  2. Patch endp->bmAttributes on the host. It didn't work well. Maybe it worked up to a point, but this approach is just a hack and was not expected to work for long.
  3. Don't patch, just use usb_bulk_msg. It seems to work on some systems, but breaks terribly on others.
  4. Remove the workaround and make sure the driver is not violating the specification. This breaks dwc2.

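
The tension between attempts 3 and 4 can be sketched as a type check between what a URB asks for and what the endpoint descriptor declares. This is a simplified user-space model, not the kernel's code: `expected_pipe` and `urb_matches_endpoint` are hypothetical names, and the pipe-type numbering is assumed to mirror the `PIPE_*` constants in `include/linux/usb.h`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Transfer type lives in the low two bits of bmAttributes (USB 2.0 spec):
 * 0 = control, 1 = isochronous, 2 = bulk, 3 = interrupt. */
#define EP_XFERTYPE_MASK 0x03u

/* Pipe types, assumed to match PIPE_* in include/linux/usb.h. */
#define PIPE_ISOCHRONOUS 0u
#define PIPE_INTERRUPT   1u
#define PIPE_CONTROL     2u
#define PIPE_BULK        3u

/* Pipe type that a well-formed URB should carry for this endpoint. */
unsigned expected_pipe(uint8_t bmAttributes)
{
    static const unsigned map[4] = {
        PIPE_CONTROL, PIPE_ISOCHRONOUS, PIPE_BULK, PIPE_INTERRUPT
    };
    return map[bmAttributes & EP_XFERTYPE_MASK];
}

/* True if the URB's pipe type agrees with the endpoint descriptor. */
bool urb_matches_endpoint(unsigned urb_pipe, uint8_t bmAttributes)
{
    assert(urb_pipe <= PIPE_BULK);
    return urb_pipe == expected_pipe(bmAttributes);
}
```

In this model a bulk URB on an endpoint declared as interrupt is a mismatch, which is the situation attempt 3 creates. Attempt 4 makes the two sides agree again.
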
SolidHal commented 5 years ago

@olerem Thank you, that clears it up for me.

albsod commented 5 years ago

@SolidHal Thanks a lot! I just built and installed PrawnOS (Debian + Linux 4.17) on my machine and now the connectivity problem seems to have disappeared completely. I'm using the htc_7010 firmware by the way.

Leebre commented 5 years ago

So, what is the solution to this issue? Applying the patch that is linked at PrawnOS? I'm a little confused by @olerem 's post above. It seems to suggest there is no good solution that doesn't break something else.

I have been having similar issues with a N150 wifi dongle that I bought from ThinkPenguin (with a C201).

SolidHal commented 5 years ago

The patch I created for PrawnOS allows it to function, but it does lead to some instability, as @olerem suggested. The only "perfect" solution right now is to replace the webcam with the wifi dongle, since the webcam uses USB but does not go through the DWC2 bridge. I have a link to some rough instructions at the bottom of the PrawnOS ReadMe (I apologize for not linking, I'm on mobile).

I've been digging through the dwc2 driver for some time to see what causes this issue, and have made some progress towards a fix but it isn't perfect yet. I'll post any patches here when I find something.

Leebre commented 5 years ago

@SolidHal I tried your patch, but unfortunately it doesn't seem to work for me. In fact, if anything, it seems to make things worse: my whole system freezes a few seconds after the dongle tries to connect and then doesn't unfreeze again when I unplug the dongle (which it does with the packaged ath9k modules). So, I have to hard reboot.

This is the output text I get when applying the patch:

patching file drivers/net/wireless/ath/ath9k/hif_usb.c
Hunk #1 succeeded at 118 (offset 3 lines).
Hunk #2 succeeded at 727 (offset 4 lines).
Hunk #3 succeeded at 913 (offset 4 lines).
Hunk #4 succeeded at 1036 (offset 5 lines).
Hunk #5 succeeded at 1048 (offset 5 lines).
patch unexpectedly ends in middle of line 
Hunk #6 succeeded at 1336 with fuzz 1 (offset 68 lines).

Did you get that same warning after Hunk 5? (If not, perhaps I somehow copied the patch file incorrectly)

Leebre commented 5 years ago

Btw, out of interest, what is the dwc2? I must admit I'm not very familiar with how these drivers work.

SolidHal commented 5 years ago

@Leebre What version of the kernel are you trying to patch? I have versions for 4.17 and 4.19. I would make sure you have the full patch by grabbing it from the "raw" GitHub link with wget, to avoid copy and paste errors.

Dwc2 is the DesignWare USB 2.0 controller that the c201 uses. It has... problems, to say the least. Some of the RPis use it as well, but since the RPi foundation maintains their own kernel, they also maintain their own driver for dwc2, which doesn't seem to have this issue.

Leebre commented 5 years ago

@SolidHal I was trying to patch the 5.0.6 version kernel (so, quite a new one). I will try it again over the weekend with one of those older versions and I will check again that I downloaded the patch file correctly. One other thing is that I am using a pre-built kernel from Parabola and their binary package doesn't seem to include the 'Module.symvers' file, so I'm not 100% sure if it is a problem to not have that, if I am compiling just the ath9k modules. However, I also tried re-building the non-patched modules, and they gave the same behavior I had before. So, I am fairly confident that my build process is ok.

I assume the dwc2 driver from RPI isn't libre?

Leebre commented 5 years ago

I downgraded my kernel to 4.20.0 and, even without the patch, the Atheros dongle seems to work much better. It still seems to hang the system if it is plugged in when I boot up, or if I close the lid and then re-open. However, after unplugging/re-plugging, the system frees up and the dongle seems quite stable.

This is probably good enough for me, for the time being.

Leebre commented 5 years ago

Actually no, it's not as stable as I thought at first. It seems I can get ~20-30 minutes of continuous connectivity before it freezes (which is still better than with the 5.0.6 kernel, which freezes within about 10 seconds every time).

I downloaded @SolidHal 's patch with wget. I used diff to compare to the patch file I had before, and there seemed to be a missing newline somewhere in the previous one. However, when I applied the patch and re-built the ath9k modules, I got exactly the same catastrophic lock-up with the 4.20.0 kernel that I did with 5.0.6.

Leebre commented 5 years ago

So, I found the Module.symvers file for my kernel and recompiled the ath9k modules with it. Now the dongle seems to work very well with the patched modules on the 5.0.6 kernel. Thank you @SolidHal ! :-)