wulf7 / iichid

Generic HID layer for FreeBSD. Including I2C and USB backends.
BSD 2-Clause "Simplified" License
45 stars 6 forks source link

panic: device_unbusy: called for non-busy device pci0 #22

Closed valpackett closed 4 years ago

valpackett commented 4 years ago

iichid_intr → iicbus_release_bus → device_unbusy (recursively 4 times)

Something went wrong with this after updating -CURRENT..

I don't see anything wrong in the iicbus_request_bus / iicbus_release_bus code, nor the code using them here.

For now I've replaced the panic() with a print (the only downside is I can't unload iichid, it says "Device busy" now). I can see that this happens both on pci0 and iichid1, the busy field is non-zero, the state is DS_ATTACHED. Looks like a weird combination. This is right after successfully setup interrupt.

iichid1 is that Google trackpad btw.

// hmm, why isn't the !sc->power_on check done before request_bus? request and immediately release looks silly..

wulf7 commented 4 years ago

Thanks for report and patch.

It is strange bug. I should look deeper in to bus code...

hmm, why isn't the !sc->power_on check done before request_bus? request and immediately release looks silly..

That is done because sc->power_on is protected by internal iicbus lock.

Could you change your patch with adding new check for iicbus state while keeping old one intact?

valpackett commented 4 years ago

Could you change your patch with adding new check for iicbus state while keeping old one intact?

Done

wulf7 commented 4 years ago

There is one question: Do you kldload iichid.ko before any udev/devd based device autodetector is run? Or at the time it is running? Is race between character device creation and device_attach() completion possible?

valpackett commented 4 years ago

I load iichid way after devd/devmatch/stuff, just manually in the console

wulf7 commented 4 years ago

Could you test following POC patch?

diff --git a/iichid.c b/iichid.c
index 565d4e0..9579edd 100644
--- a/iichid.c
+++ b/iichid.c
@@ -597,8 +597,14 @@ iichid_intr(void *context)
     * communicate with the DEVICE it MUST issue a SET POWER
     * command (to ON) before any other command.
     */
-   if (!sc->power_on) {
-       return;
+   device_t walk = parent;
+   while (walk != NULL) {
+       if (!device_is_attached(walk)) {
+           printf("INT fired before %s parent has been attached\n",
+               device_get_nameunit(walk));
+           return;
+       }
+       walk = device_get_parent(walk);
    }

    /*
valpackett commented 4 years ago

nah this doesn't print anything. (Not surprising, I already knew the failing devices were in DS_ATTACHED state from the print I used to replace the panic)

wulf7 commented 4 years ago

nah this doesn't print anything.

Does it still silently panices with this patch applied?

This patch tries to check race condition, so it is quite normal that it does not happen every time. I'll try describe race in details:

  1. On FreeBSD parent device very often switches to attached state only AFTER all the children devices have been attached.
  2. Child (iichid) device enables interrupts even before it complete attachment, so there is time window when interrupts are enabled but parents are not yet attached but residing in "attaching" state.
  3. Specs says that IIC HID device MUST generate interrupt at start to acknowledge the device reset. As interrupt handler do recursive device_busy() then reads 2 byte (really 3) over slow bus than recursive device_unbusy(), it is possible that parent device completes attachment during reading and changes it's state from 'attaching' to 'attached'. Latter can trigger the panic() as 'attaching' devices are not marked as 'busy' by device_busy() unlike 'attached' one.

It may or may not be the reason of the panic()

valpackett commented 4 years ago

I changed that panic to a print, but yeah, it silently prints as if it would panic.

wulf7 commented 4 years ago

Could you show vmstat -i output taken after several seconds from stock iichid kldloading. Please do not attach any evdev clients that time to not trigger POWER_ON command.

valpackett commented 4 years ago

No reboot (been using the desktop on this boot), instantly after loading:

irq16: ig4iic0+                    25158          1
irq18: ig4iic2                     36129          1
irq31: iichid0                   5492582        176
irq51: iichid1                   6924225        222

3s later:

irq16: ig4iic0+                    25158          1
irq18: ig4iic2                     36129          1
irq31: iichid0                   5700311        182
irq51: iichid1                   6924225        222

Fresh boot, instantly after loading:

irq16: ig4iic0+                     1417         32
irq18: ig4iic2                       651         15
irq31: iichid0                     10462        238
irq51: iichid1                      9966        227

3s later:

irq16: ig4iic0+                     1417         30
irq18: ig4iic2                       651         14
irq31: iichid0                    546435      11642
irq51: iichid1                    546021      11633

Something goes nuts when the devices are off but the driver is loaded for the first time :D and that's without any new interrupts on the ig4 controller

wulf7 commented 4 years ago

Try this patch

diff --git a/iichid.c b/iichid.c
index 14e4509..f1d910b 100644
--- a/iichid.c
+++ b/iichid.c
@@ -590,17 +590,6 @@ iichid_intr(void *context)
    uint16_t actual = 0;
    int error;

-   /*
-    * Ignore interrupts while in SLEEP power state. Reading of
-    * input reports of I2C devices residing in SLEEP state is not
-    * allowed and often returns a garbage. If a HOST needs to
-    * communicate with the DEVICE it MUST issue a SET POWER
-    * command (to ON) before any other command.
-    */
-   if (!sc->power_on) {
-       return;
-   }
-
    /*
     * Designware(IG4) driver-specific hack.
     * Requesting of an I2C bus with IIC_DONTWAIT parameter enables polled
@@ -608,11 +597,6 @@ iichid_intr(void *context)
     * interrupt handlers and callouts.
     */
    if (iicbus_request_bus(parent, sc->dev, IIC_DONTWAIT) == 0) {
-       /* Check again while locked by the request */
-       if (!sc->power_on) {
-           iicbus_release_bus(parent, sc->dev);
-           return;
-       }

        error = iichid_cmd_read(sc, sc->ibuf, sc->isize, &actual);
        iicbus_release_bus(parent, sc->dev);
@@ -626,8 +610,15 @@ iichid_intr(void *context)
            return;
        }

+       /*
+        * Ignore interrupts while in SLEEP power state. Reading of
+        * input reports of I2C devices residing in SLEEP state is not
+        * allowed and often returns a garbage. If a HOST needs to
+        * communicate with the DEVICE it MUST issue a SET POWER
+        * command (to ON) before any other command.
+        */
        mtx_lock(sc->intr_mtx);
-       if (sc->open)
+       if (sc->power_on && sc->open)
            sc->intr_handler(sc->intr_context, sc->ibuf, actual);
        mtx_unlock(sc->intr_mtx);
    } else
valpackett commented 4 years ago

Yep, looks like that works great. 1 event on load, (1 on touchscreen / 3 on touchpad) events after idling for 3-6 seconds. Nice!

wulf7 commented 4 years ago

Committed in slightly different version as 2d9b27c