larixer / hid-asus-dkms

ASUS HID FTE100* DKMS Driver
GNU General Public License v2.0

unusable touchpad with kernel param `idle=nomwait` #24

Closed: ain101 closed this 7 years ago

ain101 commented 7 years ago

When I use the kernel parameter idle=nomwait and reboot 2-3 times, the touchpad gets into an unusable state. It makes no difference whether this DKMS driver is used or not. Kernel 4.8.0-27-generic, Kubuntu 16.10 (also reproducible from a live USB stick), Asus ROG G501VW.

http://pastebin.com/Dm2HDs7X

[   20.970241] i2c_hid i2c-FTE1001:00: failed to reset device.
[   27.114228] i2c_hid i2c-FTE1001:00: failed to reset device.
[   28.138218] i2c_hid i2c-FTE1001:00: can't add hid device: -61
[   28.162203] i2c_hid: probe of i2c-FTE1001:00 failed with error -61

The only way to get it working again is to remove idle=nomwait or to shut down completely. I don't see an obvious way that idle=nomwait could cause this. I think resetting the Asus touchpad is quite fragile, and some variation introduced by idle=nomwait breaks the procedure. That makes sense to me because the touchpad works after a complete shutdown and boot, which amounts to a perfect reset. http://permalink.gmane.org/gmane.linux.kernel.commits.head/153381

redmcg commented 7 years ago

That's an interesting one. Can't say I understand why nomwait would cause this issue - but there is definitely a problem with this device around the reset.

You might want to try my fork of this driver: https://github.com/redmcg/hid-asus-dkms

I've made a change to the i2c-hid module (which is included in my version of the dkms package) to do the reset and request the HID report descriptor in parallel. I tried this after stumbling across this document: https://msdn.microsoft.com/en-us/windows/hardware/drivers/hid/plug-and-play-support-and-power-management

The device seems to behave better with this.

My guess at the problem is that the device isn't raising its interrupt after completing the reset request - but then requesting the HID report descriptor seems to prompt it.

redmcg commented 7 years ago

@ain101 Did you get a chance to try the modified i2c-hid? I'm wondering if it made any difference.

ain101 commented 7 years ago

I tried your fork. It doesn't seem to behave much differently than without it (all .logicdata files with "red" in the name were captured with your fork). I did more sniffing: https://github.com/ain101/drivers-input-touchscreen-FTS_driver/tree/master/doc/sniff/logic%20analyzer/nomwait%20problem

This picture shows the problem: https://github.com/ain101/drivers-input-touchscreen-FTS_driver/blob/master/doc/sniff/logic%20analyzer/nomwait%20problem/3rd%20line%20problem.png

The capture in the middle is a working one (this was a cold boot). Notice the two channels on top. Is this a mechanism for the touchpad to signal its master that it has new data or has reset successfully?

Also interesting: https://github.com/ain101/drivers-input-touchscreen-FTS_driver/blob/master/doc/sniff/logic%20analyzer/nomwait%20problem/linux%20boot%20error%20finger%20drag%20between%20failed%20to%20reset.logicdata While "i2c_hid i2c-FTE1001:00: failed to reset device." was being logged repeatedly, I dragged my finger across the pad twice. The data shows up on the bus but isn't interpreted by Linux, because the pad doesn't behave the way the driver expects. After the driver has given up with resetting, no more data is sent for some reason? I also captured working initializations where it took 1-2 reset attempts, but then the lines go low and the pad works: https://github.com/ain101/drivers-input-touchscreen-FTS_driver/blob/master/doc/sniff/logic%20analyzer/nomwait%20problem/linux%20reboot%20working%20removed%20nomwait.logicdata (reboot without nomwait)

ain101 commented 7 years ago

Maybe the long wires I attached for sniffing are messing with the pad. I will look at it with a scope.

redmcg commented 7 years ago

Channel 3 appears to be the interrupt - and it is used by the Touchpad (which is an i2c client at address 0x15) to tell the i2c host that it has data. This prompts the i2c host to:

  1. start the clock signal; and
  2. send the "read to 0x15" (you need to set addresses to 7 bits in the analyser to get the correct addresses displayed).

After the Touchpad sees the "read to 0x15" it will send its data.
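The 7-bit setting matters because the first byte of every I2C transfer carries the 7-bit client address in its upper seven bits and the read/write flag in bit 0, so an analyser in 8-bit mode shows 0x2A/0x2B instead of 0x15. A minimal C sketch of that conversion, assuming the Touchpad's 7-bit address 0x15 from the trace (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Address byte as it appears on the wire: 7-bit address shifted left,
 * R/W flag in bit 0 (1 = read from the client, 0 = write to it). */
static uint8_t i2c_address_byte(uint8_t addr7, bool read)
{
    assert(addr7 <= 0x7F); /* only 7-bit addresses are handled here */
    return (uint8_t)((addr7 << 1) | (read ? 1u : 0u));
}

/* Convert a wire byte (what an analyser in 8-bit mode shows) back to 7 bits. */
static uint8_t i2c_address_7bit(uint8_t wire_byte)
{
    return (uint8_t)(wire_byte >> 1);
}
```

So "read to 0x15" corresponds to the wire byte 0x2B, and a write to the Touchpad (such as a command) starts with 0x2A.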

To acknowledge a reset instruction (which is 0x05 0x00 0x00 0x01 in hex) the Touchpad should pull the interrupt and send two bytes of 0x00 0x00 (actually I can see in the working traces that it sends a lot more than two bytes of 0x00 - but more than two doesn't hurt).
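The reset bytes follow the same layout as the SET_POWER commands that come up in this thread (ON is 0x05 0x00 0x00 0x08, SLEEP is 0x05 0x00 0x01 0x08): the two-byte command register address in little-endian order, then the low byte of the command word (report ID, or the power state for SET_POWER) and its high byte (the opcode). The C sketch below encodes that layout and the "two bytes of 0x00" acknowledgement check. It assumes the command register is 0x0005, as the traced bytes suggest (other devices advertise their own in the HID descriptor), and the function names are made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Subset of I2C HID opcodes and SET_POWER power states. */
enum { OP_RESET = 0x01, OP_SET_POWER = 0x08 };
enum { POWER_ON = 0x00, POWER_SLEEP = 0x01 };

/* Assumption: command register 0x0005, inferred from the traced bytes. */
#define CMD_REGISTER 0x0005u

/* out[0..1]: command register, little-endian.
 * out[2]:    low byte of the command word (report ID / power state).
 * out[3]:    high byte of the command word (opcode). */
static void encode_command(uint8_t low, uint8_t opcode, uint8_t out[4])
{
    assert(opcode <= 0x0F); /* opcodes occupy 4 bits */
    out[0] = (uint8_t)(CMD_REGISTER & 0xFF);
    out[1] = (uint8_t)(CMD_REGISTER >> 8);
    out[2] = low;
    out[3] = opcode;
}

/* After a reset the device asserts INT and returns a zero-length report:
 * at least two bytes of 0x00 (extra zero bytes are harmless). */
static bool is_reset_ack(const uint8_t *buf, size_t len)
{
    return len >= 2 && buf[0] == 0x00 && buf[1] == 0x00;
}
```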

The following screenshot compares a working reset with a failed one: [screenshot: working vs not working]

So you can see in the failed reset the interrupt is never pulled. It just stays high the whole time.

In my fork - I changed the i2c-hid module to send a request for the "report descriptor" (0x02 0x00 in hex) before even looking for the reset response - but I don't see this in your trace. It could be that the DKMS module wasn't in the initrd, so the old module was still being used.

But I just made some changes to the i2c-hid module in my fork. In addition to fixing a couple of issues with it - I changed the reset error messages. This will help us be sure that it is the DKMS module being used.

So if you want to try it - do a git pull from my fork and reinstall with ./dkms-add.sh. I haven't had a chance to test it yet though - so if you try it and it causes any issues - you might need to go back one commit (with git reset --hard HEAD^).

Edit: I've tested now and it seems to be OK

redmcg commented 7 years ago

After the driver has given up with resetting no more data is sent for some reason?

This is because the driver instructs the Touchpad to go to sleep (0x05 0x00 0x01 0x08 in hex).

Maybe the long wires I attached for sniffing are messing with the pad. I will look at it with a scope.

Could be - but both Victor and I have seen the "failed to reset device." error message. I haven't had it completely fail yet (which happens after 3 retries). But I haven't seen that error since using my version of the i2c-hid module. My guess is that the request for the "report descriptor" prompts the Touchpad to pull its interrupt. It'll be interesting to see a trace as that will confirm.
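To make the retry behaviour concrete, here is a small C simulation of the host-side reset loop: send the reset, wait for the interrupt, consume the 0x00 0x00 acknowledgement, and give up after the initial attempt plus 3 retries. The mock touchpad that silently drops its first few resets is hypothetical, a stand-in for the suspected firmware behaviour, and the real i2c-hid driver waits on the interrupt with a timeout rather than checking a flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* "Completely fails after 3 retries": 1 initial attempt + 3 retries. */
#define RESET_RETRIES 3

/* Hypothetical mock of a touchpad whose firmware ignores early resets. */
struct mock_touchpad {
    int resets_seen;      /* reset requests received so far */
    int resets_to_ignore; /* leading resets the firmware fails to act on */
    bool int_asserted;    /* interrupt line state (true = pulled low) */
};

/* Device side (section 7.2.1.2): reset, queue 0x00 0x00, assert INT.
 * The mock skips all of that for its first resets_to_ignore requests. */
static void touchpad_receive_reset(struct mock_touchpad *tp)
{
    tp->resets_seen++;
    if (tp->resets_seen > tp->resets_to_ignore)
        tp->int_asserted = true;
}

/* Host side: returns true once a reset is acknowledged;
 * *attempts reports how many reset commands were sent. */
static bool host_reset(struct mock_touchpad *tp, int *attempts)
{
    assert(tp && attempts);
    for (int n = 1; n <= 1 + RESET_RETRIES; n++) {
        *attempts = n;
        touchpad_receive_reset(tp);   /* send 0x05 0x00 0x00 0x01 */
        if (tp->int_asserted) {       /* real driver: wait with a timeout */
            tp->int_asserted = false; /* reading 0x00 0x00 releases INT */
            return true;
        }
        fprintf(stderr, "failed to reset device (attempt %d)\n", n);
    }
    return false; /* driver then sends SET_POWER(SLEEP) and the probe fails */
}
```

A mock that ignores two resets succeeds on the third attempt; one that never responds exhausts all four attempts, which is the path that ends in the probe error above.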

larixer commented 7 years ago

@redmcg I'd like to throw out an idea, though it may be completely wrong or misleading. We didn't see the Windows driver doing anything similar, i.e. trying to force the interrupt on reset somehow. Since the TouchPad sits behind the i2c_designware bus, maybe there is a problem in the Linux DesignWare driver?

redmcg commented 7 years ago

@vlasenko It's possible. The trace that @ain101 is taking is between the i2c controller and the TouchPad - so it's intra-board (i.e. whilst waiting for the interrupt there's no interaction with the CPU/drivers). But it could be that the i2c_designware driver is setting up the bus clock or voltages or is responsible for some other electronic interaction (like maybe it's supposed to pull another wire to wake the TouchPad up). But I don't see anything different happening in the Logicdata between a successful and a failed interaction. The TouchPad is ACKing every byte of the reset request too - so it seems to have received it OK.

According to section 7.2.1.2 of the I2C HID spec, once the TouchPad receives the reset request, its job is to:

  1. reset its config;
  2. send two bytes of 0x00; and
  3. assert the interrupt

But we don't see this happening, so it seems the problem is with the TouchPad. I'm thinking it could be a firmware issue and that maybe sending the request for the report descriptor in parallel (like the Windows spec says it does) prompts the firmware to complete the reset activity.

We'll know better if @ain101 can reproduce the issue with my version of the i2c-hid driver (showing the report descriptor request [0x02 0x00] with still no interrupt).

ain101 commented 7 years ago

I cannot reproduce the issue with the changed i2c-hid driver. I have come up with 4 solutions now:

  1. don't use the kernel param idle=nomwait;
  2. use the new i2c-hid;
  3. don't reboot; or
  4. do something unknown (I am currently not able to reproduce this bug on the Ubuntu 16.10 install which I mainly use on this laptop).

The new driver reads the report descriptor without waiting for the interrupt line. 56 ms later the interrupt line is pulled and zeros are sent. https://github.com/ain101/drivers-input-touchscreen-FTS_driver/blob/master/doc/sniff/logic%20analyzer/nomwait%20problem/linux%20reboot%20redmcg.logicdata

But it could be that the i2c_designware driver is setting up the bus clock or voltages or is responsible for some other electronic interaction (like maybe it's suppose to pull another wire to wake the TouchPad up).

I think this is possible: see page 11 (INT/E8/15) of https://github.com/ain101/drivers-input-touchscreen-FTS_driver/blob/master/doc/FT5x46.pdf I am sniffing every connection between the touchpad and the motherboard except power. In normal operation this line is clearly used to signal that new data is available. First trace: https://github.com/ain101/drivers-input-touchscreen-FTS_driver/blob/master/doc/sniff/logic%20analyzer/nomwait%20problem/3rd%20line%20problem.png logicdata: https://github.com/ain101/drivers-input-touchscreen-FTS_driver/blob/master/doc/sniff/logic%20analyzer/linux%20multitouch%20swipe.logicdata

I should have looked at this PDF sooner. There is also a UART connected to a test point; maybe Asus left some debugging strings. Neither Linux nor Windows seems to pull the RSTN line. It is always high. When I pull RSTN, the touchpad switches back to mouse mode.

redmcg commented 7 years ago

OK - I think a firmware problem is the best explanation for why the new i2c-hid fixes the issue - but Windows doesn't appear to do this (and that was the reason I made my change in the first place - because I thought it did!).

The screenshot below shows Windows working on top and Linux working (without the new i2c-hid) on the bottom.


The biggest difference that I can see is that Windows provides a 1ms pause between the wake-up and reset commands. Linux provides 35 µs. You can also see on Linux that the interrupt is pulled before the reset command is finished being sent - which shouldn't happen. It could just be a lucky coincidence that it worked on Linux under this circumstance.

So another possible issue is timing. But I can't see any timing difference in the logic traces between using mwait and nomwait. And I also can't explain how the same kernel version on Ubuntu vs. Kubuntu could make a difference. So there's still a bit of mystery there.

I'll work on another i2c-hid version which adds a 1ms pause before sending the reset and see if that fixes the issue on Linux as well.

redmcg commented 7 years ago

I've made an update to my fork. I reverted my previous changes to i2c-hid and added just a single line: usleep_range(750, 5000);

This means the i2c-hid driver will now sleep for between 750 and 5000 microseconds (0.75 to 5 ms) in between sending the power on [0x05 0x00 0x00 0x08] and reset [0x05 0x00 0x00 0x01] commands.
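Since usleep_range() takes microseconds, that delay is 0.75 to 5 ms, in line with the roughly 1 ms gap seen on Windows. The kernel call can't run in userspace, so the following C sketch only mirrors it with POSIX nanosleep() and measures the gap with clock_gettime(). sleep_range_us(), reset_gap_ns() and the constant names are hypothetical, and unlike the kernel helper this version treats the maximum as advisory:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <time.h>

/* Same bounds as usleep_range(750, 5000); the unit is microseconds. */
#define RESET_DELAY_MIN_US 750L
#define RESET_DELAY_MAX_US 5000L

static int64_t mono_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

/* Hypothetical userspace stand-in for usleep_range(): sleeps at least
 * min_us microseconds. The scheduler picks the actual wake-up, so
 * max_us is only sanity-checked, not enforced. */
static void sleep_range_us(long min_us, long max_us)
{
    assert(min_us >= 0 && max_us >= min_us);
    (void)max_us;
    struct timespec req = {
        .tv_sec = min_us / 1000000L,
        .tv_nsec = (min_us % 1000000L) * 1000L, /* 1 us = 1000 ns */
    };
    struct timespec rem;
    while (nanosleep(&req, &rem) == -1 && errno == EINTR)
        req = rem; /* a signal cut the sleep short: sleep the remainder */
}

/* Times the gap that would separate SET_POWER(ON) from RESET. */
static int64_t reset_gap_ns(void)
{
    const int64_t start = mono_ns();
    /* ...send SET_POWER(ON): 0x05 0x00 0x00 0x08... */
    sleep_range_us(RESET_DELAY_MIN_US, RESET_DELAY_MAX_US);
    /* ...send RESET: 0x05 0x00 0x00 0x01... */
    return mono_ns() - start;
}
```

The minimum of 750 µs is 750,000 ns, so a gap read with reset_gap_ns() should never come out below that; by contrast, the 35 µs gap measured on stock Linux is about 1/28 of the Windows pause.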

I rebooted my system a handful of times to test it and it seems to be working.

I'd be curious to see if it fixes the issues with nomwait and Kubuntu too.

ain101 commented 7 years ago

I think more time between power-on and reset is the solution for this issue. When I remove nomwait, the delay between these packets also gets a little bigger. In the past I sniffed data where it took up to 3 reset tries, and the last one had the biggest delay.

I think the device needs some time after power-on, otherwise it misses our reset packet (100 µs minimum). The i2c_hid spec even speaks of 1 second!

The DEVICE must ensure that it transitions to the HOST specified Power State in under 1 second. All HID DEVICES must support this command...

There should be some kind of response, but the spec says:

The DEVICE shall not respond back after receiving the command. The DEVICE is mandated to enter that power state imminently.

SET_POWER(ON) seems to be special: the 1-second allowance doesn't apply to it. Our Asus touchpad violates this:

Once the host issues a SET_POWER(ON) command, the DEVICE must transition to the ON state immediately...

One last mystery is that the interrupt is pulled before the reset command is finished. Something different must have triggered it.

redmcg commented 7 years ago

i2c_hid spec speaks of 1 sek!

Geez! I hadn't noticed that part before.

One last mystery is that the interrupt is pulled before the reset command is finished. Something different must have triggered it.

Yeah could be. Or another firmware issue.

When I remove nomwait the delay between this packets also gets bigger a little bit. In the past I sniffed data where it took up to 3 reset tries. The last one had the biggest delay.

It does sound more and more like a timing issue. Let me know how the new i2c-hid driver goes. If it fixes the problems with nomwait and Kubuntu, I might raise it on the mailing list. We might need to add a new quirk to i2c-hid.

redmcg commented 7 years ago

The DEVICE must ensure that it transitions to the HOST specified Power State in under 1 second. All HID DEVICES must support this command...

There should be some kind of response

Agreed. Looks like the i2c hid protocol was designed by Microsoft though... :unamused:

redmcg commented 7 years ago

@ain101 Have you had any issues since I added the short sleep prior to the reset command? If it's working well I might close this issue and make another kernel submission.

ain101 commented 7 years ago

@redmcg The short sleep works for me. I expected it to work for everyone, but issue #33 seems to prove me wrong. As this issue covers my specific problem with kernel params, I think it can be closed.

redmcg commented 7 years ago

Just for posterity - I thought I would note that I did find a way for the i2c DEVICE to tell the HOST it is ready after receiving a POWER ON command. Section 7.2.8.4 of the i2c_hid specification states:

Once the host issues a SET_POWER(ON) command, the DEVICE must transition to the ON state immediately. Clock stretching can be employed (up to the maximum defined limit in the “Sizes and Constants” section) if needed.

So the DEVICE can pull (and hold) the CLOCK signal low until it is ready to receive its next command. The maximum time specified is 10ms. The disadvantage here is that the whole i2c bus is held up during this period (i.e. other devices would not be able to communicate).

But it looks like the Touchpad doesn't implement this and instead relies on the 1ms pause provided by the Windows driver.
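As an illustration of the 10 ms limit, a host-side wait for a stretched clock can be modelled in C as a poll with a deadline. This is only a model: an I2C controller (here the DesignWare one) normally deals with clock stretching in hardware, and the SCL-sampling callback and mock below are hypothetical:

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* The 10 ms maximum clock stretch mentioned above, in microseconds. */
#define MAX_CLOCK_STRETCH_US 10000

/* Hypothetical callback: samples SCL, returns true when it is released (high). */
typedef bool (*scl_reader)(void *ctx);

static int64_t mono_us(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000 + ts.tv_nsec / 1000;
}

/* Poll until the device releases SCL or limit_us elapses.
 * Returns false if the device held the clock low for too long. */
static bool wait_scl_release(scl_reader read_scl, void *ctx, int64_t limit_us)
{
    assert(read_scl);
    const int64_t deadline = mono_us() + limit_us;
    while (!read_scl(ctx)) {
        if (mono_us() >= deadline)
            return false; /* stretched past the limit: treat as a bus timeout */
    }
    return true;
}

/* Hypothetical mock: SCL reads low for the first low_reads samples. */
struct mock_scl { long low_reads; };

static bool mock_read_scl(void *ctx)
{
    struct mock_scl *m = ctx;
    if (m->low_reads > 0) {
        m->low_reads--;
        return false;
    }
    return true;
}
```

The busy loop also shows the drawback noted above: while SCL is held low, nothing else can use the bus.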