linux-surface / surface-aggregator-module

Linux ACPI and Platform Drivers for Surface Devices using the Surface Aggregator Module over Surface Serial Hub (Surface Book 2, Surface Pro 2017, Surface Laptop, and Newer)
GNU General Public License v2.0

Battery stats only work intermittently on Surface Pro 2017 #19

Closed hpfr closed 4 years ago

hpfr commented 5 years ago

Hi,

I'm using jakeday's kernel via dmhacker's wrapper for Arch. I only get battery stats on some boots, and I can't seem to find any common traits between boot sequences that work and those that don't. If I reboot enough, I can always get it to work or stop working.

Here's some output from dmesg from a working boot:

[    2.017865] surfacegen5_acpi_san MSHW0091:00: Linked as a consumer to serial0-0
[    2.017870] acpi ACPI000E:00: Linked as a consumer to MSHW0091:00
[    2.017881] ac ACPI0003:00: Linked as a consumer to MSHW0091:00
[    2.017884] acpi PNP0C0A:00: Linked as a consumer to MSHW0091:00
...
[    4.061421] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[    4.162884] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[    4.263668] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
[    4.265382] usb 1-5: new high-speed USB device number 3 using xhci_hcd
[    4.364880] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
...
[   48.337779] surfacegen5_acpi_ssh serial0-0: rqst: communication failed 3 times, giving up
[   48.337791] surfacegen5_acpi_san MSHW0091:00: san_rqst: IO error occured, trying again
[   58.316891] surfacegen5_acpi_ssh serial0-0: rqst: communication failed 3 times, giving up
[   58.316904] surfacegen5_acpi_san MSHW0091:00: san_rqst: IO error occured, trying again
[   63.307488] surfacegen5_acpi_ssh serial0-0: rqst: communication failed 3 times, giving up
[   63.307500] surfacegen5_acpi_san MSHW0091:00: san_rqst: IO error occured, trying again
[   69.321411] surfacegen5_acpi_ssh serial0-0: rqst: communication failed 3 times, giving up
[   69.321423] surfacegen5_acpi_san MSHW0091:00: san_rqst: IO error occured, trying again
[   78.344983] surfacegen5_acpi_ssh serial0-0: rqst: communication failed 3 times, giving up
[   78.344993] surfacegen5_acpi_san MSHW0091:00: san_rqst: IO error occured, trying again
[  107.655866] surfacegen5_acpi_ssh serial0-0: rqst: communication failed 3 times, giving up

Those recv and rqst messages occur on almost every boot, even the ones that work. They print over the login TTY, and according to dmesg the rqst ones keep appearing in the background after I've logged in.

Here's a failed one:

[    5.476235] surfacegen5_acpi_ssh serial0-0: rqst: communication failed 3 times, giving up
[    5.818733] mwifiex_pcie 0000:01:00.0: info: FW download over, size 843828 bytes
[    5.908373] Console: switching to colour frame buffer device 342x114
[    5.937181] i915 0000:00:02.0: fb0: inteldrmfb frame buffer device
[    5.937398] surfacegen5_acpi_ssh: probe of serial0-0 failed with error -5

This failed boot didn't have the recv and rqst messages, but I believe some failed boots have.
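To compare boots more systematically, the driver messages could be tallied per boot. The following is only a minimal sketch: the function name `tally_sam_messages` and the regular expression are made up for illustration and assume the line layout shown in the excerpts above (a bracketed timestamp followed by a `surfacegen5_acpi_ssh` or `surfacegen5_acpi_san` prefix).

```python
import re
from collections import Counter

# Matches lines such as
#   [    4.061421] surfacegen5_acpi_ssh serial0-0: recv: invalid start of message
# and captures the driver name and the message that follows the device name.
LINE_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+"
    r"(?P<driver>surfacegen5_acpi_(?:ssh|san))\b.*?:\s+"
    r"(?P<message>.*)$"
)


def tally_sam_messages(log_text):
    """Count (driver, message) pairs in a chunk of dmesg output.

    Lines from other drivers are ignored.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            counts[(match["driver"], match["message"])] += 1
    return counts
```

Saving the dmesg output of a working and a failing boot to two files and passing each file's text to `tally_sam_messages` would give two counters that can be compared side by side.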

This doesn't seem like enough to determine the problem, but I'm not really sure what else to provide. Feel free to request more info.

Thanks.

Edit: Is your gist relevant to me? Why would the MSHW number be different (84 vs 91)? That's some way to interface with the hardware, right? I'm a little in over my head here, if you can't tell.

Edit 2: This has persisted through several kernel versions (4.18?, 4.19, now 5.0.7) and a full reinstall at some point over the past couple months.

hpfr commented 5 years ago

Apologies about the closing and reopening; my account was flagged by GitHub and I didn't think the actions would go through.

qzed commented 5 years ago

Hi, sorry, but I can't see this issue on GitHub; I only receive email notifications for it. Since you mentioned that GitHub flagged your account, this looks like a shadow-ban. Can you make sure your issue is visible to others (i.e. when you're not logged in) and, if necessary, create a new account?

As to your problem: Have you tried a two-button reset?

hpfr commented 5 years ago

Alright, they appear to have finally fixed it. My account was flagged a second time immediately after the first fix. Sorry about that again.

What do you mean by a two-button reset? Something like REISUB? I did do a clean install at one point.

qzed commented 5 years ago

Glad you could get that resolved.

> What do you mean by two-button reset?

Press and hold the power and volume-up buttons (while powered on) until the device restarts. If you have a charger or dock connected, the LED on it should turn off after a while and then back on (at that point you can release the buttons and start the device normally). This completely resets the EC, so we can make sure it's not some weird quirk.

As for your edits on your first comment:

  1. Several devices are involved. MSHW0084 is the Serial-Hub, i.e. the communication master. MSHW0091 is the ACPI Notify device, a bridge from the Serial-Hub to ACPI. The final error message is reported on MSHW0091 because that is the actual communication endpoint (it communicates with the EC via MSHW0084). The serial0-0 device is the MSHW0084 device, which in this case retries the communication two additional times to handle sporadic I/O failures.

  2. There haven't been many changes to the core since the first version. We had a similar problem in the past, but it should be fixed since we switched to DMA. Sporadic I/O failures are still to be expected (mostly during boot or resume, when the kernel is busy) because the current DMA implementation for serial devices uses only one buffer instead of a rotating set (if I remember correctly; this may have changed). However, there should be at most two failures in succession: one overflow due to a full DMA buffer, plus one potential invalid start due to the emptied buffer arriving mid-transaction.
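The layering described in the two points above can be illustrated with a small Python model. This is a sketch under stated assumptions, not the driver's actual code: `TransferError`, `ssh_request`, `san_request`, and `make_flaky` are hypothetical names, and the number of outer retries in the SAN layer (`outer_tries=2`) is an arbitrary choice, since the thread does not say how often that layer tries again. The only details taken from the thread are the three attempts per request (one try plus two retries) and the I/O error (`-5`, i.e. `EIO`) reported once they are exhausted.

```python
import errno


class TransferError(Exception):
    """Hypothetical error for a single failed transfer (e.g. an invalid start)."""


def ssh_request(transfer, attempts=3):
    """Serial-Hub layer: one initial try plus two retries, then give up."""
    for _ in range(attempts):
        try:
            return transfer()
        except TransferError:
            continue  # sporadic failure, try the transfer again
    raise OSError(errno.EIO, f"communication failed {attempts} times, giving up")


def san_request(transfer, outer_tries=2):
    """ACPI Notify layer: re-issue the whole request if the hub gives up.

    The outer retry count is an assumption made for this sketch.
    """
    last_error = None
    for _ in range(outer_tries):
        try:
            return ssh_request(transfer)
        except OSError as exc:
            last_error = exc  # "IO error occured, trying again"
    raise last_error


def make_flaky(failures):
    """Build a fake transfer that fails `failures` times before succeeding."""
    state = {"calls": 0}

    def transfer():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransferError("invalid start of message")
        return "ok"

    return transfer, state
```

With at most two failures in succession, as expected from the single-buffer DMA behaviour, `ssh_request(make_flaky(2)[0])` still succeeds on the third attempt. Only a longer run of failures exhausts the retries and surfaces as an I/O error in the outer layer, which is the pattern in the logs from the first comment.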

qzed commented 5 years ago

@hpfr Did the reset work?

qzed commented 4 years ago

Going to close this: a) there hasn't been any response for a while, b) I haven't heard of any other issues resembling this lately, and c) there have been quite a lot of changes in the SAM/SSH driver, which should have made communication a bit more stable.

Feel free to comment if this issue still persists.

hpfr commented 3 years ago

Sorry for the radio silence. Like you said, I believe the changes to the driver have eliminated this for me. Thanks!

qzed commented 3 years ago

Thanks for the feedback!