linux-surface / surface-aggregator-module

Linux ACPI and Platform Drivers for Surface Devices using the Surface Aggregator Module over Surface Serial Hub (Surface Book 2, Surface Pro 2017, Surface Laptop, and Newer)
GNU General Public License v2.0

Looping "wake irq triggered" on Surface Studio 2 when volume button pressed #32

Open camilio69 opened 4 years ago

camilio69 commented 4 years ago

On the Surface Studio 2 (not officially supported, I know), most things work. However, as soon as I press a volume button, the system becomes unresponsive, with dmesg repeatedly reporting: surface_sam_ssh serial0-0: wake irq triggered

qzed commented 4 years ago

Can you upload an acpidump (run sudo acpidump > acpidump.out)?

camilio69 commented 4 years ago

Here it is. acpidump.txt

qzed commented 4 years ago

Okay, just to be sure: if you unload the SAM modules, the system doesn't become unresponsive? (You can unload all of them via https://github.com/linux-surface/surface-aggregator-module/blob/master/scripts/unload.sh.)

I can't find any obvious connection in the acpidump/DSDT between volume buttons and the SSH wake GPIO pin, so this may take some work to track down.

Also, are those the side buttons for the volume?
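One way to look for such a connection is to list every GPIO descriptor declared in the DSDT and compare pin numbers. The sketch below assumes the table was extracted and decompiled with the ACPICA tools (for example `acpixtract -a acpidump.out`, then `iasl -d dsdt.dat`, giving `dsdt.dsl`). The regex only handles the usual `GpioInt (...) { pins }` layout that iasl emits, with no nested parentheses in the argument list; the script and function names are hypothetical.

```python
import re
import sys

# Matches GpioInt/GpioIo resource descriptors in decompiled ASL, e.g.:
#   GpioInt (Edge, ActiveLow, ExclusiveAndWake, PullUp, 0x0000,
#       "\\_SB.GPO0", 0x00, ResourceConsumer, ,
#       )
#       {   // Pin list
#           0x0034
#       }
# Assumption: the argument list contains no nested parentheses.
GPIO_RE = re.compile(r"Gpio(Int|Io)\s*\(([^()]*)\)\s*\{([^{}]*)\}", re.DOTALL)


def list_gpio_pins(dsl_text):
    """Return (kind, controller, pins) tuples for each GPIO descriptor."""
    results = []
    for kind, args, body in GPIO_RE.findall(dsl_text):
        # The GPIO controller is the only quoted string in the argument list.
        controller = re.search(r'"([^"]*)"', args)
        # Drop '// Pin list' style comments, then collect the hex pin numbers.
        body = re.sub(r"//[^\n]*", "", body)
        pins = [int(p, 16) for p in re.findall(r"0x[0-9A-Fa-f]+", body)]
        results.append((kind, controller.group(1) if controller else None, pins))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 gpio_pins.py dsdt.dsl
    with open(sys.argv[1]) as f:
        for kind, ctrl, pins in list_gpio_pins(f.read()):
            print(f"Gpio{kind:3} {ctrl} pins={[hex(p) for p in pins]}")
```

This only lists declarations; it cannot tell which driver ends up owning a pin, so a shared pin number on the same controller is a lead, not proof.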

camilio69 commented 4 years ago

OK, so when removing the modules as specified, the IRQ no longer goes crazy when I press a button, and the system stays responsive.

To clarify, when I said the system went unresponsive, that was an exaggeration: some CPUs were stuck in IRQ processing. After a restart, with all default modules loaded, the entries in /proc/interrupts that loop (10k/s) are:

     14:  0  0  192731  0  0  0  0  0  IR-IO-APIC  14-fasteoi  INT345D:00
    135:  0  0  620288  0  0  0  0  0  INT345D:00  99  surface_sam_wakeup

Other interrupts look OK. If I stop the service and remove the module, only IRQ 14 keeps triggering.

Yes, I mean the hardware volume buttons on the side of the screen.
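To put numbers on a rate like "10k/s", one can diff two snapshots of /proc/interrupts taken a known interval apart. A minimal sketch, assuming the standard layout (a header row of CPU names, then one row per IRQ with one count per CPU followed by the chip and handler columns); the script and function names are hypothetical.

```python
import sys
import time


def parse_interrupts(text):
    """Parse /proc/interrupts text into {irq: (total_count, description)}."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        fields = line.split()
        if not fields or not fields[0].endswith(":"):
            continue
        irq = fields[0][:-1]
        counts = []
        for tok in fields[1:1 + ncpu]:
            if not tok.isdigit():
                break  # some rows (e.g. ERR, MIS) have fewer count columns
            counts.append(int(tok))
        desc = " ".join(fields[1 + len(counts):])
        table[irq] = (sum(counts), desc)
    return table


def interrupt_rates(before, after, interval):
    """Per-IRQ rate (interrupts per second) between two parsed snapshots."""
    return {
        irq: ((after[irq][0] - before[irq][0]) / interval, after[irq][1])
        for irq in after
        if irq in before
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 irq_rate.py 1.0
    interval = float(sys.argv[1])
    with open("/proc/interrupts") as f:
        first = parse_interrupts(f.read())
    time.sleep(interval)
    with open("/proc/interrupts") as f:
        second = parse_interrupts(f.read())
    rates = interrupt_rates(first, second, interval)
    # Print the ten busiest IRQs over the sampling interval.
    for irq, (rate, desc) in sorted(rates.items(), key=lambda kv: -kv[1][0])[:10]:
        print(f"{irq:>6}: {rate:10.0f}/s  {desc}")
```

Summing across CPUs keeps the output short; in the case above the counts all land on a single CPU anyway.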

qzed commented 4 years ago

If you remove both the SAM modules and the soc_button_array, does IRQ 14 still trigger?

camilio69 commented 4 years ago

No, this stops it.

qzed commented 4 years ago

Does removing only soc_button_array stop it too? Also, do the volume switches behave normally, i.e. no constant/automatic increase or decrease after pressing?

camilio69 commented 4 years ago

Yes, removing only soc_button_array stops both IRQs. The volume switch has never worked with this kernel and module (it does not change the volume). The hardware itself is fine: it works as expected when booting Windows.

I realize I did not include all the information here: I installed Ubuntu 19.10 with the linux-surface/linux-surface kernel from the package repositories (version 5.4.6-surface-1). I had to add nomodeset to be able to boot; otherwise I got a blank screen that seemed frozen (a divide error in the nouveau driver at gf119_disp_super). I used to run Ubuntu 19.04 with jakeday/linux-surface on the same hardware, and the buttons worked there, but the rest worked less well (especially a very long startup with timeouts).

If you need any dmesg output or other information to help further development, I will be glad to help. Thanks to your help, I found that blacklisting soc_button_array makes the unit stable, even though I lose the hardware volume buttons, which is OK for me.
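For reference, a persistent blacklist is usually a line `blacklist soc_button_array` in a file under /etc/modprobe.d/ (the file name is arbitrary); if the module is also packed into the initramfs, that may need regenerating (`sudo update-initramfs -u` on Ubuntu). The sketch below then checks, after a reboot, which of the modules discussed here are actually loaded, by reading /proc/modules. Treating every module with a `surface_sam` prefix as a SAM module is an assumption based on the surface_sam_ssh name seen in dmesg; the script and function names are hypothetical.

```python
import sys


def loaded_modules(proc_modules_text):
    """Return the set of module names listed in /proc/modules text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def relevant_modules(modules, prefixes=("surface_sam", "soc_button_array")):
    """Keep only the modules discussed in this issue, sorted by name.

    Assumption: the SAM modules share the 'surface_sam' name prefix.
    """
    return sorted(m for m in modules if m.startswith(prefixes))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 check_modules.py /proc/modules
    with open(sys.argv[1]) as f:
        found = relevant_modules(loaded_modules(f.read()))
    print("loaded:", ", ".join(found) if found else "(none)")
```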

qzed commented 4 years ago

It's good that you've found a workaround for now. I still have no real clue why this is happening and unfortunately not that much time to look into this at the moment.

A dmesg log would be nice (complete, before and after pressing a volume button), maybe we can spot something there, although I kind of doubt that. Do you still remember which kernel version of jakeday/linux-surface you were running, specifically on which version the volume buttons were working?

camilio69 commented 4 years ago

Unfortunately I do not have the PC anymore for now. Here is the dmesg output with graphics enabled; I had a copy on my computer: dmesg_nouveau.txt. I will try to capture a new one while pressing a button when I get the PC back, but I think it will be hard, as the IRQ triggers so fast that it fills up the ring buffer. As for the jakeday/linux-surface version, I will check; I kept a partition that boots it. It dates from late August/early September 2019.

Thanks,

qzed commented 4 years ago

Since the root problem seems to occur even when the SAM modules are unloaded, you could unload them and then capture the dmesg log; that way, at least that part shouldn't spam the log.

I'll try to have a look at the log later.
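A log that is still flooded can be shortened before attaching it by collapsing consecutive repeated messages. A minimal sketch, assuming the log was saved first (e.g. `sudo dmesg > dmesg.txt`) and that each line starts with the usual `[ seconds.micros]` timestamp, which is stripped before comparing so otherwise identical lines count as repeats. This cannot recover messages the ring buffer has already overwritten; the script and function names are hypothetical.

```python
import re
import sys

# dmesg prefixes each message with "[ seconds.micros] "; ignore it when comparing.
TIMESTAMP_RE = re.compile(r"^\[\s*\d+\.\d+\]\s*")


def collapse_repeats(lines):
    """Return lines with runs of identical messages replaced by a count."""
    runs = []  # each entry: [first_line, message_without_timestamp, count]
    for line in lines:
        line = line.rstrip("\n")
        msg = TIMESTAMP_RE.sub("", line)
        if runs and runs[-1][1] == msg:
            runs[-1][2] += 1
        else:
            runs.append([line, msg, 1])
    out = []
    for first, _msg, count in runs:
        out.append(first)  # keep the first occurrence with its timestamp
        if count > 1:
            out.append(f"    ... repeated {count - 1} more times")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python3 collapse.py dmesg.txt
    with open(sys.argv[1]) as f:
        for out_line in collapse_repeats(f):
            print(out_line)
```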