linux-surface / linux-surface

Linux Kernel for Surface Devices
4.71k stars 206 forks source link

SB3 support #201

Closed NickyTope closed 3 years ago

NickyTope commented 4 years ago

Tried installing both Arch and ubuntu in my new SB3 13.5 i7 32gb

In both cases uname reports the surface kernel however both the keyboard and trackpad do not function.

Is there anything I can do to help support this device?

Currently am back on windows but happy to install any distro again to test..

archseer commented 4 years ago

I confirm that the latest sb3-v2 commit works fine for me.

archseer commented 4 years ago

@qzed More data about TC 0x21, (we might want to move this to a new issue in the SAM repo):

It seems to have been previously used on Surface Laptop devices:

https://github.com/linux-surface/surface-aggregator-module/issues/4#issuecomment-473236117

You can also see it here in my logs back when I was figuring out how the touchpad works (grep for 80 21), you can see it being fired off when I re-enabled the touchpad under Device Manager: https://gist.github.com/archseer/786b1b712864e909bb6f6ed567473699#file-enable-touchpad-log-L648

I think we possibly also discussed it months ago but I can't find the relevant logs on Matrix.

NickyTope commented 4 years ago

I got this latest update to load however I can't get any debug output.. ran sudo make dkms-uninstall in the old directory and confirmed no more input or battery status then added +ccflags-y := -DDEBUG after the sources += x lines in the Makefile. I ran sudo ../scripts/ssam-modprobe -r then again with insmod (output) below, this makes the keyboard and battery info work again but no trackpad and no debug output, am I missing something?

ssam-modprobe-insmod-load.txt

qzed commented 4 years ago

You need to add the ccflags-y := -DDEBUG in the Kbuild file, not the Makefile.

qzed commented 4 years ago

@archseer Thanks for testing!

It seems to have been previously used on Surface Laptop devices:

linux-surface/surface-aggregator-module#4 (comment)

This is a different thing. That there is some USB control communication, not via SAM.

You can also see it here in my logs back when I was figuring out how the touchpad works (grep for 80 21), you can see it being fired off when I re-enabled the touchpad under Device Manager: https://gist.github.com/archseer/786b1b712864e909bb6f6ed567473699#file-enable-touchpad-log-L648

Ah yes, that looks exactly like the ones on the SB3. Can't really remember having discussed it, at least not extensively... probably forgot that. I should probably make a list of the features/stuff we should look at and collect the logs for them.

NickyTope commented 4 years ago

You need to add the ccflags-y := -DDEBUG in the Kbuild file, not the Makefile.

Thanks! that did the trick, no trackpad input registering hopefully this output helps..

ssam-modprobe-insmod-load.txt ssam-modprobe-insmod-trackpad-move.txt ssam-modprobe-insmod-kb-click.txt

NickyTope commented 4 years ago

theres is a big pause when running the insmod before it completes.

am running the sb3_test script now also

qzed commented 4 years ago

I forgot to filter the events by IIDs for each device, so each device got all reports... I'm not quite sure if that fixes the trackpad not working though...

If that doesn't work, can you remove every surface_sam_sid_vhf line here except for the one with .id = 3 (trackpad)? That should enable trackpad support only, so we'll see if there are some interferences.

NickyTope commented 4 years ago

yes that did enable the trackpad! do you want any of the dmesg output?

qzed commented 4 years ago

theres is a big pause when running the insmod before it completes.

That could come from (re-)starting services.

qzed commented 4 years ago

yes that did enable the trackpad! do you want any of the dmesg output?

Which one, the commit I pushed or the fallback change?

NickyTope commented 4 years ago

sorry my bad, I missed the commit and went straight to the fallback.

just pulled your change and can confirm both keyboard and trackpad are working!

qzed commented 4 years ago

No worries! Awesome! I guess the trackpad HID device might have gotten a bit confused by the messages that weren't meant for it then. With that, I think we can tick off keyboard/trackpad support.

@archseer can you test the latest commit again to check that the filtering doesn't cause any issues on the SL3? If there aren't any issues on the SL3, I think we can merge that into master and update the patches.

Apart from the touchscreen, what are the other things that don't work? You mentioned the DTX driver (also the device it usually probes against is gone from ACPI), anything else?

NickyTope commented 4 years ago

Just tested the touchscreen now and it is no longer way off, in fact it works quite well..

Looking at the wiki page that leaves:

Pen (connected on bt, no input) SOix (how do I test this?) Sensors - not sure what this refers to? Performance modes

NickyTope commented 4 years ago

getting this error (maybe im jumping the gun trying to install these?)

➜  module git:(feature/sb3-v2) sudo make dkms-install
mkdir -p /usr/src/"surface_sam"-"0.1"/
cp -t /usr/src/"surface_sam"-"0.1"/ Makefile Kbuild dkms.conf surface_sam_ssh.h surface_sam_ssh.c surface_sam_san.c surface_sam_san.h surface_sam_vhf.c surface_sam_dtx.c surface_sam_hps.c surface_sam_sid.c surface_sam_sid_gpelid.c surface_sam_sid_perfmode.c surface_sam_sid_vhf.c surface_sam_sid_power.c
dkms add "surface_sam"/"0.1"
Error! DKMS tree already contains: surface_sam-0.1
You cannot add the same module/version combo more than once.
make: *** [Makefile:33: dkms-install] Error 3
StollD commented 4 years ago

SOix (how do I test this?)

You need to fetch a few values from sysfs before and after suspending the device. This script can help with that: https://gist.github.com/StollD/139779ed6aa441f661f641d83a108632

Run the script, then suspend (10 seconds should be enough). After resuming, run the script again. Compared to the first run, the values of low_power_idle_cpu_residency_us, low_power_idle_system_residency_us and slp_s0_residency_usec should have increased. If thats the case, the device went into S0ix.

Pen (connected on bt, no input)

Unfortunately, getting the Pen to work on gen7 devices is not possible at the moment, because the IPTS interface changed and we don't know how to switch the device from singletouch (where the stylus is disabled) into multitouch mode. Because IPTS has almost no proper documentation, this will need a fair bit of reverse engineering from someone who owns a gen7 device.

qzed commented 4 years ago

For the rest:

getting this error (maybe im jumping the gun trying to install these?)

This looks like the previous dkms install didn't fully uninstall everything (thus it thinks that the module is still there and dkms doesn't want to overwrite it). You can run dkms status to check what's installed and what sources are present. I think with dkms uninstall (and some parameters) you can uninstall the package.

Unfortunately, getting the Pen to work on gen7 devices is not possible at the moment, because the IPTS interface changed and we don't know how to switch the device from singletouch (where the stylus is disabled) into multitouch mode.

@StollD do we have an open issue for that? If not we should probably open one over at https://github.com/linux-surface/intel-precise-touch.

NickyTope commented 4 years ago

I'm falling asleep at my desk now, will try the above tomorrow and report back!

You guys are awesome!!!

qzed commented 4 years ago

I've opened a new issue for the DTX driver: https://github.com/linux-surface/surface-aggregator-module/issues/39.

StollD commented 4 years ago

@StollD do we have an open issue for that? If not we should probably open one over at https://github.com/linux-surface/intel-precise-touch.

Good idea. Opened one with all the stuff that we figured out (and that I could remember): https://github.com/linux-surface/intel-precise-touch/issues/4

qzed commented 4 years ago

@NickyTope In addition to https://github.com/linux-surface/linux-surface/issues/201#issuecomment-636870913, can you also test (separately to that, with the latest driver loaded)

also

@archseer Would be awesome if you could test all that too.

I want to figure out right now how the event system behaves and an API around it should be implemented. On my SB2, for example, it doesn't matter if I specify priority/channel 0x01 or 0x02 (or any other value for that matter). I.e. if I enable an event with 0x02 and disable it with 0x02, it still gets disabled even though the parameters are different. On the SL3, it looks like that might be similar (as enabling the keyboard events also worked with 0x01).

NickyTope commented 4 years ago

will make some time to get all this done today. have moved from xubuntu now and am on manjaro with my bspwm config, very happy with the setup so far, all the above worked the same with the arch kernel and aur packages.

NickyTope commented 4 years ago

my apologies for this taking some time to get to...

here are my findings with the sb3 test script.

setup / teardown don't seem to have any effect.

enable / disable work (they enable and disable the keyboard events)

kbd-disable is ignored with both 0x01 and 0x02 (I had to remove the len(sys.argv) check to get this to run)

ssam-modprobe-r.txt ssam-modprobe-insmod.txt sb3-test-setup.txt sb3-test-teardown.txt sb3-test-enable.txt sb3-test-disable.txt sb3-test-enable-after-disable.txt sb3-test-enable-keypress.txt sb3-test-kbd-disable-0x01.txt sb3-test-kbd-disable-0x02.txt

NickyTope commented 4 years ago

here are the names of devices in /sys/bus/iio/devices:

 /sys/bus/iio/devices  cat iio:device0/name
accel_3d
 /sys/bus/iio/devices  cat iio:device1/name
gyro_3d
 /sys/bus/iio/devices  cat iio:device2/name
gravity
 /sys/bus/iio/devices  cat iio:device3/name
dev_rotation
 /sys/bus/iio/devices  cat trigger0/name 
accel_3d-dev0
 /sys/bus/iio/devices  cat trigger1/name
gyro_3d-dev1
 /sys/bus/iio/devices  cat trigger2/name
gravity-dev2
 /sys/bus/iio/devices  cat trigger3/name
dev_rotation-dev3
NickyTope commented 4 years ago

setting performance mode does indeed work, here is a debug log setting it to 4:

surface-performance-set-4.txt

NickyTope commented 4 years ago

the values in the S0ix script did not update, so I guess that is unsupported at the moment also, it does suspend and hibernate perfectly well.

 ~/sfc-debug  sudo ./s0ixdebug.sh   
Intel(R) Core(TM) i7-1065G7 CPU
5.6.15-arch1-1-surface

NAME="Manjaro Linux"
PRETTY_NAME="Manjaro Linux"

/sys/devices/virtual/dmi/id/product_name:Surface Book 3
/sys/devices/virtual/dmi/id/product_sku:Surface_Book_3_1900
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us:0
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us:0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C2 : 1627362071
/sys/kernel/debug/pmc_core/package_cstate_show:Package C3 : 562913523
/sys/kernel/debug/pmc_core/package_cstate_show:Package C6 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C7 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C8 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C9 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C10 : 0
/sys/kernel/debug/pmc_core/slp_s0_residency_usec:0
 ~/sfc-debug  sudo systemctl suspend
 ~/sfc-debug  sudo ./s0ixdebug.sh   
Intel(R) Core(TM) i7-1065G7 CPU
5.6.15-arch1-1-surface

NAME="Manjaro Linux"
PRETTY_NAME="Manjaro Linux"

/sys/devices/virtual/dmi/id/product_name:Surface Book 3
/sys/devices/virtual/dmi/id/product_sku:Surface_Book_3_1900
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us:0
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us:0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C2 : 1674555893
/sys/kernel/debug/pmc_core/package_cstate_show:Package C3 : 624935775
/sys/kernel/debug/pmc_core/package_cstate_show:Package C6 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C7 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C8 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C9 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C10 : 0
/sys/kernel/debug/pmc_core/slp_s0_residency_usec:0
 ~/sfc-debug  sudo systemctl hibernate
 ~/sfc-debug  sudo ./s0ixdebug.sh     
Intel(R) Core(TM) i7-1065G7 CPU
5.6.15-arch1-1-surface

NAME="Manjaro Linux"
PRETTY_NAME="Manjaro Linux"

/sys/devices/virtual/dmi/id/product_name:Surface Book 3
/sys/devices/virtual/dmi/id/product_sku:Surface_Book_3_1900
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us:0
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us:0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C2 : 4400582
/sys/kernel/debug/pmc_core/package_cstate_show:Package C3 : 1248120
/sys/kernel/debug/pmc_core/package_cstate_show:Package C6 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C7 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C8 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C9 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C10 : 0
/sys/kernel/debug/pmc_core/slp_s0_residency_usec:0
NickyTope commented 4 years ago

I need to also add that I cannot reboot, I did notice this in xubuntu but thought it was an unrelated crash.

reboot ing from arch will shutdown but the system freezes at the windows logo on startup and I have to hold the power button to reset it.

shutdown -h now works fine and the system comes back as normal.

I have refind installed now and can confirm that the freeze I saw previously in grub is not present in refind

NickyTope commented 4 years ago

oh, just noticed this in dmesg after resume from hibernation:

[ 2909.076559] PM: hibernation: hibernation exit
[ 2909.080710] audit: type=1130 audit(1591522597.507:469): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 2909.224495] audit: type=1130 audit(1591522597.651:470): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-hibernate comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 2909.224498] audit: type=1131 audit(1591522597.651:471): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-hibernate comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 2909.276248] audit: type=1130 audit(1591522597.704:472): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 2909.277040] iwlwifi 0000:00:14.3: Applying debug destination EXTERNAL_DRAM
[ 2909.423207] iwlwifi 0000:00:14.3: FW already configured (0) - re-configuring
[ 2909.449992] iwlwifi 0000:00:14.3: Applying debug destination EXTERNAL_DRAM
[ 2909.597587] iwlwifi 0000:00:14.3: FW already configured (0) - re-configuring
[ 2910.582440] Bluetooth: hci0: Waiting for firmware download to complete
[ 2910.582538] Bluetooth: hci0: Firmware loaded in 1499979 usecs
[ 2910.582546] Bluetooth: hci0: Waiting for device to boot
[ 2910.596850] Bluetooth: hci0: Device booted in 13976 usecs
[ 2910.596856] Bluetooth: hci0: Found Intel DDC parameters: intel/ibt-19-32-4.ddc
[ 2910.598781] Bluetooth: hci0: Applying Intel DDC parameters completed
[ 2910.600637] Bluetooth: hci0: Firmware revision 0.0 build 212 week 13 2020
[ 2914.085407] audit: type=1131 audit(1591522602.516:473): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 2914.583397] audit: type=1101 audit(1591522603.016:474): pid=11940 uid=1000 auid=1000 ses=2 subj==unconfined msg='op=PAM:accounting grantors=pam_unix,pam_permit,pam_time acct="nt" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[ 2914.583527] audit: type=1110 audit(1591522603.016:475): pid=11940 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:setcred grantors=pam_unix,pam_permit,pam_env acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[ 2914.584892] audit: type=1105 audit(1591522603.016:476): pid=11940 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:session_open grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[ 2914.623491] audit: type=1101 audit(1591522603.056:477): pid=11946 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:accounting grantors=pam_unix,pam_permit,pam_time acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[ 2914.623727] audit: type=1110 audit(1591522603.056:478): pid=11946 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:setcred grantors=pam_unix,pam_permit,pam_env acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[ 2914.627179] audit: type=1105 audit(1591522603.060:479): pid=11946 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:session_open grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[ 2914.631601] audit: type=1106 audit(1591522603.063:480): pid=11946 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:session_close grantors=pam_limits,pam_unix,pam_permit acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[ 2914.631708] audit: type=1104 audit(1591522603.063:481): pid=11946 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:setcred grantors=pam_unix,pam_permit,pam_env acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[ 2914.639273] audit: type=1101 audit(1591522603.070:482): pid=11948 uid=0 auid=1000 ses=2 subj==unconfined msg='op=PAM:accounting grantors=pam_unix,pam_permit,pam_time acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/0 res=success'
[ 2914.824854] iwlwifi 0000:00:14.3: Applying debug destination EXTERNAL_DRAM
[ 2914.971900] iwlwifi 0000:00:14.3: FW already configured (0) - re-configuring
[ 2918.165151] wlp0s20f3: authenticate with 60:38:e0:80:66:2c
[ 2918.168376] wlp0s20f3: send auth to 60:38:e0:80:66:2c (try 1/3)
[ 2918.191975] wlp0s20f3: 60:38:e0:80:66:2c denied authentication (status 1)
[ 2918.379189] wlp0s20f3: authenticate with 60:38:e0:80:66:2b
[ 2918.381865] wlp0s20f3: send auth to 60:38:e0:80:66:2b (try 1/3)
[ 2918.404439] wlp0s20f3: authenticated
[ 2918.407867] wlp0s20f3: associate with 60:38:e0:80:66:2b (try 1/3)
[ 2918.408996] wlp0s20f3: RX AssocResp from 60:38:e0:80:66:2b (capab=0x11 status=0 aid=6)
[ 2918.412326] wlp0s20f3: associated
[ 2918.423569] IPv6: ADDRCONF(NETDEV_CHANGE): wlp0s20f3: link becomes ready
[ 2919.565672] kauditd_printk_skb: 26 callbacks suppressed
[ 2919.565675] audit: type=1131 audit(1591522607.999:509): pid=1 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='unit=NetworkManager-dispatcher comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
[ 3014.892657] Uhhuh. NMI received for unknown reason 3c on CPU 0.
[ 3014.892657] Do you have a strange power saving mode enabled?
[ 3014.892657] Dazed and confused, but trying to continue
steveniemitz commented 4 years ago

:wave: just wanted to jump in here and say thank you for working on this. I just set up everything from the sb3-v2 branch and things are working really well so far! Let me know if there's anything I can do to help out as well. SB3 i7 here (15").

reboot ing from arch will shutdown but the system freezes at the windows logo on startup and I have to hold the power button to reset it.

Adding reboot=pci to my kernel cmdline fixed that.

qzed commented 4 years ago

my apologies for this taking some time to get to...

No worries!

here are my findings with the sb3 test script.

setup / teardown don't seem to have any effect.

I think this might only enable events for the REG subsystem (TC=0x21). There are some events from it directly after issuing the command, which looks the same as on the IRPMon log and as mentioned then could be the events currently enabled. Not sure if there are other events being sent. We could maybe try if we can get it to send us the list on command (maybe CID=3 since that's the CID for the event).

enable / disable work (they enable and disable the keyboard events)

kbd-disable is ignored with both 0x01 and 0x02 (I had to remove the len(sys.argv) check to get this to run)

Interesting. So we'll need to add support for the REG system and enable/disable the events via that. The kbd- commands not doing anything indicate that the old way to enable/disable events isn't being used here.

I've also had a look at the Windows driver, and it seems like the old and new event enable/disable commands have, as far as I can tell (unfortunately I'm not a 100% sure here as the CID/TC values seem to be supplied by the caller and not the registry function), mostly the same code path. So that would indicate that the old and new ways don't play together, but the new is a replacement for the old.

Another thing that I've noticed is the battery. I accidentally added them via IID (instance-ID) 0 and 1, but on the SB2 those were 1 and 2 (this should pretty much have the effect that there are two battery instances created, but they will display the exact same data, i.e. the data of battery 1, which should be the one integrated into the clipboard). I've fixed that now, however, after that I had another look at the battery messages in the logs, and it looks like this is not going to work: It seems like they separated the batteries not by IID but by priority (which I think we should rename at this point, because that use kind of indicates that it's not a priority but maybe rather something like a transfer channel). That is kind of confirmed by one of the REG events: 01 00 02 02 01 (the 02 01 at the end is likely TC=2, i.e. battery/power subsystem and IID=1). So likely, the current branch state will break the battery driver and the old state will show only battery 1, but two times... I'll try to fix that tomorrow.

Just to make sure that they haven't done some controller magic to virtually "merge" the two batteries into one for display purposes: The (extended) battery status on Windows does show two batteries?

So with regards to this, I want to try and find the answers for a couple of questions:

I'll open up a new issue for this, otherwise the individual problems might get a bit mixed up. I'll also try to update the testing script tomorrow so that you can do a bit more testing to help figure this out, if you're up for it.

qzed commented 4 years ago

here are the names of devices in /sys/bus/iio/devices:

 /sys/bus/iio/devices  cat iio:device0/name
accel_3d
 /sys/bus/iio/devices  cat iio:device1/name
gyro_3d
 /sys/bus/iio/devices  cat iio:device2/name
gravity
 /sys/bus/iio/devices  cat iio:device3/name
dev_rotation
 /sys/bus/iio/devices  cat trigger0/name 
accel_3d-dev0
 /sys/bus/iio/devices  cat trigger1/name
gyro_3d-dev1
 /sys/bus/iio/devices  cat trigger2/name
gravity-dev2
 /sys/bus/iio/devices  cat trigger3/name
dev_rotation-dev3

Looks like most of them are there. I believe the ambient light sensor is missing. That might be a custom one, probably the same or a similar one as on the SL3 (IIRC @archseer tried to figure that out on the SL3).

Edit: Opened an issue of the ALS: https://github.com/linux-surface/kernel/issues/53.

qzed commented 4 years ago

No clue about the S0ix, I honestly don't know that much about that topic. @kitakar5525 has a bit more insight into that, I believe.

The reboot thing is also an issue on the SL3 and SP7: https://github.com/linux-surface/linux-surface/issues/113. I've added a wiki entry for this: https://github.com/linux-surface/linux-surface/wiki/Surface-Book-3. No idea what it would take/how to fix that in-kernel.

NickyTope commented 4 years ago

Adding reboot=pci to my kernel cmdline fixed that.

welcome, and thanks this encouraged me to clean up my refind conf and get intel-ucode working! can confirm this also works for me.

kitakar5525 commented 4 years ago

the values in the S0ix script did not update, so I guess that is unsupported at the moment also, it does suspend and hibernate perfectly well.

 [...]
 ~/sfc-debug  sudo systemctl suspend
 ~/sfc-debug  sudo ./s0ixdebug.sh   
Intel(R) Core(TM) i7-1065G7 CPU
5.6.15-arch1-1-surface

NAME="Manjaro Linux"
PRETTY_NAME="Manjaro Linux"

/sys/devices/virtual/dmi/id/product_name:Surface Book 3
/sys/devices/virtual/dmi/id/product_sku:Surface_Book_3_1900
/sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us:0
/sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us:0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C2 : 1674555893
/sys/kernel/debug/pmc_core/package_cstate_show:Package C3 : 624935775
/sys/kernel/debug/pmc_core/package_cstate_show:Package C6 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C7 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C8 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C9 : 0
/sys/kernel/debug/pmc_core/package_cstate_show:Package C10 : 0
/sys/kernel/debug/pmc_core/slp_s0_residency_usec:0
[...]

Regarding S0ix, the SB3 didn't enter deeper state than Package C3 (pc3). Reaching pc10 is a prerequisite for achieving S0ix, but it didn't enter even pc10.

These blog posts from Intel are useful to diagnose S0ix:

Also I'm curious about the dmesg log that contains at least one suspend/resume cycle with additional debug output:

# additional debug output
echo 1 | sudo tee /sys/power/pm_debug_messages
# go into s2idle, 10 sec is usually enough
systemctl suspend

then get the dmesg log. I like logs with log levels, so the output of dmesg -x is helpful.

archseer commented 4 years ago

I also think it's useful to test S0ix after tuning with powertop / TLP. The default kernel configuration is more aimed towards desktops and servers rather than laptops.

steveniemitz commented 4 years ago

one thing I noticed is that my SB3 seems to run a lot hotter in linux than in windows, even at idle and the battery saver power mode (using surface performance set) it stays around ~50* C and is noticeably warm to the touch.

I'll experiment more and see if I can figure anything out there. @NickyTope have you noticed the same?

qzed commented 4 years ago

I've updated the battery driver. Can you guys check if it works correctly? Specifically:

qzed commented 4 years ago

I have also updated the module to use the same detachment system (DTX) driver as on the SB2. Based on the logs above, the interface shouldn't have changed. Can you check if that works in combination with https://github.com/linux-surface/surface-dtx-daemon/? Specifically, detaching should be faster (at least if it's the same on the SB2 where without this it takes a couple of seconds) and notifications should be displayed when you can detach the clipboard and after it has been re-attached.

steveniemitz commented 4 years ago

battery looks good!

~$ cat /sys/class/power_supply/BAT0/energy_full
22150000
~$ cat /sys/class/power_supply/BAT1/energy_full
61340000

Charging status is updated as well when I plug/unplug.

I'll let the battery drain a little and then compare it to what windows reports as well.

Detach is REALLY fast now (I think faster than windows) but when I reattach, the trackpad/keyboard are no longer recognized. ~I didn't see any notifications, let me check that the daemon was running at the time as well.~ (edit: I was missing the user dtx service, enabling/starting that gives me the notification now, but the keyboard and trackpad still don't work).

~EDIT: I think the change broke reboot/shutdown as well. After X exits, I see surface_sam_sid_battery.1: unhandled power event (cid = 0x53) repeated every 10 seconds.~

edit 2: It looks like the shutdown/reboot hang only happens after detaching and reattaching. I just tried again w/o detaching and rebooting worked fine.

qzed commented 4 years ago

Okay, could be that we have to enable the keyboard events again after re-attaching. Can you compile and run the module with debugging enabled (un-comment the ccflags-y + -DDEBUG' line in theKbuild` file) and post a log of you trying to type after re-attachment? That should confirm if the events are missing or if there's something else going wrong.

You can also try to enable the events by running the sb3_test.py script (I've just updated that) with the following parameters:

./scripts/sb3_test.py enable 0x02 0x15 0x00 0x01
./scripts/sb3_test.py enable 0x02 0x15 0x00 0x03

if that doesn't work, try running

./scripts/sb3_test.py enable 0x02 0x21 0x01 0x00

before the above commands. if the events have been disabled, those should enable them again.

Regarding shutdown: Does it shut down properly without detaching? Do you have a dGPU model? If so, and you have the drivers for that loaded, detaching could have caused the driver or something relying on it to crash, preventing a clean shutdown. In that case, can you try if you can shutdown when you have blacklisted every dGPU driver, i.e.

blacklist nouveau
blacklist lbm-nouveau
alias nouveau off
alias nouveau-lbm off

blacklist nvidia
alias nvidia off

It should also be enough to just unload them before detaching, if there aren't any programs blocking that (Gnome has the tendency to do so).

In any case (regardless of you having a dGPU or not), can you check if the dmesg log indicates any driver crashes or oddities?

The unhandled power event (cid = 0x53) should not cause/indicate a shutdown failure. It's simply a message from the EC stating that some battery/power-consumption values have changed (unfortunately we don't know yet what specific change this indicates, thus us not handling it and thus the warning). See https://github.com/linux-surface/linux-surface/issues/136 and https://github.com/linux-surface/surface-aggregator-module/issues/29 for details.

steveniemitz commented 4 years ago

You can also try to enable the events by running the sb3_test.py script (I've just updated that) with the following parameters:

This 1/2 worked. The keyboard came back 100%, and trackpad 50%. I lost multitouch gestures on the trackpad though (two finger right click, two finger scrolling, etc) I'll rebuild the module with debug logging on again as well in a second.

Regarding shutdown: Does it shut down properly without detaching?

Yep

Do you have a dGPU model?

Also yep.

If so, and you have the drivers for that loaded, detaching could have caused the driver or something relying on it to crash, preventing a clean shutdown

I think you're right here. I looked back at the logs from my past boot and saw a bunch of nouveau crashes. I'll clean up the log and attach it here.

Also, thanks again for all the work on this, its amazing so far!

dmesg output attached: dmesg-2.txt

qzed commented 4 years ago

This 1/2 worked. The keyboard came back 100%,

Neat!

and trackpad 50%. I lost multitouch gestures on the trackpad though (two finger right click, two finger scrolling, etc)

That actually makes perfect sense: To enable multitouch on the trackpad, a special feature report has to be sent. This is handled by the HID driver when the device is created/added the first time. Detaching the base will basically disconnect the device without it being removed in the driver, so attaching it won't really add it in the driver, so the HID subsystem won't send the special feature report to enable multitouch.

This means that we'll have to fully remove the input devices on detach and re-add them on attach.

I think you're right here. I looked back at the logs from my past boot and saw a bunch of nouveau crashes. I'll clean up the log and attach it here.

The same issues are also present on the SB2, so was to be expected. The log shows that the crashes occur on disconnect, so that's very likely what's causing them. And most crashes like these block shutdown. This is (next to power-savings) one reason I've disabled the dGPU for the SB2 by default. Normally, you should be able to unload the module before detaching, but Gnome for some reason keeps a finger on the module and doesn't let you, if you've started Gnome/GDM with the dGPU turned on (if you turn it on later, it works). Never found out how to avoid that and why Gnome does that. The SB3 seems a bit different on how dGPU on/off is handled, but I only had a quick glance so far. That's another thing for the future, for now I'd like to focus on the keyboard driver.

steveniemitz commented 4 years ago

That actually makes perfect sense: To enable multitouch on the trackpad, a special feature report has to be sent. This is handled by the HID driver when the device is created/added the first time. Detaching the base will basically disconnect the device without it being removed in the driver, so attaching it won't really add it in the driver, so the HID subsystem won't send the special feature report to enable multitouch.

This means that we'll have to fully remove the input devices on detach and re-add them on attach.

Oh that does make sense!

The SB3 seems a bit different on how dGPU on/off is handled, but I only had a quick glance so far. That's another thing for the future, for now I'd like to focus on the keyboard driver.

I was able to remove the nouveau driver after a couple tries actually, but I also am fine keeping it off for now, I don't have much of a use for the dGPU in linux right now anyways, and like you said, it's good for power savings.

steveniemitz commented 4 years ago

one thing I noticed is that even with drivers disabled, the dGPU is always on, which makes it get pretty hot. I spent some time last night poking around and found a few things. This is mostly just a brain dump:

qzed commented 4 years ago

Only had a quick look at it so far, so take this with a grain of salt.

  • It looks like the dGPU can be turned on/off via ACPI (HGOF, HGON), I see it in the SB2 dumps as well, maybe this can be used instead?

This looks (mostly) exactly like the DSM function part on the SB2. This is the code called by __shps_dgpu_dsm_set_power_unlocked. By itself, this method is enough to cut power to the dGPU, at least on the SB2, so I'd assume it'd do the same on the SB3. You could try comment out stuff in the driver so that it'll only call that function. NB: This is the only function that I initially used, the root-port power setting stuff all came later as an improvement.

The reason I also do some root port power settings are to properly update the PCIe core drivers (and via that userspace). The PCIe settings should actually work (that's not really that Surface specific) as long as the dGPU is the only thing connected to the root port (which it looks like). Only problem is to get the root port or dGPU PCIe addres. On that note

  • The ACPI entry(?) changed from RP5 to RP13. The GUID is the same it seems though.

GUID being the same should mean that the semantics of the DSM haven't changed. So the interface should be the same (apart from bugs...).

  • Now here's the confusing part. Once we get into the ACPI data, we end up here. However, I can't actually find the PCI0.RP13.MRGT method anywhere, which also is confirmed by the fact that it fails from the driver as well. In the SB2 ACPI dump it's defined here.

You could try to make an acpidump of your own and grep for it, in case the one here is somehow incomplete. But this could also be a bug in the DSDT. Maybe MS isn't using that part and hard-code it somewhere. If I have some time I'll try to dig into the Windows driver.

  • The san device is no longer there on the SB3, so its driver never loads, which means the hps driver fails here. I commented this out to just get further, it seems it's controlled via the ssh device now.

Yeah, The SAN device used to send notifications when the dGPU would power up (after resume). This is actually quite useful to synchronize the PCIe core and/or get it to turn off again if it's an unwanted power-up. No clue how that behaves on the SB3.

steveniemitz commented 4 years ago

thanks for the comments! I'll try out your suggestions and report back later today.

By itself, this method is enough to cut power to the dGPU, at least on the SB2, so I'd assume it'd do the same on the SB3.

This seems like a great place to start, then I can try adding back functionality incrementally as I figure out the missing parts.

You could try to make an acpidump of your own and grep for it

Yeah I checked in mine too and its missing. I was also going to check the windows driver and see what it's doing here. Very strange though, your theory of them hard coding it into the driver seems most likely.


Turning off the dGPU worked with the suggestion above. I'm going to try to figure out why the GPIOs aren't working now.

[Wed Jun 10 13:37:48 2020] genirq: Setting trigger mode 3 for irq 165 failed (intel_gpio_irq_type+0x0/0x130 [pinctrl_intel])
[Wed Jun 10 13:37:48 2020] surface_dgpu_hps MSHW0153:00: base irq failed: -1
qzed commented 4 years ago

I've opened some new issues in an attempt to clean up this discussion a bit and make it easier to follow specific parts. Here's a list of all relevant issues:

Would be great if we could move the specific discussions to those.

steveniemitz commented 4 years ago

awesome, that's a great idea, thank you!

qzed commented 4 years ago

I've described some stuff that I'd love for you guys to test over on https://github.com/linux-surface/surface-aggregator-module/issues/40#issuecomment-642976709 (also has a bit of explanation regarding SAM, which might or might not be interesting to you if you're confused about my ramblings here and in those issues).

I also think it'd be useful to identify the specific keyboard/touchpad devices that are used: https://github.com/linux-surface/surface-aggregator-module/issues/42#issuecomment-642880467.

NickyTope commented 4 years ago

Thanks again @qzed I'll make time to get to all of this over the weekend