linux-surface / linux-surface

Linux Kernel for Surface Devices

[SB] SAM Support (dGPU-detection, clipboard-detach handling) #93

Open sadnub opened 4 years ago

sadnub commented 4 years ago

Hello, I'm just curious: is dGPU support on the SB1 on the roadmap?

Excellent work on this!

qzed commented 3 years ago

You need to install the DKMS version of the acpi-call package. Otherwise the module won't get installed for custom kernels (which includes the surface kernel).
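
To see whether the module was actually built for the running kernel, one option is to ask modinfo. This is only a minimal sketch: the helper name module_available is made up for illustration, and acpi_call is the module name used further down in this thread.

```shell
# Hypothetical helper (not part of any package): succeeds if <module> can be
# found for <kernel>. The kernel defaults to the one currently running.
module_available() {
    module=$1
    kernel=${2:-$(uname -r)}
    modinfo -k "$kernel" "$module" >/dev/null 2>&1
}

# Example:
# module_available acpi_call || echo "acpi_call is not built for $(uname -r)"
```

If the check fails on the surface kernel but passes on the stock kernel, the module has most likely only been built for the stock one.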

M4ST3R0FD1S4ST3R commented 3 years ago

I recently downloaded the newest kernel and the acpi-call package and was testing the GPU. I noticed that no programs would use the GPU, even with Performance Mode set in the Nvidia settings; the only application to use it was Kdenlive, which has its own settings for selecting the GPU. Do you have a timeline for when performance modes are to be added?

qzed commented 3 years ago

This is not an issue with performance modes. You'll explicitly have to run the application on the dGPU, e.g. via optirun or primusrun. Here's an explanation of how that works on the SB2; you'll have to replace the surface dgpu commands with the appropriate ACPI calls. Performance modes are only related to power and thermal limits and/or cooling.
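
As a rough sketch of "explicitly running the application on the dGPU": the wrapper below picks whichever Bumblebee launcher is installed. The function name run_on_dgpu is hypothetical, and it assumes Bumblebee (which provides optirun and primusrun) is set up and the dGPU is already powered on.

```shell
# Hypothetical wrapper: run a command through primusrun or optirun,
# whichever is found first in PATH.
run_on_dgpu() {
    for wrapper in primusrun optirun; do
        if command -v "$wrapper" >/dev/null 2>&1; then
            "$wrapper" "$@"
            return
        fi
    done
    echo "no Bumblebee wrapper (primusrun/optirun) found" >&2
    return 127
}

# Example: check which renderer an OpenGL program would see.
# run_on_dgpu glxinfo | grep "OpenGL renderer"
```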

jrevillard commented 3 years ago

I think you're missing at least CONFIG_PCIEPORTBUS and CONFIG_HOTPLUG_PCI_PCIE (both should be set to y). The dGPU (at least on the SB2) is connected via a PCIe root port, which basically functions as some sort of PCIe hot-pluggable slot.

It works @qzed thanks !

[root:/home/jerome] 1 # lspci
...
01:00.0 3D controller: NVIDIA Corporation GM108M [GeForce 940MX] (rev a2)
...

But still, the nvidia module does not want to load (nvidia-modprobe):

kern  :info  : [  330.292801] pcieport 0000:00:1c.0: pciehp: Slot(4): Card present
kern  :info  : [  330.441202] pci 0000:01:00.0: [10de:134b] type 00 class 0x030200
kern  :info  : [  330.441240] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
kern  :info  : [  330.441254] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x0fffffff 64bit pref]
kern  :info  : [  330.441269] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
kern  :info  : [  330.441292] pci 0000:01:00.0: reg 0x24: [io  0x0000-0x007f]
kern  :info  : [  330.441299] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
kern  :info  : [  330.441344] pci 0000:01:00.0: Enabling HDA controller
kern  :info  : [  330.441451] pci 0000:01:00.0: 15.752 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x2 link at 0000:00:1c.0 (capable of 31.504 Gb/s with 8.0 GT/s PCIe x4 link)
kern  :info  : [  330.451228] pci 0000:01:00.0: BAR 1: assigned [mem 0xc0000000-0xcfffffff 64bit pref]
kern  :info  : [  330.451239] pci 0000:01:00.0: BAR 3: assigned [mem 0xa2000000-0xa3ffffff 64bit pref]
kern  :info  : [  330.451248] pci 0000:01:00.0: BAR 0: assigned [mem 0xba000000-0xbaffffff]
kern  :info  : [  330.451252] pci 0000:01:00.0: BAR 6: assigned [mem 0xb9700000-0xb977ffff pref]
kern  :info  : [  330.451254] pci 0000:01:00.0: BAR 5: assigned [io  0x4000-0x407f]
kern  :info  : [  330.451259] pcieport 0000:00:1c.0: PCI bridge to [bus 01]
kern  :info  : [  330.451261] pcieport 0000:00:1c.0:   bridge window [io  0x4000-0x7fff]
kern  :info  : [  330.451265] pcieport 0000:00:1c.0:   bridge window [mem 0xb9700000-0xd16fffff]
kern  :info  : [  330.451268] pcieport 0000:00:1c.0:   bridge window [mem 0xa1400000-0xb93fffff 64bit pref]
kern  :warn  : [  330.490351] nvidia: module license 'NVIDIA' taints kernel.
kern  :warn  : [  330.490353] Disabling lock debugging due to kernel taint
kern  :info  : [  330.514438] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
kern  :info  : [  330.515199] nvidia 0000:01:00.0: enabling device (0000 -> 0003)
kern  :warn  : [  330.515367] NVRM: Can't find an IRQ for your NVIDIA card!
kern  :warn  : [  330.515368] NVRM: Please check your BIOS settings.
kern  :warn  : [  330.515369] NVRM: [Plug & Play OS] should be set to NO
kern  :warn  : [  330.515369] NVRM: [Assign IRQ to VGA] should be set to YES 
kern  :warn  : [  330.515379] nvidia: probe of 0000:01:00.0 failed with error -1
kern  :warn  : [  330.515411] NVRM: The NVIDIA probe routine failed for 1 device(s).
kern  :warn  : [  330.515412] NVRM: None of the NVIDIA devices were initialized.
kern  :info  : [  330.515607] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
kern  :info  : [  330.595527] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
kern  :warn  : [  330.597130] NVRM: Can't find an IRQ for your NVIDIA card!
kern  :warn  : [  330.597131] NVRM: Please check your BIOS settings.
kern  :warn  : [  330.597131] NVRM: [Plug & Play OS] should be set to NO
kern  :warn  : [  330.597132] NVRM: [Assign IRQ to VGA] should be set to YES 
kern  :warn  : [  330.597136] nvidia: probe of 0000:01:00.0 failed with error -1
kern  :warn  : [  330.597153] NVRM: The NVIDIA probe routine failed for 1 device(s).
kern  :warn  : [  330.597153] NVRM: None of the NVIDIA devices were initialized.
kern  :info  : [  330.597356] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
kern  :info  : [  330.731884] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
kern  :warn  : [  330.733189] NVRM: Can't find an IRQ for your NVIDIA card!
kern  :warn  : [  330.733192] NVRM: Please check your BIOS settings.
kern  :warn  : [  330.733194] NVRM: [Plug & Play OS] should be set to NO
kern  :warn  : [  330.733196] NVRM: [Assign IRQ to VGA] should be set to YES 
kern  :warn  : [  330.733207] nvidia: probe of 0000:01:00.0 failed with error -1
kern  :warn  : [  330.733261] NVRM: The NVIDIA probe routine failed for 1 device(s).
kern  :warn  : [  330.733263] NVRM: None of the NVIDIA devices were initialized.
kern  :info  : [  330.733662] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
kern  :info  : [  330.855387] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
kern  :warn  : [  330.858187] NVRM: Can't find an IRQ for your NVIDIA card!
kern  :warn  : [  330.858189] NVRM: Please check your BIOS settings.
kern  :warn  : [  330.858190] NVRM: [Plug & Play OS] should be set to NO
kern  :warn  : [  330.858191] NVRM: [Assign IRQ to VGA] should be set to YES 
kern  :warn  : [  330.858197] nvidia: probe of 0000:01:00.0 failed with error -1
kern  :warn  : [  330.858240] NVRM: The NVIDIA probe routine failed for 1 device(s).
kern  :warn  : [  330.858242] NVRM: None of the NVIDIA devices were initialized.
kern  :info  : [  330.864746] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
kern  :info  : [  330.967884] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
kern  :warn  : [  330.968539] NVRM: Can't find an IRQ for your NVIDIA card!
kern  :warn  : [  330.968540] NVRM: Please check your BIOS settings.
kern  :warn  : [  330.968541] NVRM: [Plug & Play OS] should be set to NO
kern  :warn  : [  330.968542] NVRM: [Assign IRQ to VGA] should be set to YES 
kern  :warn  : [  330.968546] nvidia: probe of 0000:01:00.0 failed with error -1
kern  :warn  : [  330.968574] NVRM: The NVIDIA probe routine failed for 1 device(s).
kern  :warn  : [  330.968575] NVRM: None of the NVIDIA devices were initialized.
kern  :info  : [  330.968730] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
kern  :info  : [  331.065649] nvidia-nvlink: Nvlink Core is being initialized, major device number 235
kern  :warn  : [  331.066285] NVRM: Can't find an IRQ for your NVIDIA card!
kern  :warn  : [  331.066287] NVRM: Please check your BIOS settings.
kern  :warn  : [  331.066288] NVRM: [Plug & Play OS] should be set to NO
kern  :warn  : [  331.066289] NVRM: [Assign IRQ to VGA] should be set to YES 
kern  :warn  : [  331.066295] nvidia: probe of 0000:01:00.0 failed with error -1
kern  :warn  : [  331.066324] NVRM: The NVIDIA probe routine failed for 1 device(s).
kern  :warn  : [  331.066325] NVRM: None of the NVIDIA devices were initialized.
kern  :info  : [  331.066492] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
kern  :info  : [  371.492915] bbswitch: version 0.8
kern  :info  : [  371.492924] bbswitch: Found integrated VGA device 0000:00:02.0: \_SB_.PCI0.GFX0
kern  :info  : [  371.492937] bbswitch: Found discrete VGA device 0000:01:00.0: \_SB_.PCI0.RP05.PXSX
kern  :warn  : [  371.492952] ACPI Warning: \_SB.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20200326/nsarguments-59)
kern  :warn  : [  371.493187] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20200326/nsarguments-59)
kern  :err   : [  371.493236] bbswitch: No suitable _DSM call found.

Best

fematarazzo commented 3 years ago

So I've tried with a fresh install of Fedora and still haven't succeeded.

  1. Installed Fedora
  2. Updated everything
  3. Installed acpi_call using the following repo:
    sudo dnf install https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm
    sudo dnf install https://repo.linrunner.de/fedora/tlp/repos/releases/tlp-release.fc$(rpm -E %fedora).noarch.rpm
    sudo dnf install kernel-surface-devel akmod-acpi_call

    that @StollD pointed out.

  4. Installed DKMS: sudo dnf install dkms
  5. Installed the surface kernel and rebooted
  6. Installed nvidia-modprobe: sudo dnf install nvidia-modprobe
  7. Tried to run echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call and still got the "file/directory not found" message

What am I missing?

jrevillard commented 3 years ago

@toastyfe :

5.1 sudo modprobe acpi_call

qzed commented 3 years ago

@jrevillard So we have some progress at least. Card gets detected properly, but it kind of looks like there are still some things missing:

kern  :warn  : [  330.515367] NVRM: Can't find an IRQ for your NVIDIA card!
kern  :warn  : [  330.515368] NVRM: Please check your BIOS settings.
kern  :warn  : [  330.515369] NVRM: [Plug & Play OS] should be set to NO
kern  :warn  : [  330.515369] NVRM: [Assign IRQ to VGA] should be set to YES 

I've diffed the config you've linked above against the Arch config (lines prefixed with -, shown in red, are not present/removed in your config):

-ACPI_APEI_PCIEAER y
-HOTPLUG_PCI_PCIE y
-PCIEAER y
-PCIEAER_INJECT n
-PCIE_BW n
-PCIE_DPC y
-PCIE_ECRC y
-PCIE_EDR y
-PCIE_PME y
-RAPIDIO_TSI721 m
-USER_NS_UNPRIVILEGED y
 CRC16 m -> y
 CRYPTO_CBC m -> y
 CRYPTO_CRC32C m -> y
 CRYPTO_CTS m -> y
 CRYPTO_ECB m -> y
 CRYPTO_USER_API m -> y
 CRYPTO_USER_API_HASH m -> y
 CRYPTO_XTS m -> y
 DEBUG_INFO_BTF y -> n
 EXT4_FS m -> y
 FS_ENCRYPTION_ALGS m -> y
 FS_MBCACHE m -> y
 GCC_VERSION 100100 -> 90300
 HID m -> y
 HID_GENERIC m -> y
 I2C_NVIDIA_GPU m -> n
 INTEL_ATOMISP2_PM m -> y
 JBD2 m -> y
 KEYBOARD_ATKBD m -> y
 LD_VERSION 234000000 -> 233010000
 PCIEPORTBUS y -> n
 SERIO m -> y
 SERIO_I8042 m -> y
 SERIO_LIBPS2 m -> y
 UHID m -> y
+GENTOO_LINUX y
+GENTOO_LINUX_INIT_SCRIPT y
+GENTOO_LINUX_INIT_SYSTEMD y
+GENTOO_LINUX_PORTAGE y
+GENTOO_LINUX_UDEV y
+SURFACE_SAM m
+SURFACE_SAM_DEBUGFS m
+SURFACE_SAM_DTX m
+SURFACE_SAM_HPS m
+SURFACE_SAM_SAN m
+SURFACE_SAM_SID m
+SURFACE_SAM_SID_GPELID m
+SURFACE_SAM_SID_PERFMODE m
+SURFACE_SAM_SID_POWER m
+SURFACE_SAM_SID_VHF m
+SURFACE_SAM_SSH m
+SURFACE_SAM_SSH_ERROR_INJECTION n
+SURFACE_SAM_VHF m
+TIMER_OF y
+TIMER_PROBE y
+TOUCHSCREEN_IPTS m

No clue what specific option is missing, but you could try enabling

PCIE_DPC y
PCIE_ECRC y
PCIE_EDR y
PCIE_PME y

Those would be my guess. Or maybe it wants PCIe error reporting (PCIEAER/ACPI_APEI_PCIEAER).
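
To check which of these options a given kernel config actually has built in, something like the following could help. It is a sketch only: check_kconfig is a made-up helper name, and the config path in the example is an assumption (some distributions expose /proc/config.gz instead of a file under /boot).

```shell
# Hypothetical helper: report whether each CONFIG_<OPTION> is set to y
# in the given kernel config file. Returns non-zero if any option is not.
# Usage: check_kconfig <config-file> <OPTION>...
check_kconfig() {
    config_file=$1
    shift
    status=0
    for opt in "$@"; do
        if grep -q "^CONFIG_${opt}=y\$" "$config_file"; then
            echo "${opt}: y"
        else
            echo "${opt}: missing or not built in"
            status=1
        fi
    done
    return "$status"
}

# Example (the path is an assumption, adjust for your distribution):
# check_kconfig "/boot/config-$(uname -r)" PCIEPORTBUS HOTPLUG_PCI_PCIE \
#     PCIE_DPC PCIE_ECRC PCIE_EDR PCIE_PME PCIEAER
```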

fematarazzo commented 3 years ago

@jrevillard I get the following message:

[felipe@localhost ~]$ sudo modprobe acpi_call
modprobe: FATAL: Module acpi_call not found in directory /lib/modules/5.7.11-1.surface.fc32.x86_64

Even though the acpi_call is installed: akmod-acpi_call-1.1.2-2.fc32.x86_64 is already installed.

qzed commented 3 years ago

Hmm, I have no clue how the whole akmod stuff works (only ever dealt with dkms), but in general modules have to be built for each kernel individually. So it kind of looks like you have the package installed, but the module hasn't been built for the surface kernel (yet). My guess is that you somehow have to convince the akmod system to build the module for this kernel.

There does seem to be an akmods command that could help you troubleshoot that.

fematarazzo commented 3 years ago

I ran the akmods command and it showed up this message:

[felipe@localhost ~]$ sudo akmods
Checking kmods exist for 5.7.11-1.surface.fc32.x86_64 [ OK ]
Building and installing acpi_call-kmod [FAILED]
Building rpms failed; see /var/cache/akmods/acpi_call/1.1.2-2-for-5.7.11-1.surface.fc32.x86_64.failed.log for details

Hint: Some kmods were ignored or failed to build or install. You can try to rebuild and install them by by calling '/usr/sbin/akmods --force' as root.

It surely tried to properly install the acpi_call-kmod module but failed for some reason due to the custom kernel.

So I rebooted back to the stock kernel and ran the same commands, and it installed acpi_call-kmod. I ran sudo modprobe acpi_call as @StollD pointed out and it returned nothing, so I suppose it worked. Then finally I ran echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call and it worked as well!

Then I rebooted back to the surface kernel and it showed the message above saying that it failed to install acpi_call-kmod.

qzed commented 3 years ago

Hmm, can you post the log file it mentions?

fematarazzo commented 3 years ago

@qzed there you go: https://gist.github.com/toastyfe/0139277fa8e5919102dad8e1087dc777

qzed commented 3 years ago

That looks like the akmods tool, not the logfile (/var/cache/akmods/acpi_call/1.1.2-2-for-5.7.11-1.surface.fc32.x86_64.failed.log)

fematarazzo commented 3 years ago

Oops, sorry. There you go: https://gist.github.com/toastyfe/36d777962a168ebe3a1daf5d98481492 There are some parts in Portuguese, I hope it doesn't interfere

qzed commented 3 years ago

It kind of looks like you have to install the kernel-surface-devel package, if I read these lines correctly:

2020/08/03 13:51:02 akmodsbuild: error: Failed build dependencies:
2020/08/03 13:51:02 akmodsbuild: kernel-devel-uname-r = 5.7.11-1.surface.fc32.x86_64 is needed by acpi_call-kmod-1.1.2-2.fc32.x86_64

fematarazzo commented 3 years ago

That's weird, because I've checked with sudo dnf install kernel-surface-devel and the packages are already installed.

I guess I'll have to go back to Ubuntu...

qzed commented 3 years ago

Hmm yeah, that's odd then.

fematarazzo commented 3 years ago

Here I am, back with a fresh Ubuntu install. I've updated everything and installed dkms and acpi-call with the stock kernel. Then I got the custom kernel and installed everything. Rebooted and tried to run the commands. Still having problems. This is what's on my terminal (with some evidence):

felipe@felipe:~$ uname -a
Linux felipe 5.7.11-surface #1 SMP Thu Jul 30 19:07:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
felipe@felipe:~$ sudo apt-get install acpi-call
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'acpi-call-dkms' instead of 'acpi-call'
acpi-call-dkms is already the newest version (1.1.0-5).
The following packages were automatically installed and are no longer required:
  linux-headers-5.4.0-26 linux-headers-5.4.0-26-generic linux-image-5.4.0-26-generic linux-modules-5.4.0-26-generic linux-modules-extra-5.4.0-26-generic
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
felipe@felipe:~$ sudo apt-get install acpi-call-dkms
Reading package lists... Done
Building dependency tree
Reading state information... Done
acpi-call-dkms is already the newest version (1.1.0-5).
The following packages were automatically installed and are no longer required:
  linux-headers-5.4.0-26 linux-headers-5.4.0-26-generic linux-image-5.4.0-26-generic linux-modules-5.4.0-26-generic linux-modules-extra-5.4.0-26-generic
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
felipe@felipe:~$ sudo modprobe acpi-call
modprobe: FATAL: Module acpi-call not found in directory /lib/modules/5.7.11-surface
felipe@felipe:~$ sudo modprobe acpi-call-dkms
modprobe: FATAL: Module acpi-call-dkms not found in directory /lib/modules/5.7.11-surface
felipe@felipe:~$ echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
tee: /proc/acpi/call: No such file or directory
_SB.PCI0.RP05.HGON

So even though I have installed the acpi-call module (and dkms), for some reason, the modprobe acpi_call can't recognize it and doesn't let me run the next command.

qzed commented 3 years ago

Can you run sudo dkms status? And you did reboot after installing the kernel, right?

fematarazzo commented 3 years ago

Yep, I did reboot after installing everything.

felipe@felipe:~$ uname -a
Linux felipe 5.7.11-surface #1 SMP Thu Jul 30 19:07:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
felipe@felipe:~$ sudo dkms status
acpi-call, 1.1.0, 5.4.0-42-generic, x86_64: installed

qzed commented 3 years ago

Okay, that only seems to list the acpi-call module for the generic kernel. I think if something failed during installation, it should normally show that there. Did you install the acpi-call package when you were running the surface kernel? If not, you could try that.

Alternatively, I think sudo dkms install acpi-call/1.1.0 -k <surface-kernel-version> should work.
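
A small sketch of how the two steps (check dkms status, then build for the running kernel) could be combined. The helper name dkms_installed_for is hypothetical; it only filters text in the format dkms status printed above and makes no other assumptions about dkms.

```shell
# Hypothetical helper: reads `dkms status` output on stdin and succeeds
# if <module> is reported as installed for <kernel>.
dkms_installed_for() {
    module=$1
    kernel=$2
    grep -F "$module" | grep -F "$kernel" | grep -q "installed"
}

# Example: build the module for the running kernel only if it is missing.
# sudo dkms status | dkms_installed_for acpi-call "$(uname -r)" \
#     || sudo dkms install acpi-call/1.1.0 -k "$(uname -r)"
```

Using $(uname -r) avoids typing the kernel version by hand, but it only helps when you are booted into the kernel you want the module for.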

fematarazzo commented 3 years ago

I've installed it before running the surface kernel. So, just in case, I ran it again as you posted and it showed this:

felipe@felipe:~$ sudo dkms install acpi-call/1.1.0 -k 5.7.11-surface

Kernel preparation unnecessary for this kernel. Skipping...

Building module:
cleaning build area...
make -j4 KERNELRELEASE=5.7.11-surface -C /lib/modules/5.7.11-surface/build M=/var/lib/dkms/acpi-call/1.1.0/build...(bad exit status: 2)
ERROR (dkms apport): kernel package linux-headers-5.7.11-surface is not supported
Error! Bad return status for module build on kernel: 5.7.11-surface (x86_64)
Consult /var/lib/dkms/acpi-call/1.1.0/build/make.log for more information.

And here is the log from the file: https://gist.github.com/toastyfe/d41ea2743db0136b2af8694679b035ea

qzed commented 3 years ago

Looks like the module version is incompatible with the kernel version (as Ubuntu uses v5.4 as kernel, they'll probably only try to keep their dkms modules compatible with that). You can try going with the LTS kernel, I think that might then hopefully be compatible with the module.

qzed commented 3 years ago

I'll probably need to look into creating a simple kernel module for our kernel that does those acpi-calls and exports a sysfs attribute or so... that whole module incompatibility thing seems to be a bit of a mess...

fematarazzo commented 3 years ago

I've rebooted back to the LTS kernel and ran it. It worked. But, as on last Friday, even after calling HGON, lspci doesn't show the dGPU:

felipe@felipe:~$ uname -a
Linux felipe 5.4.0-42-generic #46-Ubuntu SMP Fri Jul 10 00:24:02 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
felipe@felipe:~$ sudo modprobe acpi-call
felipe@felipe:~$ echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
_SB.PCI0.RP05.HGON
felipe@felipe:~$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)
00:05.0 Multimedia controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Imaging Unit (rev 01)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:14.3 Multimedia controller: Intel Corporation Device 9d32 (rev 01)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:15.2 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #2 (rev 21)
00:15.3 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #3 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:16.4 Communication controller: Intel Corporation Device 9d3e (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1d.3 PCI bridge: Intel Corporation Device 9d1b (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless

:(

qzed commented 3 years ago

I meant the Surface LTS kernel (the v4.19 one). According to your uname, you seem to be running the 5.4-generic one (which is provided by Ubuntu and doesn't have the patch). You might need to explicitly select the kernel in your bootloader.

fematarazzo commented 3 years ago

Just tried with the Surface LTS kernel (v4.19) and despite being able to run modprobe acpi-call and echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call I still wasn't able to make the nvidia show up with lspci

felipe@felipe:~$ uname -a
Linux felipe 4.19.135-surface-lts #1 SMP Thu Jul 30 19:08:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
felipe@felipe:~$ sudo modprobe acpi-call
felipe@felipe:~$ echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
_SB.PCI0.RP05.HGON
felipe@felipe:~$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)
00:05.0 Multimedia controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Imaging Unit (rev 01)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:14.3 Multimedia controller: Intel Corporation Device 9d32 (rev 01)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:15.2 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #2 (rev 21)
00:15.3 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #3 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:16.4 Communication controller: Intel Corporation Device 9d3e (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1d.3 PCI bridge: Intel Corporation Device 9d1b (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless

qzed commented 3 years ago

Okay, so that means that at least, the acpi-call module does work now. Is there anything in the dmesg log after running the acpi-call?

StollD commented 3 years ago

echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call

Remember, it needs to be \_SB.PCI0.RP05.HGON. The path has been posted a few times without marking it as code, so the markdown editor escaped away the leading backslash.


Regarding the akmods on Fedora: They basically ship the rpm package scripts for the kernel module, use that to locally build a package for the current kernel, and install that. Our kernel-surface-devel package doesn't provide the same aliases as the stock kernel-devel package, so it doesn't work.

I'll look into fixing that when I have some time.
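
For anyone who wants to confirm the missing alias on their own system, the check below is one possible sketch. provides_kernel_devel is a hypothetical helper name; it only looks for the kernel-devel-uname-r capability that the akmods log complains about, in the text printed by rpm -q --provides.

```shell
# Hypothetical helper: reads `rpm -q --provides <package>` output on stdin
# and succeeds if it advertises kernel-devel-uname-r for <kernel>.
provides_kernel_devel() {
    grep -qF "kernel-devel-uname-r = $1"
}

# Example:
# rpm -q --provides kernel-surface-devel | provides_kernel_devel "$(uname -r)" \
#     || echo "kernel-surface-devel does not provide kernel-devel-uname-r"
```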

qzed commented 3 years ago

echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call

Remember, it needs to be \_SB.PCI0.RP05.HGON. The path has been posted a few times without marking it as code, so the markdown editor escaped away the leading backslash.

Oh right, I was totally blind and missed that... You should ensure that it's actually \ at the start, so you might have to enter "\\_SB.PCI0.RP05.HGON" on the command line.

fematarazzo commented 3 years ago

@StollD thanks for the reminder about the \ detail. Unfortunately, I tried it again but haven't succeeded yet. Still, thank you for your help regarding the Fedora compatibility. I'm looking forward to using it again.

@qzed here is the lspci once again. I noticed that after the HGON call with \ it took a little bit longer, probably searching for the dGPU. But still, the same result:

felipe@felipe:~$ uname -a
Linux felipe 4.19.135-surface-lts #1 SMP Thu Jul 30 19:08:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
felipe@felipe:~$ echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
_SB.PCI0.RP05.HGON
felipe@felipe:~$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)
00:05.0 Multimedia controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Imaging Unit (rev 01)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:14.3 Multimedia controller: Intel Corporation Device 9d32 (rev 01)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:15.2 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #2 (rev 21)
00:15.3 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #3 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:16.4 Communication controller: Intel Corporation Device 9d3e (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1d.3 PCI bridge: Intel Corporation Device 9d1b (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless

qzed commented 3 years ago

What's in the dmesg log after the acpi-call? If it'd work it should show something like the first part in https://github.com/linux-surface/linux-surface/issues/93#issuecomment-667848613.
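
To pull just the relevant lines out of a long dmesg log, a filter along these lines might be enough. It is a sketch: filter_dgpu_log is a made-up name, and the patterns are taken from the successful log posted earlier in this thread (pciehp hot-plug messages, the 0000:01:00.0 device address, and NVRM/nvidia driver messages).

```shell
# Hypothetical filter: keep only PCIe hot-plug and dGPU-related lines
# from a kernel log read on stdin.
filter_dgpu_log() {
    grep -E 'pciehp|0000:01:00\.0|NVRM|nvidia'
}

# Example: show the most recent matching lines after the ACPI call.
# sudo dmesg | filter_dgpu_log | tail -n 40
```

No output at all after the call would suggest that the hot-plug event never happened, which is different from the card being detected and the driver failing to load.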

fematarazzo commented 3 years ago

I've searched for "nvidia" but couldn't find anything. There you go: https://gist.github.com/toastyfe/8c8e53f1c67dd6575adefa32aaab289e

qzed commented 3 years ago

It kind of looks like it can't execute the call: https://gist.github.com/toastyfe/8c8e53f1c67dd6575adefa32aaab289e#file-dmesg-L982. I'll try to track down where that error comes from. What does sudo cat /proc/acpi/call print if you run it after running the acpi-call?

fematarazzo commented 3 years ago

It gives me this:

felipe@felipe:~$ echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
[sudo] password for felipe:
_SB.PCI0.RP05.HGON
felipe@felipe:~$ sudo cat /proc/acpi/call
0x0calledfelipe@felipe:~$

qzed commented 3 years ago

If that source is up-to-date, the error seems to come from https://github.com/mkottman/acpi_call/blob/master/acpi_call.c#L112. I think that'd mean that the path you provided is wrong. Can you check that the path is correct, with the leading \? So it should be either "\_SB.PCI0.RP05.HGON" (with quotes) or, if that doesn't work, your shell may be interpreting the \ as an escape sequence and you'll need "\\_SB.PCI0.RP05.HGON".
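
One way to sidestep the escaping question entirely is to use single quotes and printf, which pass the backslash through untouched. The sketch below assumes a POSIX shell; the helper names acpi_method_path and has_leading_backslash are made up for illustration.

```shell
# Print the ACPI method path with its leading backslash intact.
# Single quotes stop the shell from treating "\" as an escape character,
# and printf '%s' prints its argument without interpreting it.
acpi_method_path() {
    printf '%s\n' '\_SB.PCI0.RP05.HGON'
}

# Succeeds only if the line read from stdin starts with a literal backslash.
has_leading_backslash() {
    read -r path
    case $path in
        '\'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example: verify the path, then send it to acpi_call.
# acpi_method_path | has_leading_backslash \
#     && acpi_method_path | sudo tee /proc/acpi/call
```

The path that tee echoes back should then start with \ as well; if it starts with _SB, the backslash was lost somewhere along the way.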

Also can you put your output in code tags (wrap it with ``` code ```), I think that would help us diagnose the path easier.

kitakar5525 commented 3 years ago

Hmm. Also, can you run this command:

sudo acpiexec -b "Methods" /sys/firmware/acpi/tables/DSDT | grep "HGON"

The command gives the following output on my SB1:

❯ sudo acpiexec -b "Methods" /sys/firmware/acpi/tables/DSDT | grep "HGON"
Input file /sys/firmware/acpi/tables/DSDT, Length 0x1965D (104029) bytes
             \_SB.PCI0.RP05.HGON Method       0x5632582a2120 001 Args 0 Len 01D6 Aml 0x5632581b081a

kitakar5525 commented 3 years ago

Also, your firmware version is old:

[    0.000000] DMI: Microsoft Corporation Surface Book/Surface Book, BIOS 91.2327.769 08/23/2018

It might be possible that the arguments the method takes have changed? (With the current latest firmware it takes zero arguments.) Can you attach your acpidump here? (Instructions on how to get acpidump can be found on this wiki page)

fematarazzo commented 3 years ago

There you go, guys. I tried with \ and \\ and here is the output. I also tried the command you posted, @kitakar5525. BTW, here is my acpidump.out file: https://github.com/linux-surface/acpidumps/issues/9

felipe@felipe:~$ uname -a
Linux felipe 4.19.135-surface-lts #1 SMP Thu Jul 30 19:08:04 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
felipe@felipe:~$ sudo modprobe acpi_call
felipe@felipe:~$ echo "\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGON
felipe@felipe:~$ echo "\\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGON
felipe@felipe:~$ sudo acpiexec -b "Methods" /sys/firmware/acpi/tables/DSDT | grep "HGON"
Input file /sys/firmware/acpi/tables/DSDT, Length 0x1856C (99692) bytes
             \_SB.PCI0.RP05.HGON Method       0x560e9accbd90 01 Args 0 Len 01D6 Aml 0x560e9abeaebd
felipe@felipe:~$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)
00:05.0 Multimedia controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Imaging Unit (rev 01)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:14.3 Multimedia controller: Intel Corporation Device 9d32 (rev 01)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:15.2 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #2 (rev 21)
00:15.3 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #3 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:16.4 Communication controller: Intel Corporation Device 9d3e (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1d.3 PCI bridge: Intel Corporation Device 9d1b (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless
kitakar5525 commented 3 years ago

Thanks. Regarding the acpidump, I don't see much difference... The only difference in the DSDT is audio-related. (The diff is available at https://github.com/linux-surface/acpidumps/issues/9)

I'll probably need to look into creating a simple kernel module for our kernel that does those acpi-calls and exports a sysfs attribute or so... that whole module incompatibility thing seems to be a bit of a mess...

Writing a kernel driver just for calling HGON/HGOF may be a good idea.

qzed commented 3 years ago

@toastyfe Can you upload a dmesg log after those two calls again (that's actually a bit more important than the lspci)?

qzed commented 3 years ago

@kitakar5525 Can you test the sb1_dgpu_sw module I pushed to https://github.com/qzed/linux-surface-sam-over-hid/tree/master/sb1_dgpu_sw? Basically run make and insmod, then you should be able to change the dGPU power state via

echo 1 | sudo tee /sys/bus/platform/devices/MSHW0041:00/dgpu_power

This should basically call HGON. Doing the same with 0 instead of 1 should call HGOF. I'll implement the _DSM call next.

Edit: Just implemented the _DSM call. Running

echo 1 | sudo tee /sys/bus/platform/devices/MSHW0041:00/dgpu_dsmcall

will execute that and (as far as I can tell from the DSDT) also turn on the dGPU. As there is no "off" via _DSM, you won't be able to run that call with 0.
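The two sysfs attributes above could be wrapped in a small shell helper. This is only a sketch under assumptions: the function name `dgpu_set` and the `DGPU_SYSFS_DIR` variable are made up for illustration and are not part of the module. It assumes the sb1_dgpu_sw module is loaded, so that `dgpu_power` and `dgpu_dsmcall` exist under the device path used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper around the sb1_dgpu_sw sysfs attributes.
# DGPU_SYSFS_DIR is an illustrative variable; it defaults to the device
# path used in the commands above.
DGPU_SYSFS_DIR="${DGPU_SYSFS_DIR:-/sys/bus/platform/devices/MSHW0041:00}"

# dgpu_set <attribute> <value>
#   dgpu_power   0|1  (0 should call HGOF, 1 should call HGON)
#   dgpu_dsmcall 1    (there is no "off" via _DSM)
dgpu_set() {
    attr="$1"
    value="$2"

    # Accept only the attribute/value combinations described above.
    case "$attr:$value" in
        dgpu_power:0|dgpu_power:1|dgpu_dsmcall:1) ;;
        dgpu_dsmcall:0)
            echo "dgpu_dsmcall has no 'off'; use dgpu_power 0 instead" >&2
            return 2 ;;
        *)
            echo "usage: dgpu_set dgpu_power 0|1 | dgpu_dsmcall 1" >&2
            return 2 ;;
    esac

    # The attribute is missing if the module is not loaded, and it is
    # not writable without root.
    if [ ! -w "$DGPU_SYSFS_DIR/$attr" ]; then
        echo "$DGPU_SYSFS_DIR/$attr is not writable (module loaded? root?)" >&2
        return 1
    fi

    printf '%s\n' "$value" > "$DGPU_SYSFS_DIR/$attr"
}
```

Sourced in a root shell, `dgpu_set dgpu_power 1` would then be equivalent to the `echo 1 | sudo tee ...` command above. Rejecting `dgpu_dsmcall 0` up front just mirrors the note that the _DSM call cannot turn the dGPU off.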

kitakar5525 commented 3 years ago

@qzed Thanks! I tested on v4.19 and v5.8 with Arch and confirmed that it works as expected.

First, I needed to unload the surface_sam_sid module. It seems that the device is already used by that driver:

$ dmesg -xw
#
# $ sudo insmod sb1_dgpu_sw.ko
kern  :err   : [72203.744601] Error: Driver 'surface_sam_sid' is already registered, aborting...
# $ sudo modprobe -r surface_sam_sid
# $ sudo insmod sb1_dgpu_sw.ko

This is the log when calling 1 > dgpu_power; the dGPU was successfully turned on:

$ dmesg -xw
#
# $ echo 1 | sudo tee /sys/bus/platform/devices/MSHW0041:00/dgpu_power
# 1
kern  :info  : [72458.590938] pcieport 0000:00:1c.0: pciehp: Slot(4): Card present
kern  :info  : [72458.694803] sb1_dgpu_sw:sb1_dgpu_sw_hgon: turned-on dGPU via HGON
[...] # nvidia driver loaded here just like the other dmesg logs

and it seems that it can be turned off with 0 > dgpu_power:

$ dmesg -xw
#
# $ echo 0 | sudo tee /sys/bus/platform/devices/MSHW0041:00/dgpu_power
# 0
kern  :info  : [72550.171585] sb1_dgpu_sw:sb1_dgpu_sw_hgof: turned-off dGPU via HGOF

It also can be turned on by calling 1 > dgpu_dsmcall twice.

The next step: I can't find a proper way to unlink the dGPU from PCIe. Even after calling 0 > dgpu_power (or HGOF), it just turns off but remains linked to PCIe (I still see the dGPU in lspci). How is this handled on SB2/SB3?

qzed commented 3 years ago

First, I needed to unload the surface_sam_sid module. It seems that the device is already used by that driver:

Ah, I copy-pasted the driver struct from the SID driver and forgot to change the name... fixed that just now.

The next step: I can't find a proper way to unlink the dGPU from PCIe. Even after calling 0 > dgpu_power (or HGOF), it just turns off but remains linked to PCIe (I still see the dGPU in lspci). How is this handled on SB2/SB3?

That's pretty similar to the SB2/SB3 DSM call as implemented so far. It's the same behavior there when only using the DSM call. You'll need a bit of PCIe link and power management to do the rest. This is done here (for SB2/SB3). Since HGON/HGOF are called from the Root Port _ON/_OFF functions, you might not have to call them directly.

bluemage650 commented 3 years ago

Hey @qzed, I'm trying to reduce the difference between your kernel release and my nix module before I go digging into the I2C quirk patch. I came across the fact that pretty much all of the CONFIG_ directives in linux-surface-sb1-test-v4.19.133/configs/surface-5.4.config error out (I forget the exact error, but it's the same as if you just dropped a CONFIG_BOGUS=m into the kernel module list and compiled).

My WIP nix import is at https://gist.github.com/3a4fa3e523bfc55a90c24407935d5b22

Am I missing a loaded module or something really obvious?

StollD commented 3 years ago

They error because nix doesn't use CONFIG_BANANA=m, it uses BANANA m. You need to convert the config options from the former into the latter format.
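As a rough sketch of that conversion, a one-line `sed` filter could rewrite the tristate options. The function name `kconfig_to_nix` is made up for illustration, and this deliberately handles only `=y`, `=m` and `=n` values; string and numeric options, as well as `# CONFIG_... is not set` comments, are dropped and would need to be handled separately.

```shell
#!/bin/sh
# kconfig_to_nix (hypothetical name): read kernel-config lines on stdin
# and print them in the "NAME value" form described above.
# Only y/m/n values are converted; every other line is skipped.
kconfig_to_nix() {
    sed -n -E 's/^CONFIG_([A-Za-z0-9_]+)=([ymn])$/\1 \2/p'
}
```

For example, `echo 'CONFIG_SURFACE_SAM_SSH=m' | kconfig_to_nix` prints `SURFACE_SAM_SSH m`, and a whole file can be converted with `kconfig_to_nix < surface-5.4.config`.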

bluemage650 commented 3 years ago

Oh right. I probably shouldn't be messing with kernel compilation while I'm sleep deprived. Oops.

Still having trouble with these ones though,

error: unused option: SURFACE_SAM_DTX
error: unused option: SURFACE_SAM_HPS
error: unused option: SURFACE_SAM_SAN
error: unused option: SURFACE_SAM_SID
error: unused option: SURFACE_SAM_SID_GPELID
error: unused option: SURFACE_SAM_SID_PERFMODE
error: unused option: SURFACE_SAM_SID_POWER
error: unused option: SURFACE_SAM_SID_VHF
error: unused option: SURFACE_SAM_SSH
error: unused option: SURFACE_SAM_SSH_DEBUG_DEVICE
error: unused option: SURFACE_SAM_VHF

They also don't show up when I zcat /proc/config.gz | grep SURFACE.

qzed commented 3 years ago

Do you have the SAM patches applied? The message reads a bit like those options are not present in the Kconfig.
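One way to check this is to search the patched kernel source tree for the corresponding `config` entries. The following is a minimal sketch, assuming you have the patched source directory at hand; the function name `kconfig_missing` is hypothetical, and `grep --include` is a GNU/BSD extension rather than POSIX.

```shell
#!/bin/sh
# kconfig_missing <kernel-source-dir> <OPTION>...   (hypothetical helper)
# Prints every option that has no "config <OPTION>" (or "menuconfig
# <OPTION>") entry in any Kconfig file below the given directory.
kconfig_missing() {
    src="$1"
    shift
    for opt in "$@"; do
        # Anchor the match so that e.g. SURFACE_SAM_SSH does not also
        # match SURFACE_SAM_SSH_DEBUG_DEVICE.
        if ! grep -r -q -E --include='Kconfig*' \
                "^(menu)?config[[:space:]]+${opt}[[:space:]]*\$" "$src"; then
            echo "$opt"
        fi
    done
}
```

Running something like `kconfig_missing /path/to/patched/linux SURFACE_SAM_SSH SURFACE_SAM_DTX` (the path is a placeholder) prints only the options that are not defined in any Kconfig file. If all of them are printed, the patch that adds them was most likely not applied to that tree.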

bluemage650 commented 3 years ago

0003-surface-sam.patch, 0004-surface-sam-over-hid.patch, and the I2C quirk patch are the only SAM patches, right? I have all of the patches/5.4 patches applied.

qzed commented 3 years ago

Yeah, all the options are added in 0003-surface-sam.patch. Is there any other log that might give us a hint why those are unused?

Btw: You're adding both 0004-surface-sam-over-hid.patch and the I2C quirk patch. The first one is basically a replacement for the latter, so you don't need to add the latter any more.