linux-surface / linux-surface

Linux Kernel for Surface Devices

[SB] SAM Support (dGPU-detection, clipboard-detach handling) #93

Open sadnub opened 4 years ago

sadnub commented 4 years ago

Hello, I'm just curious: is dGPU support on SB1 on the roadmap?

Excellent work on this!

qzed commented 4 years ago

For this we need to implement support for the embedded controller. Some progress has been made in https://github.com/jakeday/linux-surface/issues/286#issuecomment-521518380, but I can't give you any time-frame when (or even a guarantee that) this will be implemented.

I'll rename this issue so we can keep it open as tracking-issue for the other one.

fematarazzo commented 4 years ago

Hi qzed!

Thank you for all your support. I've been using the custom kernel for quite a while.

Has there been any progress on the dGPU since your last post in February?

Best,

Felipe

qzed commented 4 years ago

Unfortunately not. I've fixed the link in my comment above as that has been somehow broken. @kitakar5525 managed to power on the dGPU but at the moment that requires some not so clean patching as there are some driver conflicts IIRC.

fematarazzo commented 4 years ago

Hi everyone!

Any news about the dGPU?

Thanks!

chiekku commented 4 years ago

Also on SB1, please let me know how I can help with this issue

qzed commented 4 years ago

The thing missing right now is manpower. I.e. for the dGPU, someone would need to update the patch in https://github.com/jakeday/linux-surface/issues/286#issuecomment-521518380 (so that it has a chance of being accepted upstream) and put together a driver that does the appropriate ACPI calls. For the other stuff, there probably needs to be a special HID driver that attaches to the HID-over-I2C device.

chiekku commented 4 years ago

Is SB1 inherently tricky in how it does dGPU? I guess I'm surprised SB1 is the only one without dGPU atm. Any possibility of leveraging what's already been done for SB2 or others?

Been a while since I've done linux drivers but I'll start looking into it. Might take a while to figure out how @kitakar5525 got where he was

qzed commented 4 years ago

The fundamental parts are different from the SB2. Specifically, it looks like the dGPU power is (at least in part) controlled by the embedded controller (SAM; in contrast to the SB2, this is not SAM-over-SSH but SAM-over-HID). Even if it's not actually controlled by SAM, you need to set up communication with it as that otherwise leads to errors (https://github.com/jakeday/linux-surface/issues/286#issuecomment-453378745). That's what that one patch does: https://github.com/qzed/linux-surface-sam-hid/blob/master/patches/5.1/0001-Add-quirk-for-Surface-SAM-I2C-address-space.patch.

After that, calling the ACPI functions to turn on the dGPU should succeed. Ideally, you'd want to wrap those into a driver. Since they're also being called by the _ON and _OFF methods of the power region, you might get away with actually just using the standard PCIe power methods. Similar to https://github.com/torvalds/linux/blob/69119673bd50b176ded34032fadd41530fb5af21/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L1262-L1265 and https://github.com/torvalds/linux/blob/69119673bd50b176ded34032fadd41530fb5af21/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L1294-L1299.

bluemage650 commented 3 years ago

Hey @qzed, I'm trying to reproduce your work via the ACPI calls and so far I'm not having any luck. I applied the i2c quirk patch to NixOS's 20.03 default kernel successfully, however it looks like my dGPU is not powering on with either of those ACPI calls. The double call to acall_vgbi_dgpu returns {0x00, 0x00, 0x00, 0x00} both times, and the other call acall "\_SB.PCI0.RP05.HGON" returns 0x0called.

Any idea where I'm going wrong with this? Maybe I have the wrong constants?

qzed commented 3 years ago

@richardcq Is there anything interesting in dmesg? How did you check the dGPU status? Looks like the ACPI call succeeded, so it should have at least sent the data to the EC. I don't own an SB1, so I haven't been able to test this myself. Maybe @kitakar5525 can help you more.

bluemage650 commented 3 years ago

Here's the full run I did. When I tried the first time I did both the lspci method and I checked lshw -class display as well to see if a second display popped up. I also don't have any /sys/module/acpi/parameters/debug_*, not sure why.

$ acall(){ echo "$1" | sudo tee /proc/acpi/call >/dev/null && sudo cat /proc/acpi/call;echo;}
$ VGBI_UUID_DGPU="{0x69 0x5C 0xD0 0x6F 0xE3 0xCD 0xF4 0x49 0x95 0xED 0xAB 0x16 0x65 0x49 0x80 0x35}"
$ acall_vgbi_dgpu(){ acall "\_SB_.PCI0.LPCB.EC0_.VGBI._DSM $VGBI_UUID_DGPU 1 1 0";}
$ acall "\_SB.PCI0.RP05.HGON"
0x0called
$ lspci -nn | grep -i nvidia
$ dmesg | tail -n 10
--- snip, random USB device messages ---
$ acall_vgbi_dgpu 
{0x00, 0x00, 0x00, 0x00}
$ lspci -nn | grep -i nvidia
$ acall_vgbi_dgpu 
{0x00, 0x00, 0x00, 0x00}
$ lspci -nn | grep -i nvidia
$ dmesg | tail -n 10
--- same USB device messages ---
[33698.222533] ACPI Warning: \_SB.PCI0.LPCB.EC0.VGBI._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20190816/nsarguments-59)
qzed commented 3 years ago

I also don't have any /sys/module/acpi/parameters/debug_*

That's odd.

The rest looks okay. Also it should be enough to just call either one of them. Can you upload the full dmesg log after running one of the ACPI calls, including the USB messages? The SAM EC is after all connected via HID-over-I2C and HID was more or less initially designed for USB, so maybe there's some connection. lspci should definitely show the device if turned on.

Maybe there's some PCI refresh missing?

bluemage650 commented 3 years ago

Sorry about that, I didn't mean to mislead you. The USB messages are from my Fiio K3 USB DAC on usb1-1.1.4. Every so often (probably when I jiggle the cord) it disconnects and reconnects according to the kernel log. Hopefully not related, but I'll dump it anyway.


kern  :warn  : [  +0.000125] usb 1-1.1.4: Device not responding to setup address.
kern  :warn  : [  +0.203885] usb 1-1.1.4: Device not responding to setup address.
kern  :err   : [  +0.207886] usb 1-1.1.4: device not accepting address 59, error -71
kern  :info  : [  +0.491997] usb 1-1.1.4: new full-speed USB device number 60 using xhci_hcd
kern  :warn  : [  +0.000123] usb 1-1.1.4: Device not responding to setup address.
kern  :warn  : [  +0.204018] usb 1-1.1.4: Device not responding to setup address.
kern  :err   : [  +0.207839] usb 1-1.1.4: device not accepting address 60, error -71
kern  :err   : [  +0.000154] usb 1-1.1-port4: unable to enumerate USB device
kern  :info  : [  +0.183837] usb 1-1.1.4: new full-speed USB device number 61 using xhci_hcd
kern  :err   : [  +0.075979] usb 1-1.1.4: device descriptor read/64, error -32
kern  :err   : [  +0.599994] usb 1-1.1.4: device descriptor read/64, error -71
kern  :info  : [  +0.600014] usb 1-1.1.4: new full-speed USB device number 62 using xhci_hcd
kern  :err   : [  +0.279994] usb 1-1.1.4: device descriptor read/64, error -71
kern  :err   : [  +0.599995] usb 1-1.1.4: device descriptor read/64, error -71
kern  :info  : [  +0.108065] usb 1-1.1-port4: attempt power cycle
kern  :info  : [  +0.803802] usb 1-1.1.4: new full-speed USB device number 63 using xhci_hcd
kern  :warn  : [  +0.000146] usb 1-1.1.4: Device not responding to setup address.
kern  :warn  : [  +0.204040] usb 1-1.1.4: Device not responding to setup address.
kern  :err   : [  +0.207851] usb 1-1.1.4: device not accepting address 63, error -71
kern  :info  : [  +0.492000] usb 1-1.1.4: new high-speed USB device number 64 using xhci_hcd
kern  :info  : [  +0.012535] usb 1-1.1.4: New USB device found, idVendor=2972, idProduct=0047, bcdDevice= 0.11
kern  :info  : [  +0.000004] usb 1-1.1.4: New USB device strings: Mfr=1, Product=3, SerialNumber=0
kern  :info  : [  +0.000002] usb 1-1.1.4: Product: K3
kern  :info  : [  +0.000002] usb 1-1.1.4: Manufacturer: FiiO
kern  :info  : [  +0.924365] usb 1-1.1.4: 1:3 : unsupported format bits 0x100000000
qzed commented 3 years ago

Ah yeah okay you're right there. That doesn't seem to be related. Does echo 1 | sudo tee /sys/bus/pci/rescan bring up anything in lspci?
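For anyone following along, the rescan-and-check step can be sketched as a small helper. This is only a sketch: the function names `has_nvidia_device` and `rescan_and_check` are made up for illustration, and it assumes the dGPU shows up under NVIDIA's PCI vendor ID `10de` in `lspci -nn` output.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative, not from any existing tool.

# Reads `lspci -nn` output on stdin and succeeds if any device carries
# NVIDIA's PCI vendor ID (10de), e.g. "[10de:1427]".
has_nvidia_device() {
    grep -qiE '\[10de:[0-9a-f]{4}\]'
}

# Ask the kernel to rescan the PCI bus, then report whether the dGPU is visible.
rescan_and_check() {
    echo 1 | sudo tee /sys/bus/pci/rescan >/dev/null
    if lspci -nn | has_nvidia_device; then
        echo "dGPU visible on the PCI bus"
    else
        echo "dGPU not visible on the PCI bus"
    fi
}
```

Calling `rescan_and_check` after an HGON call gives a yes/no answer instead of an empty grep result.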

bluemage650 commented 3 years ago

Nope, no luck. No dmesg logs either.

qzed commented 3 years ago

It's possible (and pretty likely) that subsequent HGON/DSM calls don't actually do anything, so the dmesg log may be misleading. Can you try to run acall "\_SB.PCI0.RP05.HGOF" and then acall "\_SB.PCI0.RP05.HGON" and look at the log? That should basically power-cycle the dGPU.
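A rough sketch of that power-cycle sequence, reusing the `acall` wrapper idea from earlier in the thread. The `ACPI_CALL` variable and the `dgpu_power_cycle` name are assumptions added for illustration; the ACPI paths are the ones quoted above.

```bash
#!/usr/bin/env bash
# Path of the acpi_call interface; overridable so the helper can be dry-run
# against a plain file (this override is an assumption, not part of acpi_call).
ACPI_CALL="${ACPI_CALL:-/proc/acpi/call}"

# Write an ACPI method path to the interface, then read back the result.
acall() {
    printf '%s\n' "$1" | sudo tee "$ACPI_CALL" >/dev/null && sudo cat "$ACPI_CALL"
    echo
}

# Turn the dGPU off and on again, then show the most recent kernel messages.
dgpu_power_cycle() {
    local port='\_SB.PCI0.RP05'
    acall "${port}.HGOF"
    acall "${port}.HGON"
    dmesg | tail -n 20   # may need sudo, depending on kernel.dmesg_restrict
}
```

Running `dgpu_power_cycle` prints the two return values followed by the tail of the log, so any error raised by either call is easy to spot.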

bluemage650 commented 3 years ago
$ acall "\_SB.PCI0.RP05.HGOF"
0x1called

$ acall "\_SB.PCI0.RP05.HGON"
0x0called

$ lspci -nn | grep -i nvidia

It's interesting, the HGOF returns what @kitakar5525 got when they ran HGON as the return message (0x1), but my HGON just gets 0x0. I also get a delay between the HGON message and the return value but the HGOF is instant. No dmesg logs from either command.

$ time acall "\_SB.PCI0.RP05.HGOF"
0x1called

real    0m0.069s
user    0m0.004s
sys     0m0.010s

$ time acall "\_SB.PCI0.RP05.HGON"
0x0called

real    0m6.052s
user    0m0.005s
sys     0m0.150s
qzed commented 3 years ago

Hmm, I guess you'll have to trace through the methods to figure out where things are going wrong. For example, the HGON call could time out for some reason: https://github.com/linux-surface/acpidumps/blob/master/surface_book_1/dsdt.dsl#L18189. Probably also a good idea to add some debug prints to check when and what is being sent to the EC (https://github.com/qzed/linux-surface-sam-hid/blob/master/patches/5.1/0001-Add-quirk-for-Surface-SAM-I2C-address-space.patch#L184).

You could maybe also try a two-button reset (hold power and volume-up button) to make sure that it isn't any firmware bug. But I kind of doubt that that's going to help here.

bluemage650 commented 3 years ago

I will give this a crack over the weekend and see what I come up with. I suspect it's not a firmware bug, since the dGPU works on Windows, but who knows?

kitakar5525 commented 3 years ago

Hi @richardcq, for my SB1, dGPU will appear after calling \_SB.PCI0.RP05.HGON.

Here is the log with the nouveau driver, for your information:

dmesg log with nouveau driver

```bash
kern :info : [32682.857093] pcieport 0000:00:1c.0: pciehp: Slot(4): Card present
kern :info : [32683.023432] pci 0000:01:00.0: [10de:1427] type 00 class 0x030200
kern :info : [32683.023473] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
kern :info : [32683.023487] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x0fffffff 64bit pref]
kern :info : [32683.023500] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
kern :info : [32683.023509] pci 0000:01:00.0: reg 0x24: [io 0x0000-0x007f]
kern :info : [32683.023518] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
kern :info : [32683.023644] pci 0000:01:00.0: Enabling HDA controller
kern :info : [32683.023784] pci 0000:01:00.0: 15.752 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x2 link at 0000:00:1c.0 (capable of 126.016 Gb/s with 8.0 GT/s PCIe x16 link)
kern :info : [32683.033409] pci 0000:01:00.0: BAR 1: assigned [mem 0xc0000000-0xcfffffff 64bit pref]
kern :info : [32683.033419] pci 0000:01:00.0: BAR 3: assigned [mem 0xa2000000-0xa3ffffff 64bit pref]
kern :info : [32683.033427] pci 0000:01:00.0: BAR 0: assigned [mem 0xba000000-0xbaffffff]
kern :info : [32683.033431] pci 0000:01:00.0: BAR 6: assigned [mem 0xb9700000-0xb977ffff pref]
kern :info : [32683.033432] pci 0000:01:00.0: BAR 5: assigned [io 0x3000-0x307f]
kern :info : [32683.033436] pcieport 0000:00:1c.0: PCI bridge to [bus 01]
kern :info : [32683.033438] pcieport 0000:00:1c.0: bridge window [io 0x3000-0x6fff]
kern :info : [32683.033442] pcieport 0000:00:1c.0: bridge window [mem 0xb9700000-0xd16fffff]
kern :info : [32683.033444] pcieport 0000:00:1c.0: bridge window [mem 0xa1400000-0xb93fffff 64bit pref]
kern :warn : [32683.183855] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20200326/nsarguments-59)
kern :warn : [32683.183899] ACPI Warning: \_SB.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20200326/nsarguments-59)
kern :info : [32683.183967] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
kern :info : [32683.184177] nouveau 0000:01:00.0: NVIDIA GM206 (126270a1)
kern :info : [32683.239035] nouveau 0000:01:00.0: bios: version 84.06.73.00.02
kern :info : [32683.256807] nouveau 0000:01:00.0: fb: 2048 MiB GDDR5
kern :err : [32683.256831] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 022554 [ IBUS ]
kern :info : [32683.337489] [TTM] Zone kernel: Available graphics memory: 8159696 KiB
kern :info : [32683.337490] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
kern :info : [32683.337491] [TTM] Initializing pool allocator
kern :info : [32683.337494] [TTM] Initializing DMA pool allocator
kern :info : [32683.337510] nouveau 0000:01:00.0: DRM: VRAM: 2048 MiB
kern :info : [32683.337511] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
kern :info : [32683.337514] nouveau 0000:01:00.0: DRM: Pointer to TMDS table not found
kern :info : [32683.337515] nouveau 0000:01:00.0: DRM: DCB version 4.1
kern :info : [32683.338600] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
kern :info : [32683.338849] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
```

I also don't have any /sys/module/acpi/parameters/debug_*, not sure why.

/sys/module/acpi/parameters/debug_layer and /sys/module/acpi/parameters/debug_level require CONFIG_ACPI_DEBUG=y [1].
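A quick way to check this on a running system might look like the sketch below. The helper names are hypothetical, and the fallback to `/boot/config-$(uname -r)` is an assumption about where a distribution stores its config when `/proc/config.gz` is not available.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking whether ACPI debugging is compiled in.

# Print the running kernel's config from whichever source exists.
kernel_config() {
    if [ -r /proc/config.gz ]; then
        zcat /proc/config.gz
    elif [ -r "/boot/config-$(uname -r)" ]; then
        cat "/boot/config-$(uname -r)"
    else
        echo "no kernel config found" >&2
        return 1
    fi
}

# Reads config text on stdin; succeeds if the option in $1 is set to y or m.
config_enabled() {
    grep -qE "^$1=(y|m)$"
}

# Report whether the debug_* parameters can be expected to exist.
acpi_debug_status() {
    if kernel_config | config_enabled CONFIG_ACPI_DEBUG; then
        ls /sys/module/acpi/parameters/ | grep '^debug_'
    else
        echo "CONFIG_ACPI_DEBUG is not enabled; debug_layer and debug_level will be absent"
    fi
}
```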

Since you're using NixOS, I guess some kernel configs are also missing for dGPU?

Here are some config fragments from 5.7.6-arch1-1-surface:

$ zcat /proc/config.gz | grep -i "ACPI_DEBUG"          
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_DEBUG=y
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set

$ zcat /proc/config.gz | grep -i "nouveau"   
CONFIG_DRM_NOUVEAU=m
# CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT is not set
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
# CONFIG_NOUVEAU_DEBUG_MMU is not set
CONFIG_DRM_NOUVEAU_BACKLIGHT=y
CONFIG_DRM_NOUVEAU_SVM=y

$ zcat /proc/config.gz | grep -i "nvidia" 
CONFIG_NET_VENDOR_NVIDIA=y
CONFIG_I2C_NVIDIA_GPU=m
# CONFIG_FB_NVIDIA is not set
CONFIG_TYPEC_NVIDIA_ALTMODE=m

FYI, upstream Arch Linux's kernel config file is located here: https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux

[1] https://www.kernel.org/doc/Documentation/acpi/debug.txt

kitakar5525 commented 3 years ago

It may also be helpful to attach your entire kernel config here.

EDIT

On 5.7.6-arch1-1-surface:

$ lsmod | grep nouveau
nouveau              2375680  1
mxm_wmi                16384  1 nouveau
wmi                    36864  2 mxm_wmi,nouveau
ttm                   118784  1 nouveau
i2c_algo_bit           16384  2 i915,nouveau
drm_kms_helper        258048  2 i915,nouveau
drm                   581632  17 drm_kms_helper,i915,ttm,nouveau
agpgart                53248  5 intel_agp,intel_gtt,ttm,nouveau,drm

According to the lsmod output, the following configs may be necessary:

$ zcat /proc/config.gz | grep -i "mxm_wmi"
CONFIG_MXM_WMI=m

$ zcat /proc/config.gz | grep -i "ttm"   
CONFIG_DRM_TTM=m
CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_TTM_HELPER=m
bluemage650 commented 3 years ago

Here's my /proc/config.gz https://gist.github.com/0d5a5b6cbff6765bc73521c1714c9558 I had to manually modprobe nouveau to get this, but my lsmod | grep nouveau looks fine:

$ lsmod | grep nouveau
nouveau              2183168  0
mxm_wmi                16384  1 nouveau
wmi                    28672  2 mxm_wmi,nouveau
ttm                   110592  1 nouveau
led_class              20480  3 input_leds,hid_corsair,nouveau
drm_kms_helper        172032  2 i915,nouveau
drm                   479232  14 drm_kms_helper,i915,ttm,nouveau
agpgart                40960  4 intel_gtt,ttm,nouveau,drm
i2c_algo_bit           16384  2 i915,nouveau
i2c_core               77824  8 i2c_designware_platform,i2c_hid,i2c_designware_core,drm_kms_helper,i2c_algo_bit,i915,nouveau,drm
video                  45056  2 i915,nouveau
backlight              20480  3 video,i915,nouveau
button                 20480  1 nouveau

\_SB.PCI0.RP05.HGON has no effect after modprobe nouveau either, and neither does toggling HGOF then HGON. Same delay as well.

It looks like it is the correct i2c bus as well (figured I'd check, not really sure on the details of fun embedded HW stuff)

$ tail $(sudo find /sys -name modalias) | grep -n1 PNP0C50
214-==> /sys/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-5/i2c-MSHW0030:00/modalias <==
215:acpi:MSHW0030:PNP0C50:
216-
--
442-==> /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:72/MSHW0030:00/modalias <==
443:acpi:MSHW0030:PNP0C50:
444-
$ ls -lh /sys/bus/i2c/devices/ | grep "MSHW0030"
lrwxrwxrwx 1 root root 0 Jul  4 00:04 i2c-MSHW0030:00 -> ../../../devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-5/i2c-MSHW0030:00
kitakar5525 commented 3 years ago

Hmm... then, what happens when CONFIG_ACPI_PCI_SLOT=y?

I wonder if the following output from my log is related:

kern :info : [32682.857093] pcieport 0000:00:1c.0: pciehp: Slot(4): Card present

qzed commented 3 years ago

Sorry for the late response.

I wonder if the following output from my log is related:

kern :info : [32682.857093] pcieport 0000:00:1c.0: pciehp: Slot(4): Card present

Definitely related. That's the PCIe hot-plug subsystem detecting the card after it's been powered up. Same happens on my SB2 when I turn on the dGPU. It's kinda hard to tell what's going wrong, might be that the PCIe hot-plug system doesn't register that a device has been added/powered up. (That's why adding some dev_info(...) calls or something at https://github.com/qzed/linux-surface-sam-hid/blob/master/patches/5.1/0001-Add-quirk-for-Surface-SAM-I2C-address-space.patch#L184 would help).

Config options might be the problem, so you should probably try using a config file from this repo (e.g. combine the Arch base config with the surface-5.7.config). You probably want the whole PCIe hot-plug and ACPI related stuff enabled.

qzed commented 3 years ago

I've uploaded a bunch of kernels with the necessary patches at https://github.com/linux-surface/linux-surface/releases/tag/sb1-test-v4.19.133.

qzed commented 3 years ago

I've had confirmation (via mail) that the v4.19 Arch kernel linked above does work in combination with the HGON/HGOF ACPI calls to turn on/off the dGPU.

If I have a bit of time (and am not in the midst of re-structuring the SAM driver), I'll try to improve the patch so that it can be included in this repo.

qzed commented 3 years ago

I've decided to simplify the patch and add it to the surface kernel (https://github.com/linux-surface/kernel/commit/119b811af88da33ebcf8892effb23f76e0f388bb). While I believe that this patch does not provide the ideal solution, I prefer it over the somewhat awkward implementation with device quirks.

As it stands, this patch has the potential of some breakage on devices relying on I2C RawBytes operation region access, but none of the surface devices should be affected. Furthermore, there doesn't seem to be any driver using RawBytes access upstream, so it shouldn't break anything that is supported upstream.

kitakar5525 commented 3 years ago

Thanks! I tried the patch on v4.19 and v5.8-rc, dGPU successfully turned on via HGON.


(As a reference, I left a dmesg log when the patch is not applied. The message is a little bit different than my old comment) EDIT: log from v5.8-rc6

$ acall(){ echo "$1" | sudo tee /proc/acpi/call >/dev/null && sudo cat /proc/acpi/call;echo;} # wrapper
$ acall "\_SB.PCI0.RP05.HGON"
$ dmesg -xw
kern  :warn  : [57338.916511] i2c i2c-5: protocol 0x0e not supported for client 0x28
kern  :err   : [57338.916513] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [GenericSerialBus] (20200528/evregion-264)
kern  :err   : [57338.916516] ACPI Error: Result stack is empty! State=0000000031fb5ad7 (20200528/dswstate-64)
kern  :err   : [57338.916521] ACPI Error: Aborting method \_SB.PCI0.I2C0.SAM.SCMD due to previous error (AE_BAD_PARAMETER) (20200528/psparse-529)
kern  :err   : [57338.916524] ACPI Error: Aborting method \_SB.PCI0.RP05.HGON due to previous error (AE_BAD_PARAMETER) (20200528/psparse-529)
kern  :err   : [57338.916529] acpi_call: Method call failed: Error: AE_BAD_PARAMETER
qzed commented 3 years ago

Neat! Thanks for testing!

fematarazzo commented 3 years ago

Hi!

I'm sorry, is it working now but only on Arch? I didn't really get it. I'm using vanilla Fedora now.

And btw, does this update mean that at some point the current vanilla kernel will have the necessary patches for the dGPU to work?

Thanks!

qzed commented 3 years ago

I'm sorry, is it working now but only on Arch?

It should work on every kernel with the patch applied. Currently there are only kernels for Debian and Arch in the link I provided above. The patch has not yet been integrated into this repository here.

And btw, does this update mean that at some point the current vanilla kernel will have the necessary patches for the dGPU to work?

I will likely update the patches in this repository later this week (want to finish up on something else, will update the patches here after that), meaning after that it should work on Fedora, too. I'll write an update here and on our announcement issue (#96). At the moment, though, it's only in the kernel repository.

qzed commented 3 years ago

The latest kernels (v5.7.11 and v4.19.135, currently building) have support for setting the dGPU power via ACPI call.

jrevillard commented 3 years ago

Thanks for all the work @qzed... but I'm a bit lost now... how do we use it :smile: ?

qzed commented 3 years ago

@jrevillard Sorry, that wasn't properly documented yet. Just updated the wiki: https://github.com/linux-surface/linux-surface/wiki/Surface-Book#controlling-dgpu-power-state.

jrevillard commented 3 years ago

Great thanks ! it works on my Gentoo with 5.7.11 kernel.

⌂96% [root:/home/jerome] # echo "\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGON
⌂92% [root:/home/jerome] # echo "\_SB.PCI0.RP05.HGOF" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGOF

Now, another stupid question... can we concretely benefit from this dGPU ?

qzed commented 3 years ago

You should be able to use it with CUDA/OpenCL (and programs using that) out-of-the-box, as well as via optirun/primusrun for OpenGL/Vulkan (at least via nvidia-dkms). Here's an explanation on how to set up optirun for the SB2, which should be fairly similar for the SB1, except that you have to replace the sudo surface dgpu ... calls with the ACPI calls.

I'm not sure how the dGPU behaves performance wise. On the SB2, to unlock its full potential, you'd have to set performance modes to get appropriate cooling. On the SB1, performance modes are not supported yet (and I have no clue if it even has those, or a similar concept at least).

fematarazzo commented 3 years ago

Wow that's an awesome improvement! Thanks for all your work, guys! @qzed the Fedora commands for the acpi calls are the same as stated in the Wiki?

I hope this dgpu activation helps me run Darktable and properly run better games on my SB1 Performance lol

qzed commented 3 years ago

@qzed the Fedora commands for the acpi calls are the same as stated in the Wiki?

Should be, as long as you have the acpi_call module installed.
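A minimal sketch for checking that the module is actually available before issuing any calls. The `ensure_acpi_call` name and the `ACPI_CALL` override are assumptions for illustration; `acpi_call` is the module name mentioned above, and `/proc/acpi/call` is the file used throughout this thread.

```bash
#!/usr/bin/env bash
# Path of the acpi_call interface; overridable for testing (an assumption).
ACPI_CALL="${ACPI_CALL:-/proc/acpi/call}"

# Succeeds if the interface exists, loading the module first if necessary.
ensure_acpi_call() {
    if [ -e "$ACPI_CALL" ]; then
        return 0
    fi
    # The interface only appears once the acpi_call module is loaded.
    sudo modprobe acpi_call && [ -e "$ACPI_CALL" ]
}
```

Something like `ensure_acpi_call || echo "acpi_call is not installed for this kernel"` then separates "the module is missing" from "the ACPI call did nothing".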

jrevillard commented 3 years ago

Thanks a lot @qzed ! I will give a try.

jrevillard commented 3 years ago

With this simple test, it does not work but perhaps you would prefer to open another case for this:

[root:/home/jerome] 130 # echo "\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGON
[root:/home/jerome] # nvidia-modprobe 
[root:/home/jerome] 1 # dmesg
....
kern  :warn  : [ 2409.944012] NVRM: No NVIDIA graphics adapter found!
kern  :info  : [ 2409.944210] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235

Do you have an idea ?

StollD commented 3 years ago

Should be, as long as you have the acpi_call module installed.

@toastyfe FYI, this is not available through the default repositories, or rpmfusion. But it is packaged by the TLP guys in an additional repository (they use it for battery calibration on thinkpads).

sudo dnf install https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install https://repo.linrunner.de/fedora/tlp/repos/releases/tlp-release.fc$(rpm -E %fedora).noarch.rpm
sudo dnf install kernel-surface-devel akmod-acpi_call

You need to disable secureboot for the module to work, because Fedora locks down the kernel and forbids loading unsigned modules when it is enabled.

qzed commented 3 years ago

With this simple test, it does not work but perhaps you would prefer to open another case for this:

[root:/home/jerome] 130 # echo "\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGON
[root:/home/jerome] # nvidia-modprobe 
[root:/home/jerome] 1 # dmesg
....
kern  :warn  : [ 2409.944012] NVRM: No NVIDIA graphics adapter found!
kern  :info  : [ 2409.944210] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235

Do you have an idea ?

Hmm, no clue really. You have the dkms version of the nvidia driver installed, right? Maybe there are some config options missing. What's in the dmesg log directly after you turn on the dGPU? There should be output from pcie-core detecting the device. What does lspci give you (before and after enabling the dGPU)?
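To make the before/after comparison less error-prone, the two `lspci` snapshots can be diffed. This is only a sketch: `new_pci_devices` and the temporary file paths are made up for illustration.

```bash
#!/usr/bin/env bash
# Print lines that appear in the AFTER snapshot but not in the BEFORE snapshot.
# -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file.
new_pci_devices() {
    local before="$1" after="$2"
    grep -vxFf "$before" "$after"
}

# Example run (needs root and the acpi_call module, so it is left commented out):
# lspci -nn > /tmp/lspci-before.txt
# echo '\_SB.PCI0.RP05.HGON' | sudo tee /proc/acpi/call
# sleep 2
# lspci -nn > /tmp/lspci-after.txt
# new_pci_devices /tmp/lspci-before.txt /tmp/lspci-after.txt
```

If the function prints nothing, no new device was enumerated, which points at power or hot-plug rather than at the NVIDIA driver.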

fematarazzo commented 3 years ago

Hey @StollD, thanks for the tip! I was having a hard time trying to find the acpi_call module. I'm going to use these commands.

jrevillard commented 3 years ago

With this simple test, it does not work but perhaps you would prefer to open another case for this:

[root:/home/jerome] 130 # echo "\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGON
[root:/home/jerome] # nvidia-modprobe 
[root:/home/jerome] 1 # dmesg
....
kern  :warn  : [ 2409.944012] NVRM: No NVIDIA graphics adapter found!
kern  :info  : [ 2409.944210] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235

Do you have an idea ?

Hmm, no clue really. You have the dkms version of the nvidia driver installed, right? Maybe there are some config options missing. What's in the dmesg log directly after you turn on the dGPU? There should be output from pcie-core detecting the device. What does lspci give you (before and after enabling the dGPU)?

There is absolutely nothing in the dmesg log after turning on dGPU..... Same, no difference for the lspci output.... so in fact, it seems that it does not work for me...

qzed commented 3 years ago

Can you post a config or diff your kernel config against the Arch config? If the card isn't detected when it's turned on, I think there might be some PCIe hot-plug options missing.

jrevillard commented 3 years ago

Can you post a config or diff your kernel config against the Arch config? If the card isn't detected when it's turned on, I think there might be some PCIe hot-plug options missing.

Here is my config file: https://gist.github.com/jrevillard/4537e0fd1d040f66804bb48fd91dbd7c

qzed commented 3 years ago

I think you're missing at least CONFIG_PCIEPORTBUS and CONFIG_HOTPLUG_PCI_PCIE (both should be set to y). The dGPU (at least on the SB2) is connected via a PCIe root port, which basically functions as some sort of PCIe hot-pluggable slot.
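A small sketch for checking a config file against those options. The `missing_options` helper is hypothetical; it simply lists every named option that is not set to `y` in the given file.

```bash
#!/usr/bin/env bash
# Print each option from the argument list that is not set to "y" in the
# config file given as the first argument.
missing_options() {
    local config_file="$1"
    shift
    local opt
    for opt in "$@"; do
        grep -qE "^${opt}=y$" "$config_file" || echo "$opt"
    done
}

# Example (assumes the config was saved to a file first):
# zcat /proc/config.gz > /tmp/kernel.config
# missing_options /tmp/kernel.config CONFIG_PCIEPORTBUS CONFIG_HOTPLUG_PCI_PCIE
```

An empty result means both options are built in; anything printed needs to be enabled before the hot-plug path can detect the card.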

fematarazzo commented 3 years ago

Hi! I did run the acpi call and apparently got it to work, because it did not show additional messages:

felipe@felipe:~$ echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call

But still, I have not felt any difference after running the command. I tried lspci to see if something different was detected, but it seems as before:

felipe@felipe:~$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)
00:05.0 Multimedia controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Imaging Unit (rev 01)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:14.3 Multimedia controller: Intel Corporation Device 9d32 (rev 01)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:15.2 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #2 (rev 21)
00:15.3 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #3 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:16.4 Communication controller: Intel Corporation Device 9d3e (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1d.3 PCI bridge: Intel Corporation Device 9d1b (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless

Is that how it should behave?

qzed commented 3 years ago

lspci should show the dGPU. What kernel are you using?

fematarazzo commented 3 years ago

I was testing with the stock Ubuntu kernel. I've just installed the 5.7.11-surface kernel and retried the echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call command, but this time, for some reason, I wasn't able to find the /proc/acpi/call folder. It says it can't find...