Open sadnub opened 4 years ago
For this we need to implement support for the embedded controller. Some progress has been made in https://github.com/jakeday/linux-surface/issues/286#issuecomment-521518380, but I can't give you any time-frame when (or even a guarantee that) this will be implemented.
I'll rename this issue so we can keep it open as tracking-issue for the other one.
Hi qzed!
Thank you for all your support. I've been using the custom kernel for quite a while.
Has there been any progress on the dGPU since your last post in February?
Best,
Felipe
Unfortunately not. I've fixed the link in my comment above, which had somehow broken. @kitakar5525 managed to power on the dGPU, but at the moment that requires some not-so-clean patching as there are some driver conflicts, IIRC.
Hi everyone!
Any news about the dGPU?
Thanks!
Also on SB1, please let me know how I can help with this issue
The thing missing right now is manpower. I.e. for the dGPU, someone would need to update the patch in https://github.com/jakeday/linux-surface/issues/286#issuecomment-521518380 (so that it has a chance of being accepted upstream) and put together a driver that does the appropriate ACPI calls. For the other stuff, there probably needs to be a special HID driver that attaches to the HID-over-I2C device.
Is SB1 inherently tricky in how it does dGPU? I guess I'm surprised SB1 is the only one without dGPU atm. Any possibility of leveraging what's already been done for SB2 or others?
Been a while since I've done linux drivers but I'll start looking into it. Might take a while to figure out how @kitakar5525 got where he was
The fundamental parts are different than on the SB2. Specifically, it looks like the dGPU power is (at least in part) controlled by the embedded controller (SAM; in contrast to the SB2, this is not SAM-over-SSH but SAM-over-HID). Even if it's not actually controlled by SAM, you need to set up communication with it, as that otherwise leads to errors (https://github.com/jakeday/linux-surface/issues/286#issuecomment-453378745). That's what that one patch does: https://github.com/qzed/linux-surface-sam-hid/blob/master/patches/5.1/0001-Add-quirk-for-Surface-SAM-I2C-address-space.patch.
After that, calling the ACPI functions to turn on the dGPU should succeed. Ideally, you'd want to wrap those into a driver. Since they're also being called by the _ON and _OFF methods of the power region, you might get away with actually just using the standard PCIe power methods. Similar to https://github.com/torvalds/linux/blob/69119673bd50b176ded34032fadd41530fb5af21/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L1262-L1265 and https://github.com/torvalds/linux/blob/69119673bd50b176ded34032fadd41530fb5af21/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c#L1294-L1299.
Hey @qzed,
I'm trying to reproduce your work via the ACPI calls and so far I'm not having any luck. I applied the i2c quirk patch to NixOS's 20.03 default kernel successfully; however, it looks like my dGPU is not powering on with either of those ACPI calls. The double call to `acall_vgbi_dgpu` returns `{0x00, 0x00, 0x00, 0x00}` both times, and the other call, `acall "\_SB.PCI0.RP05.HGON"`, returns `0x0called`.
Any idea where I'm going wrong with this? Maybe I have the wrong constants?
@richardcq Is there anything interesting in dmesg? How did you check the dGPU status? Looks like the ACPI call succeeded, so it should have at least sent the data to the EC. I don't own an SB1, so I haven't been able to test this myself. Maybe @kitakar5525 can help you more.
Here's the full run I did. When I tried the first time, I used the `lspci` method and also checked `lshw -class display` to see if a second display popped up. I also don't have any `/sys/module/acpi/parameters/debug_*`, not sure why.
$ acall(){ echo "$1" | sudo tee /proc/acpi/call >/dev/null && sudo cat /proc/acpi/call;echo;}
$ VGBI_UUID_DGPU="{0x69 0x5C 0xD0 0x6F 0xE3 0xCD 0xF4 0x49 0x95 0xED 0xAB 0x16 0x65 0x49 0x80 0x35}"
$ acall_vgbi_dgpu(){ acall "\_SB_.PCI0.LPCB.EC0_.VGBI._DSM $VGBI_UUID_DGPU 1 1 0";}
$ acall "\_SB.PCI0.RP05.HGON"
0x0called
$ lspci -nn | grep -i nvidia
$ dmesg | tail -n 10
--- snip, random USB device messages ---
$ acall_vgbi_dgpu
{0x00, 0x00, 0x00, 0x00}
$ lspci -nn | grep -i nvidia
$ acall_vgbi_dgpu
{0x00, 0x00, 0x00, 0x00}
$ lspci -nn | grep -i nvidia
$ dmesg | tail -n 10
--- same USB device messages ---
[33698.222533] ACPI Warning: \_SB.PCI0.LPCB.EC0.VGBI._DSM: Argument #4 type mismatch - Found [Integer], ACPI requires [Package] (20190816/nsarguments-59)
> I also don't have any `/sys/module/acpi/parameters/debug_*`

That's odd.
The rest looks okay. Also, it should be enough to just call either one of them. Can you upload the full dmesg log after running one of the ACPI calls, including the USB messages? The SAM EC is after all connected via HID-over-I2C and HID was more or less initially designed for USB, so maybe there's some connection. `lspci` should definitely show the device if turned on.
Maybe there's some PCI refresh missing?
Sorry about that, I didn't mean to mislead you. The USB messages are from my Fiio K3 USB DAC on usb1-1.1.4. Every so often (probably when I jiggle the cord) it disconnects and reconnects according to the kernel log. Hopefully not related, but I'll dump it anyway.
kern :warn : [ +0.000125] usb 1-1.1.4: Device not responding to setup address.
kern :warn : [ +0.203885] usb 1-1.1.4: Device not responding to setup address.
kern :err : [ +0.207886] usb 1-1.1.4: device not accepting address 59, error -71
kern :info : [ +0.491997] usb 1-1.1.4: new full-speed USB device number 60 using xhci_hcd
kern :warn : [ +0.000123] usb 1-1.1.4: Device not responding to setup address.
kern :warn : [ +0.204018] usb 1-1.1.4: Device not responding to setup address.
kern :err : [ +0.207839] usb 1-1.1.4: device not accepting address 60, error -71
kern :err : [ +0.000154] usb 1-1.1-port4: unable to enumerate USB device
kern :info : [ +0.183837] usb 1-1.1.4: new full-speed USB device number 61 using xhci_hcd
kern :err : [ +0.075979] usb 1-1.1.4: device descriptor read/64, error -32
kern :err : [ +0.599994] usb 1-1.1.4: device descriptor read/64, error -71
kern :info : [ +0.600014] usb 1-1.1.4: new full-speed USB device number 62 using xhci_hcd
kern :err : [ +0.279994] usb 1-1.1.4: device descriptor read/64, error -71
kern :err : [ +0.599995] usb 1-1.1.4: device descriptor read/64, error -71
kern :info : [ +0.108065] usb 1-1.1-port4: attempt power cycle
kern :info : [ +0.803802] usb 1-1.1.4: new full-speed USB device number 63 using xhci_hcd
kern :warn : [ +0.000146] usb 1-1.1.4: Device not responding to setup address.
kern :warn : [ +0.204040] usb 1-1.1.4: Device not responding to setup address.
kern :err : [ +0.207851] usb 1-1.1.4: device not accepting address 63, error -71
kern :info : [ +0.492000] usb 1-1.1.4: new high-speed USB device number 64 using xhci_hcd
kern :info : [ +0.012535] usb 1-1.1.4: New USB device found, idVendor=2972, idProduct=0047, bcdDevice= 0.11
kern :info : [ +0.000004] usb 1-1.1.4: New USB device strings: Mfr=1, Product=3, SerialNumber=0
kern :info : [ +0.000002] usb 1-1.1.4: Product: K3
kern :info : [ +0.000002] usb 1-1.1.4: Manufacturer: FiiO
kern :info : [ +0.924365] usb 1-1.1.4: 1:3 : unsupported format bits 0x100000000
Ah yeah okay, you're right there. That doesn't seem to be related. Does `echo 1 | sudo tee /sys/bus/pci/rescan` bring up anything in `lspci`?
Nope, no luck. No `dmesg` logs either.
It's possible (and pretty likely) that subsequent HGON/DSM calls don't actually do anything, so the dmesg log may be misleading. Can you try to run `acall "\_SB.PCI0.RP05.HGOF"` and then `acall "\_SB.PCI0.RP05.HGON"` and look at the log? That should basically power-cycle the dGPU.
$ acall "\_SB.PCI0.RP05.HGOF"
0x1called
$ acall "\_SB.PCI0.RP05.HGON"
0x0called
$ lspci -nn | grep -i nvidia
It's interesting, the HGOF returns what @kitakar5525 got when they ran HGON as the return message (0x1), but my HGON just gets 0x0. I also get a delay between the HGON message and the return value but the HGOF is instant. No dmesg logs from either command.
$ time acall "\_SB.PCI0.RP05.HGOF"
0x1called
real 0m0.069s
user 0m0.004s
sys 0m0.010s
$ time acall "\_SB.PCI0.RP05.HGON"
0x0called
real 0m6.052s
user 0m0.005s
sys 0m0.150s
Hmm, I guess you'll have to trace through the methods to figure out where things are going wrong. For example, the HGON call could time out for some reason: https://github.com/linux-surface/acpidumps/blob/master/surface_book_1/dsdt.dsl#L18189. Probably also a good idea to add some debug prints to check when and what is being sent to the EC (https://github.com/qzed/linux-surface-sam-hid/blob/master/patches/5.1/0001-Add-quirk-for-Surface-SAM-I2C-address-space.patch#L184).
You could maybe also try a two-button reset (hold power and volume-up button) to make sure that it isn't any firmware bug. But I kind of doubt that that's going to help here.
I will give this a crack over the weekend and see what I come up with. I suspect it's not a firmware bug, since the dGPU works on Windows, but who knows?
Hi @richardcq, on my SB1 the dGPU appears after calling `\_SB.PCI0.RP05.HGON`.
Here is the log with the nouveau driver, for your information:
```bash
kern :info : [32682.857093] pcieport 0000:00:1c.0: pciehp: Slot(4): Card present
kern :info : [32683.023432] pci 0000:01:00.0: [10de:1427] type 00 class 0x030200
kern :info : [32683.023473] pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
kern :info : [32683.023487] pci 0000:01:00.0: reg 0x14: [mem 0x00000000-0x0fffffff 64bit pref]
kern :info : [32683.023500] pci 0000:01:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
kern :info : [32683.023509] pci 0000:01:00.0: reg 0x24: [io 0x0000-0x007f]
kern :info : [32683.023518] pci 0000:01:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
kern :info : [32683.023644] pci 0000:01:00.0: Enabling HDA controller
kern :info : [32683.023784] pci 0000:01:00.0: 15.752 Gb/s available PCIe bandwidth, limited by 8.0 GT/s PCIe x2 link at 0000:00:1c.0 (capable of 126.016 Gb/s with 8.0 GT/s PCIe x16 link)
kern :info : [32683.033409] pci 0000:01:00.0: BAR 1: assigned [mem 0xc0000000-0xcfffffff 64bit pref]
kern :info : [32683.033419] pci 0000:01:00.0: BAR 3: assigned [mem 0xa2000000-0xa3ffffff 64bit pref]
kern :info : [32683.033427] pci 0000:01:00.0: BAR 0: assigned [mem 0xba000000-0xbaffffff]
kern :info : [32683.033431] pci 0000:01:00.0: BAR 6: assigned [mem 0xb9700000-0xb977ffff pref]
kern :info : [32683.033432] pci 0000:01:00.0: BAR 5: assigned [io 0x3000-0x307f]
kern :info : [32683.033436] pcieport 0000:00:1c.0: PCI bridge to [bus 01]
kern :info : [32683.033438] pcieport 0000:00:1c.0: bridge window [io 0x3000-0x6fff]
kern :info : [32683.033442] pcieport 0000:00:1c.0: bridge window [mem 0xb9700000-0xd16fffff]
kern :info : [32683.033444] pcieport 0000:00:1c.0: bridge window [mem 0xa1400000-0xb93fffff 64bit pref]
kern :warn : [32683.183855] ACPI Warning: \_SB.PCI0.GFX0._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20200326/nsarguments-59)
kern :warn : [32683.183899] ACPI Warning: \_SB.PCI0.RP05.PXSX._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20200326/nsarguments-59)
kern :info : [32683.183967] nouveau 0000:01:00.0: enabling device (0000 -> 0003)
kern :info : [32683.184177] nouveau 0000:01:00.0: NVIDIA GM206 (126270a1)
kern :info : [32683.239035] nouveau 0000:01:00.0: bios: version 84.06.73.00.02
kern :info : [32683.256807] nouveau 0000:01:00.0: fb: 2048 MiB GDDR5
kern :err : [32683.256831] nouveau 0000:01:00.0: bus: MMIO read of 00000000 FAULT at 022554 [ IBUS ]
kern :info : [32683.337489] [TTM] Zone kernel: Available graphics memory: 8159696 KiB
kern :info : [32683.337490] [TTM] Zone dma32: Available graphics memory: 2097152 KiB
kern :info : [32683.337491] [TTM] Initializing pool allocator
kern :info : [32683.337494] [TTM] Initializing DMA pool allocator
kern :info : [32683.337510] nouveau 0000:01:00.0: DRM: VRAM: 2048 MiB
kern :info : [32683.337511] nouveau 0000:01:00.0: DRM: GART: 1048576 MiB
kern :info : [32683.337514] nouveau 0000:01:00.0: DRM: Pointer to TMDS table not found
kern :info : [32683.337515] nouveau 0000:01:00.0: DRM: DCB version 4.1
kern :info : [32683.338600] nouveau 0000:01:00.0: DRM: MM: using COPY for buffer copies
kern :info : [32683.338849] [drm] Initialized nouveau 1.3.1 20120801 for 0000:01:00.0 on minor 1
```
> I also don't have any `/sys/module/acpi/parameters/debug_*`, not sure why.

`/sys/module/acpi/parameters/debug_layer` and `/sys/module/acpi/parameters/debug_level` require `CONFIG_ACPI_DEBUG=y` [1].
Since you're using NixOS, I guess some kernel configs are also missing for dGPU?
Here are some config fragments from `5.7.6-arch1-1-surface`:
$ zcat /proc/config.gz | grep -i "ACPI_DEBUG"
# CONFIG_ACPI_DEBUGGER is not set
CONFIG_ACPI_DEBUG=y
# CONFIG_THINKPAD_ACPI_DEBUGFACILITIES is not set
# CONFIG_THINKPAD_ACPI_DEBUG is not set
$ zcat /proc/config.gz | grep -i "nouveau"
CONFIG_DRM_NOUVEAU=m
# CONFIG_NOUVEAU_LEGACY_CTX_SUPPORT is not set
CONFIG_NOUVEAU_DEBUG=5
CONFIG_NOUVEAU_DEBUG_DEFAULT=3
# CONFIG_NOUVEAU_DEBUG_MMU is not set
CONFIG_DRM_NOUVEAU_BACKLIGHT=y
CONFIG_DRM_NOUVEAU_SVM=y
$ zcat /proc/config.gz | grep -i "nvidia"
CONFIG_NET_VENDOR_NVIDIA=y
CONFIG_I2C_NVIDIA_GPU=m
# CONFIG_FB_NVIDIA is not set
CONFIG_TYPEC_NVIDIA_ALTMODE=m
FYI, upstream Arch Linux's kernel config file is located here: https://git.archlinux.org/svntogit/packages.git/tree/trunk/config?h=packages/linux
It may also be helpful to attach your entire kernel config here.
On `5.7.6-arch1-1-surface`:
$ lsmod | grep nouveau
nouveau 2375680 1
mxm_wmi 16384 1 nouveau
wmi 36864 2 mxm_wmi,nouveau
ttm 118784 1 nouveau
i2c_algo_bit 16384 2 i915,nouveau
drm_kms_helper 258048 2 i915,nouveau
drm 581632 17 drm_kms_helper,i915,ttm,nouveau
agpgart 53248 5 intel_agp,intel_gtt,ttm,nouveau,drm
According to the lsmod output, the following configs may be necessary:
$ zcat /proc/config.gz | grep -i "mxm_wmi"
CONFIG_MXM_WMI=m
$ zcat /proc/config.gz | grep -i "ttm"
CONFIG_DRM_TTM=m
CONFIG_DRM_TTM_DMA_PAGE_POOL=y
CONFIG_DRM_TTM_HELPER=m
Here's my `/proc/config.gz`: https://gist.github.com/0d5a5b6cbff6765bc73521c1714c9558
I had to manually `modprobe nouveau` to get this, but my `lsmod | grep nouveau` looks fine:
$ lsmod | grep nouveau
nouveau 2183168 0
mxm_wmi 16384 1 nouveau
wmi 28672 2 mxm_wmi,nouveau
ttm 110592 1 nouveau
led_class 20480 3 input_leds,hid_corsair,nouveau
drm_kms_helper 172032 2 i915,nouveau
drm 479232 14 drm_kms_helper,i915,ttm,nouveau
agpgart 40960 4 intel_gtt,ttm,nouveau,drm
i2c_algo_bit 16384 2 i915,nouveau
i2c_core 77824 8 i2c_designware_platform,i2c_hid,i2c_designware_core,drm_kms_helper,i2c_algo_bit,i915,nouveau,drm
video 45056 2 i915,nouveau
backlight 20480 3 video,i915,nouveau
button 20480 1 nouveau
`\_SB.PCI0.RP05.HGON` has no effect after `modprobe nouveau` either, and neither does toggling `HGOF` then `HGON`. Same delay as well.
It looks like it is the correct i2c bus as well (figured I'd check, not really sure on the details of fun embedded HW stuff)
tail $(sudo find /sys -name modalias) | grep -n1 PNP0C50
214-==> /sys/devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-5/i2c-MSHW0030:00/modalias <==
215:acpi:MSHW0030:PNP0C50:
216-
--
442-==> /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0A08:00/device:72/MSHW0030:00/modalias <==
443:acpi:MSHW0030:PNP0C50:
444-
$ ls -lh /sys/bus/i2c/devices/ | grep "MSHW0030"
lrwxrwxrwx 1 root root 0 Jul 4 00:04 i2c-MSHW0030:00 -> ../../../devices/pci0000:00/0000:00:15.0/i2c_designware.0/i2c-5/i2c-MSHW0030:00
Hmm... then, what happens when `CONFIG_ACPI_PCI_SLOT=y`?
I wonder if the following output from my log is related:
kern :info : [32682.857093] pcieport 0000:00:1c.0: pciehp: Slot(4): Card present
Sorry for the late response.
> I wonder if the following output from my log is related:
> kern :info : [32682.857093] pcieport 0000:00:1c.0: pciehp: Slot(4): Card present

Definitely related. That's the PCIe hot-plug subsystem detecting the card after it's been powered up. The same happens on my SB2 when I turn on the dGPU. It's kinda hard to tell what's going wrong; it might be that the PCIe hot-plug system doesn't register that a device has been added/powered up. (That's why adding some `dev_info(...)` calls or something at https://github.com/qzed/linux-surface-sam-hid/blob/master/patches/5.1/0001-Add-quirk-for-Surface-SAM-I2C-address-space.patch#L184 would help.)
Config options might be the problem, so you should probably try using a config file from this repo (e.g. combine the Arch base config with the `surface-5.7.config`). You probably want the whole PCIe hot-plug and ACPI related stuff enabled.
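A quick way to tell whether the hot-plug path noticed the card is to look for an NVIDIA function in the `lspci -nn` output. This is only a minimal sketch: it assumes the dGPU reports PCI vendor ID `10de`, as in the nouveau log posted earlier in this thread, and `dgpu_visible` is a hypothetical helper name. It reads the `lspci -nn` output on stdin, so it can also be tried against saved output.

```shell
# Hypothetical helper: succeeds if any PCI function with NVIDIA's vendor ID
# (10de, as seen in the nouveau log in this thread) appears in the
# `lspci -nn` output supplied on stdin.
dgpu_visible() {
    grep -qiE '\[10de:[0-9a-f]{4}\]'
}

# Usage on a live system:
#   lspci -nn | dgpu_visible && echo "dGPU visible" || echo "dGPU not visible"
```

If this still fails after `HGON` and a PCI rescan, the card was probably never enumerated, which points at the hot-plug or config side rather than at the GPU driver.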
I've uploaded a bunch of kernels with the necessary patches at https://github.com/linux-surface/linux-surface/releases/tag/sb1-test-v4.19.133.
I've had confirmation (via mail) that the v4.19 Arch kernel linked above does work in combination with the `HGON`/`HGOF` ACPI calls to turn on/off the dGPU.
If I have a bit of time (and am not in the midst of re-structuring the SAM driver), I'll try to improve the patch so that it can be included in this repo.
I've decided to simplify the patch and add it to the surface kernel (https://github.com/linux-surface/kernel/commit/119b811af88da33ebcf8892effb23f76e0f388bb). While I believe that this patch does not provide the ideal solution, I prefer it over the somewhat awkward implementation with device quirks.
As it stands, this patch has the potential of some breakage on devices relying on I2C RawBytes operation region access, but none of the surface devices should be affected. Furthermore, there doesn't seem to be any driver using RawBytes access upstream, so it shouldn't break anything that is supported upstream.
Thanks! I tried the patch on v4.19 and v5.8-rc; the dGPU successfully turned on via `HGON`.
(For reference, here is a dmesg log from when the patch is not applied. The message is a little bit different from the one in my old comment.) EDIT: log from v5.8-rc6
$ acall(){ echo "$1" | sudo tee /proc/acpi/call >/dev/null && sudo cat /proc/acpi/call;echo;} # wrapper
$ acall "\_SB.PCI0.RP05.HGON"
$ dmesg -xw
kern :warn : [57338.916511] i2c i2c-5: protocol 0x0e not supported for client 0x28
kern :err : [57338.916513] ACPI Error: AE_BAD_PARAMETER, Returned by Handler for [GenericSerialBus] (20200528/evregion-264)
kern :err : [57338.916516] ACPI Error: Result stack is empty! State=0000000031fb5ad7 (20200528/dswstate-64)
kern :err : [57338.916521] ACPI Error: Aborting method \_SB.PCI0.I2C0.SAM.SCMD due to previous error (AE_BAD_PARAMETER) (20200528/psparse-529)
kern :err : [57338.916524] ACPI Error: Aborting method \_SB.PCI0.RP05.HGON due to previous error (AE_BAD_PARAMETER) (20200528/psparse-529)
kern :err : [57338.916529] acpi_call: Method call failed: Error: AE_BAD_PARAMETER
Neat! Thanks for testing!
Hi!
I'm sorry, is it working now but only on Arch? I didn't really get it. I'm using vanilla Fedora now.
And btw, does this update mean that at some point the current vanilla kernel will have the necessary patches for the dGPU to work?
Thanks!
> I'm sorry, is it working now but only on Arch?

It should work on every kernel with the patch applied. Currently there are only kernels for Debian and Arch in the link I provided above. The patch has not yet been integrated into this repository here.

> And btw, does this update mean that at some point the current vanilla kernel will have the necessary patches for the dGPU to work?

I will likely update the patches in this repository later this week (want to finish up on something else, will update the patches here after that), meaning after that it should work on Fedora, too. I'll write an update here and on our announcement issue (#96). At the moment, though, it's only in the kernel repository.
The latest kernels (v5.7.11 and v4.19.135, currently building) have support for setting the dGPU power via ACPI call.
Thanks for all the work @qzed... but I'm a bit lost now... how do we use it :smile: ?
@jrevillard Sorry, that wasn't properly documented yet. Just updated the wiki: https://github.com/linux-surface/linux-surface/wiki/Surface-Book#controlling-dgpu-power-state.
Great thanks ! it works on my Gentoo with 5.7.11 kernel.
⌂96% [root:/home/jerome] # echo "\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGON
⌂92% [root:/home/jerome] # echo "\_SB.PCI0.RP05.HGOF" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGOF
Now, another stupid question... can we concretely benefit from this dGPU ?
You should be able to use it with CUDA/OpenCL (and programs using that) out-of-the-box, as well as via optirun/primusrun for OpenGL/Vulkan (at least via `nvidia-dkms`). Here's an explanation on how to set up optirun for the SB2, which should be fairly similar for the SB1, except that you have to replace the `sudo surface dgpu ...` calls with the ACPI calls.
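For scripts written against the SB2 `surface dgpu` interface, a thin wrapper can map on/off to the two ACPI methods named in this thread. This is a minimal sketch, assuming the `acpi_call` module is loaded; `dgpu_power` and the `ACPI_CALL_FILE` override are hypothetical names introduced here, not part of any existing tool.

```shell
# Hypothetical wrapper around the HGON/HGOF methods discussed in this thread.
# ACPI_CALL_FILE can be overridden for dry runs; by default it points at the
# interface provided by the acpi_call module.
ACPI_CALL_FILE="${ACPI_CALL_FILE:-/proc/acpi/call}"

dgpu_power() {
    case "$1" in
        on)  method='\_SB.PCI0.RP05.HGON' ;;
        off) method='\_SB.PCI0.RP05.HGOF' ;;
        *)   echo "usage: dgpu_power on|off" >&2; return 2 ;;
    esac
    # Writing the method path triggers the call; reading the file back
    # returns the result.
    printf '%s\n' "$method" > "$ACPI_CALL_FILE" && cat "$ACPI_CALL_FILE"
}
```

On a real system this has to run in a root shell, since the redirect needs write access to `/proc/acpi/call`.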
I'm not sure how the dGPU behaves performance wise. On the SB2, to unlock its full potential, you'd have to set performance modes to get appropriate cooling. On the SB1, performance modes are not supported yet (and I have no clue if it even has those, or a similar concept at least).
Wow that's an awesome improvement! Thanks for all your work, guys! @qzed the Fedora commands for the acpi calls are the same as stated in the Wiki?
I hope this dgpu activation helps me run Darktable and properly run better games on my SB1 Performance lol
> @qzed the Fedora commands for the acpi calls are the same as stated in the Wiki?

Should be, as long as you have the `acpi_call` module installed.
Thanks a lot @qzed ! I will give a try.
With this simple test, it does not work but perhaps you would prefer to open another case for this:
[root:/home/jerome] 130 # echo "\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
\_SB.PCI0.RP05.HGON
[root:/home/jerome] # nvidia-modprobe
[root:/home/jerome] 1 # dmesg
....
kern :warn : [ 2409.944012] NVRM: No NVIDIA graphics adapter found!
kern :info : [ 2409.944210] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
Do you have an idea ?
> Should be, as long as you have the `acpi_call` module installed.

@toastyfe FYI, this is not available through the default repositories, or rpmfusion. But it is packaged by the TLP guys in an additional repository (they use it for battery calibration on ThinkPads).
sudo dnf install https://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-$(rpm -E %fedora).noarch.rpm
sudo dnf install https://repo.linrunner.de/fedora/tlp/repos/releases/tlp-release.fc$(rpm -E %fedora).noarch.rpm
sudo dnf install kernel-surface-devel akmod-acpi_call
You need to disable Secure Boot for the module to work, because Fedora locks down the kernel and forbids loading unsigned modules when it is enabled.
> With this simple test, it does not work but perhaps you would prefer to open another case for this:
> [root:/home/jerome] 130 # echo "\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
> \_SB.PCI0.RP05.HGON
> [root:/home/jerome] # nvidia-modprobe
> [root:/home/jerome] 1 # dmesg
> ....
> kern :warn : [ 2409.944012] NVRM: No NVIDIA graphics adapter found!
> kern :info : [ 2409.944210] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
> Do you have an idea ?

Hmm, no clue really. You have the dkms version of the nvidia driver installed, right? Maybe there are some config options missing. What's in the dmesg log directly after you turn on the dGPU? There should be output from pcie-core detecting the device. What does lspci give you (before and after enabling the dGPU)?
Hey @StollD, thanks for the tip! I was having a hard time trying to find the acpi_call module. I'm going to use these commands.
> > With this simple test, it does not work but perhaps you would prefer to open another case for this:
> > [root:/home/jerome] 130 # echo "\_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
> > \_SB.PCI0.RP05.HGON
> > [root:/home/jerome] # nvidia-modprobe
> > [root:/home/jerome] 1 # dmesg
> > ....
> > kern :warn : [ 2409.944012] NVRM: No NVIDIA graphics adapter found!
> > kern :info : [ 2409.944210] nvidia-nvlink: Unregistered the Nvlink Core, major device number 235
> > Do you have an idea ?
>
> Hmm, no clue really. You have the dkms version of the nvidia driver installed, right? Maybe there are some config options missing. What's in the dmesg log directly after you turn on the dGPU? There should be output from pcie-core detecting the device. What does lspci give you (before and after enabling the dGPU)?

There is absolutely nothing in the dmesg log after turning on dGPU..... Same, no difference for the lspci output.... so in fact, it seems that it does not work for me...
Can you post a config or diff your kernel config against the Arch config? If the card isn't detected when it's turned on, I think there might be some PCIe hot-plug options missing.
> Can you post a config or diff your kernel config against the Arch config? If the card isn't detected when it's turned on, I think there might be some PCIe hot-plug options missing.

Here is my config file: https://gist.github.com/jrevillard/4537e0fd1d040f66804bb48fd91dbd7c
I think you're missing at least `CONFIG_PCIEPORTBUS` and `CONFIG_HOTPLUG_PCI_PCIE` (both should be set to `y`). The dGPU (at least on the SB2) is connected via a PCIe root port, which basically functions as some sort of PCIe hot-pluggable slot.
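The two options can be checked against any kernel config before rebuilding. This is a minimal sketch, assuming the config text is piped in on stdin (for example from `zcat /proc/config.gz`, as used earlier in the thread); `missing_kconfig` is a hypothetical helper name.

```shell
# Hypothetical helper: reads a kernel config on stdin and prints every
# option from the argument list that is not set to "y".
missing_kconfig() {
    cfg=$(cat)
    for opt in "$@"; do
        printf '%s\n' "$cfg" | grep -qx "${opt}=y" || printf '%s\n' "$opt"
    done
}

# Usage on a live system:
#   zcat /proc/config.gz | missing_kconfig CONFIG_PCIEPORTBUS CONFIG_HOTPLUG_PCI_PCIE
```

An empty result means every listed option is built in; anything printed is either unset or set to something other than `y`.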
Hi! I did run the acpi call and apparently got it to work, because it did not show additional messages:
felipe@felipe:~$ echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call
But still, I have not felt any difference after running the command. I tried lspci to see if something different was detected, but it seems as before:
felipe@felipe:~$ lspci
00:00.0 Host bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Host Bridge/DRAM Registers (rev 08)
00:02.0 VGA compatible controller: Intel Corporation Skylake GT2 [HD Graphics 520] (rev 07)
00:05.0 Multimedia controller: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor Imaging Unit (rev 01)
00:08.0 System peripheral: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model
00:14.0 USB controller: Intel Corporation Sunrise Point-LP USB 3.0 xHCI Controller (rev 21)
00:14.2 Signal processing controller: Intel Corporation Sunrise Point-LP Thermal subsystem (rev 21)
00:14.3 Multimedia controller: Intel Corporation Device 9d32 (rev 01)
00:15.0 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #0 (rev 21)
00:15.1 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #1 (rev 21)
00:15.2 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #2 (rev 21)
00:15.3 Signal processing controller: Intel Corporation Sunrise Point-LP Serial IO I2C Controller #3 (rev 21)
00:16.0 Communication controller: Intel Corporation Sunrise Point-LP CSME HECI #1 (rev 21)
00:16.4 Communication controller: Intel Corporation Device 9d3e (rev 21)
00:1c.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #5 (rev f1)
00:1d.0 PCI bridge: Intel Corporation Sunrise Point-LP PCI Express Root Port #9 (rev f1)
00:1d.3 PCI bridge: Intel Corporation Device 9d1b (rev f1)
00:1f.0 ISA bridge: Intel Corporation Sunrise Point-LP LPC Controller (rev 21)
00:1f.2 Memory controller: Intel Corporation Sunrise Point-LP PMC (rev 21)
00:1f.3 Audio device: Intel Corporation Sunrise Point-LP HD Audio (rev 21)
02:00.0 Non-Volatile memory controller: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 (rev 01)
03:00.0 Ethernet controller: Marvell Technology Group Ltd. 88W8897 [AVASTAR] 802.11ac Wireless
Is that how it should behave?
`lspci` should show the dGPU. What kernel are you using?
I was testing with the stock Ubuntu kernel.
I've just installed the 5.7.11-surface kernel and retried the `echo "_SB.PCI0.RP05.HGON" | sudo tee /proc/acpi/call` command, but this time, for some reason, `/proc/acpi/call` could not be found.
It says it can't find...
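`/proc/acpi/call` is a file created by the out-of-tree `acpi_call` module, so it is absent whenever that module is not built or loaded for the running kernel. Whether that is the cause here is not confirmed in the thread; the sketch below only shows one way to check, with `acpi_call_ready` and the `ACPI_CALL_FILE` override as hypothetical names.

```shell
# Hypothetical check: succeeds if the acpi_call interface file exists,
# otherwise prints a hint on stderr and fails.
acpi_call_ready() {
    f="${ACPI_CALL_FILE:-/proc/acpi/call}"
    if [ -e "$f" ]; then
        return 0
    fi
    echo "$f not found; try: sudo modprobe acpi_call" >&2
    return 1
}
```

If `modprobe` cannot find the module either, it most likely has to be rebuilt or reinstalled for the newly installed kernel.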
Hello, I'm just curious: is dGPU support on the SB1 on the roadmap?
Excellent work on this!