system76 / firmware-open

System76 Open Firmware

eGPU not initializing #66

Open: jacobgkau opened this issue 4 years ago

jacobgkau commented 4 years ago

Tested on a darp6 (a customer is experiencing the same issue on a galp4). When we plug an NVIDIA GPU into the Akitio Node, we do see it listed in lspci:

system76@pop-os:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Device 9b41 (rev 02)
06:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)

However, nvidia-smi does not see the GPU, and we are not able to load the NVIDIA driver:

system76@pop-os:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
system76@pop-os:~$ sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device

dmesg shows this output continuously repeated while the GPU is connected:

[  251.993332] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[  251.993840] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[  251.993841] NVRM: The system BIOS may have misconfigured your GPU.
[  251.993845] nvidia: probe of 0000:06:00.0 failed with error -1
[  251.993863] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  251.993864] NVRM: None of the NVIDIA devices were initialized.
[  251.993998] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
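The key line is "BAR0 is 0M @ 0x0": the GPU's first memory BAR never received an address. A minimal sketch for confirming this from userspace, assuming the standard Linux sysfs resource file (one "start end flags" line of hex values per region, lines 0-5 being BAR0-BAR5). The size rule mirrors the kernel's pci_resource_len(), which reports 0 for a region whose end is 0; the bus address 0000:06:00.0 is taken from the log above.

```python
from pathlib import Path


def parse_resources(resource_text: str) -> list[tuple[int, int]]:
    """Parse a sysfs PCI `resource` file into (start, size) pairs.

    Each line holds three hex fields: start, end and flags.
    Lines 0-5 correspond to BAR0-BAR5.
    """
    regions = []
    for line in resource_text.strip().splitlines():
        start, end, _flags = (int(field, 16) for field in line.split())
        # A region whose end is 0 has no address range; treat it as length 0,
        # the same rule pci_resource_len() applies in the kernel.
        size = end - start + 1 if end else 0
        regions.append((start, size))
    return regions


def bar0_assigned(resource_text: str) -> bool:
    """True if BAR0 has a nonzero size, i.e. it was given an address window."""
    _start, size = parse_resources(resource_text)[0]
    return size > 0


def check_device(bdf: str) -> bool:
    """Read /sys/bus/pci/devices/<bdf>/resource, e.g. bdf="0000:06:00.0"."""
    text = Path(f"/sys/bus/pci/devices/{bdf}/resource").read_text()
    return bar0_assigned(text)
```

On an affected machine, check_device("0000:06:00.0") would be expected to return False while the NVRM message above keeps repeating.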
vthg2themax commented 4 years ago

VFIO now appears to be broken on Pop!_OS 20.04. :-( @tkolo, just a warning, friend.

theironrobin commented 3 years ago

I also have a Razer Core and an AMD RX 580, and I am not able to get this working on Pop!_OS 20.04 (darp6). lspci -k | grep -EA3 'VGA|3D|Display' shows both the i915 (Intel graphics) and amdgpu kernel drivers in use, but actual performance is not good: lots of screen tearing, very slow, and ~13 fps in gaming benchmarks.

EDIT: It seems that by using this egpu switcher I can get it working correctly, BUT the auto-detect feature is not working. So if I want to use the eGPU, I need to do:

sudo systemctl disable egpu  <- disables the auto-switch service; this only needs to be done once
sudo egpu-switcher switch egpu  <- manually sets the X11 configuration to use the eGPU
reboot  <- with the eGPU connected

Then, to switch back to the internal GPU:

sudo egpu-switcher switch internal
reboot

NOTE: this only seems to work if auto-login is set up, which is a shame privacy/security-wise. Also, my hard drive is not encrypted, so I am not sure how that would affect things.
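The auto-detect step that fails here has to decide at boot whether an eGPU is attached. A hypothetical sketch of that decision (not egpu-switcher's actual logic), assuming it is given plain lspci output and that any display controller outside bus 00 is the external GPU. That holds on machines like the darp6, whose only internal GPU is the integrated one at 00:02.0, but would misfire on laptops with a discrete internal GPU.

```python
import re

# Matches lines such as:
# "06:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)"
# lspci omits the PCI domain when it is 0000, so the slot is bus:device.function.
DISPLAY_LINE = re.compile(
    r"^(?P<bus>[0-9a-f]{2}):(?P<dev>[0-9a-f]{2})\.(?P<fn>[0-7]) "
    r"(?P<cls>VGA compatible controller|3D controller|Display controller): "
    r"(?P<name>.*)$"
)


def display_controllers(lspci_output: str) -> list[tuple[int, str]]:
    """Return (bus number, description) for each display-class device."""
    found = []
    for line in lspci_output.splitlines():
        match = DISPLAY_LINE.match(line.strip())
        if match:
            found.append((int(match.group("bus"), 16), match.group("name")))
    return found


def pick_mode(lspci_output: str) -> str:
    """Hypothetical policy: use the eGPU if any display controller sits off bus 0."""
    if any(bus != 0 for bus, _name in display_controllers(lspci_output)):
        return "egpu"
    return "internal"
```

Fed the two lspci lines from the original report, pick_mode returns "egpu"; with only the Intel line it returns "internal".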

jacobgkau commented 3 years ago

> NOTE: this only seems to work if auto-login is set up, which is a shame privacy/security-wise. Also, my hard drive is not encrypted, so I am not sure how that would affect things.

I've tested with an RX 580 and a galp4. I can confirm that the auto-switching service doesn't work (it seems to always switch to internal on boot) and that an Xorg restart is required after running the egpu-switcher switch command, since it swaps out the Xorg configuration file. Even so, I am able to use the RX 580 as the primary GPU with full-disk encryption and without enabling auto-login. What issue are you seeing when you disable auto-login?

kocoman2 commented 1 year ago

Can anyone please tell me where the code is for "The fix was to increase the amount of hotplug memory reserved for the Thunderbolt bridge"? Thank you.

crawfxrd commented 1 year ago

It's in the coreboot config options PCIEXP_HOTPLUG_MEM and PCIEXP_HOTPLUG_PREFETCH_MEM.
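These options set how much non-prefetchable and prefetchable memory space coreboot reserves behind each PCIe hotplug (Thunderbolt) bridge. If a window is smaller than what the plugged-in GPU's BARs need, those BARs cannot be assigned, which is consistent with the "BAR0 is 0M @ 0x0" error in the original report. A simplified sketch of that fit check, assuming a single window whose base is aligned to the largest BAR and ignoring any other devices behind the same bridge. PCI BARs are powers of two aligned to their own size; the sizes in the usage note are illustrative, not measured from these GPUs.

```python
MiB = 1 << 20


def fits_in_window(bar_sizes: list[int], window_size: int) -> bool:
    """Check whether BARs fit in one bridge window of `window_size` bytes.

    Placing naturally aligned power-of-two BARs largest-first packs them
    with no gaps, so this simulated placement is exact for one window.
    """
    for size in bar_sizes:
        if size <= 0 or size & (size - 1):
            raise ValueError(f"BAR size {size:#x} is not a power of two")
    cursor = 0
    for size in sorted(bar_sizes, reverse=True):
        cursor = (cursor + size - 1) // size * size  # round up to the BAR's alignment
        cursor += size
    return cursor <= window_size
```

For example, fits_in_window([16 * MiB], 8 * MiB) is False: a 16 MiB non-prefetchable BAR cannot be placed in an 8 MiB window, so raising the reservation is the only firmware-side fix.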

antonkulaga commented 1 year ago

Guys, the eGPU does not work with my Lemur Pro (lemp11, if I am not mistaken) with a Razer Core and an NVIDIA RTX 3060. I have tried pretty much everything I found on the internet. Has anybody managed to make it work? I bought the Lemur specifically to pair it with an eGPU when I need to do deep learning and to save battery otherwise, but it does not work :((((

jacobgkau commented 1 year ago

lemp12 isn't initializing an RTX GPU in an Akitio Node Titan. dmesg output: lemp12-egpu.txt (I'm just trying to plug it in after boot in order to use it for CUDA; nvidia-smi never shows it.)

A lemp12 customer with a Razer Core X and a GTX 1080 Ti is seeing the GPU show up in nvidia-smi at first, but it drops off after trying to use it for anything, sometimes just after having run nvidia-smi. Mattermost thread: https://chat.pop-os.org/pop-os/pl/1db8qbbhkb8n3pdrryx1r3bfcc