system76 / firmware-open

System76 Open Firmware

eGPU not initializing #66

Open jacobgkau opened 4 years ago

jacobgkau commented 4 years ago

Tested on a darp6 (a customer is experiencing the same issue on a galp4). When we plug an NVIDIA GPU into the Akitio Node, we do see it listed in lspci:

system76@pop-os:~$ lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Device 9b41 (rev 02)
06:00.0 VGA compatible controller: NVIDIA Corporation GP108 [GeForce GT 1030] (rev a1)

However, nvidia-smi does not see the GPU, and we are not able to load the NVIDIA driver:

system76@pop-os:~$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
system76@pop-os:~$ sudo modprobe nvidia
modprobe: ERROR: could not insert 'nvidia': No such device

dmesg shows the following output, repeated continuously while the GPU is connected:

[  251.993332] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[  251.993840] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[  251.993841] NVRM: The system BIOS may have misconfigured your GPU.
[  251.993845] nvidia: probe of 0000:06:00.0 failed with error -1
[  251.993863] NVRM: The NVIDIA probe routine failed for 1 device(s).
[  251.993864] NVRM: None of the NVIDIA devices were initialized.
[  251.993998] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
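The NVRM message says BAR0 of 0000:06:00.0 ended up with 0 bytes at address 0x0. A minimal sketch for checking this directly, assuming the standard Linux sysfs `resource` file (one `start end flags` hex triple per line, BAR0 to BAR5 first, then the expansion ROM). The default device address is taken from the lspci output above. A BAR that failed assignment and one that is simply not implemented can both read as zero here, so treat the output as a hint rather than proof.

```python
"""Show which BARs of a PCI device received an address, via Linux sysfs."""
import sys
from pathlib import Path

# Assumption: the eGPU's address, taken from the lspci output in this report.
DEFAULT_DEVICE = "0000:06:00.0"


def describe_bars(resource_text):
    """Describe BAR0-BAR5 and the expansion ROM from a sysfs 'resource' file.

    Each line of the file holds three hex fields: start, end and flags.
    """
    report = []
    for index, line in enumerate(resource_text.splitlines()[:7]):
        start, end, _flags = (int(field, 16) for field in line.split())
        label = "ROM" if index == 6 else f"BAR{index}"
        if start == 0:
            # Not implemented, or the kernel found no room for it
            # (compare NVRM's "BAR0 is 0M @ 0x0").
            report.append(f"{label}: no address assigned")
        else:
            size_kib = (end - start + 1) // 1024
            report.append(f"{label}: {size_kib} KiB @ {start:#x}")
    return report


def main(argv):
    device = argv[1] if len(argv) > 1 else DEFAULT_DEVICE
    path = Path("/sys/bus/pci/devices") / device / "resource"
    if not path.exists():
        print(f"{path} not found; is the device enumerated?")
        return
    for line in describe_bars(path.read_text()):
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

Run it as a normal user with the enclosure connected; passing another address (for example a bridge such as 0000:04:00.0) as the first argument inspects that device instead.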
ahoneybun commented 4 years ago

New logs from a Razer Core X Chroma with a Sapphire Radeon Nitro+ RX 590 8GB GPU suggest it may be hitting the same issue:

[ 143.468521] ---[ end trace 997c68591ecbbf46 ]---
[ 143.468937] amdgpu: probe of 0000:06:00.0 failed with error -22
[ 143.468951] pci 0000:06:00.1: D0 power state depends on 0000:06:00.0
[ 143.469000] snd_hda_intel 0000:06:00.1: enabling device (0000 -> 0002)
[ 143.469166] snd_hda_intel 0000:06:00.1: Handle vga_switcheroo audio client
[ 143.469168] snd_hda_intel 0000:06:00.1: Force to non-snoop mode
[ 143.481490] input: HDA ATI HDMI HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input56
[ 143.481534] input: HDA ATI HDMI HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input57
[ 143.481570] input: HDA ATI HDMI HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input58
[ 143.481600] input: HDA ATI HDMI HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input59
[ 143.481630] input: HDA ATI HDMI HDMI/DP,pcm=10 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input60
[ 143.481681] input: HDA ATI HDMI HDMI/DP,pcm=11 as /devices/pci0000:00/0000:00:1c.0/0000:01:00.0/0000:02:01.0/0000:04:00.0/0000:05:01.0/0000:06:00.1/sound/card1/input61
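The probe failures in this thread end in negative kernel errno values (-22 here from amdgpu, and -1, -16 and -12 elsewhere). On Linux, Python's `errno` module uses the same numbering as the kernel for these codes, so a short decoder is enough to read them. The driver-to-code mapping is copied from the logs in this issue; the remark that the proprietary nvidia module's -1 may be a generic failure rather than a real EPERM is an assumption.

```python
"""Decode the negative probe return codes seen in this thread."""
import errno
import os

# Return codes copied from the dmesg excerpts in this issue.
PROBE_ERRORS = {
    "nvidia": -1,  # assumption: likely a generic failure, not a meaningful EPERM
    "amdgpu": -22,
    "xhci_hcd": -16,
    "nouveau / nvidia-gpu": -12,
}


def decode(code):
    """Return (errno name, message) for a negative kernel return code."""
    number = -code
    return errno.errorcode.get(number, "unknown"), os.strerror(number)


if __name__ == "__main__":
    for driver, code in PROBE_ERRORS.items():
        name, message = decode(code)
        print(f"{driver}: {code} -> {name} ({message})")
```

ENOMEM (-12) from nouveau and nvidia-gpu fits a device whose memory BARs never received an address; EINVAL from amdgpu does not by itself say which check failed.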

0-alex-0 commented 4 years ago

So I'm the customer with the galp4. I was able to get the eGPU working with a ThinkPad running Lubuntu from a USB drive. Apart from the hardware change, that machine's BIOS had a Thunderbolt 3 configuration section: I disabled its security and allowed pre-OS support for the port. I think coreboot is either blocking the Thunderbolt 3 connection for security reasons or, more likely, unable to support it in the pre-boot environment, since it's fairly new and the coreboot BIOS environment only lets me choose boot options.
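Both working comparisons involved changing a Thunderbolt security setting in the other machine's BIOS. A minimal sketch for checking what the Linux side sees, assuming the thunderbolt driver's sysfs layout under `/sys/bus/thunderbolt/devices` (a `security` attribute on each `domainN`, plus `authorized` and `device_name` on each connected device). It only reports state and changes nothing.

```python
"""Report Thunderbolt security level and device authorization from sysfs."""
from pathlib import Path

TB_ROOT = Path("/sys/bus/thunderbolt/devices")


def read_attr(path):
    """Return a sysfs attribute's stripped contents, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def summarize(root=TB_ROOT):
    """One line per domain (security level) and per device (authorized flag)."""
    if not root.is_dir():
        return ["no Thunderbolt bus found in sysfs"]
    lines = []
    for entry in sorted(root.iterdir()):
        if entry.name.startswith("domain"):
            lines.append(f"{entry.name}: security={read_attr(entry / 'security')}")
        elif (entry / "authorized").exists():
            name = read_attr(entry / "device_name")
            state = read_attr(entry / "authorized")
            lines.append(f"{entry.name} ({name}): authorized={state}")
    return lines


if __name__ == "__main__":
    for line in summarize():
        print(line)
```

If the enclosure shows `authorized=1`, as it should after the authorization prompt described later in this thread, the PCIe tunnel is up and the remaining failure lies in resource assignment rather than in Thunderbolt security.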

sldavidson commented 4 years ago

I am the user of the Razer Core X Chroma and AMD RX 590. Like 0-alex-0, I was able to use the same hardware with a Dell XPS 13 running the same Kubuntu version. I had to go into the BIOS settings and change the Thunderbolt security settings before it worked.

jamalex commented 4 years ago

I've also had no luck on a darp6 with coreboot. I've tried PopOS 19.10, Ubuntu 18.04, and Ubuntu 19.10. The eGPU (purchased specifically for this laptop) is a Sonnet eGFX Breakaway Box 550 containing an MSI GAMING GeForce RTX 2070 8GB; it has been tested plugged into a Windows laptop, where it worked. When I plug it into the Darter Pro, it spins up and I'm asked to authorize the TB3 device. When I do, the fans spin down to normal levels as expected, and the name of the eGPU box is shown. But it doesn't work, and I get the following in dmesg:

[ 5331.431094] thunderbolt 0-1: new device found, vendor=0x8 device=0x38
[ 5331.431099] thunderbolt 0-1: Sonnet Technologies, Inc. eGFX Breakaway Box 550
[ 5372.032458] pcieport 0000:02:01.0: pciehp: Slot(1): Card present
[ 5372.032463] pcieport 0000:02:01.0: pciehp: Slot(1): Link Up
[ 5372.373040] pci 0000:04:00.0: [8086:1578] type 01 class 0x060400
[ 5372.373205] pci 0000:04:00.0: enabling Extended Tags
[ 5372.373483] pci 0000:04:00.0: supports D1 D2
[ 5372.373488] pci 0000:04:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 5372.373620] pci 0000:04:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:02:01.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[ 5372.373903] pcieport 0000:02:01.0: ASPM: current common clock configuration is broken, reconfiguring
[ 5372.373979] pci 0000:04:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 5372.374247] pci 0000:05:01.0: [8086:1578] type 01 class 0x060400
[ 5372.374413] pci 0000:05:01.0: enabling Extended Tags
[ 5372.374673] pci 0000:05:01.0: supports D1 D2
[ 5372.374678] pci 0000:05:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[ 5372.375077] pci 0000:04:00.0: PCI bridge to [bus 05-24]
[ 5372.375099] pci 0000:04:00.0:   bridge window [io  0x0000-0x0fff]
[ 5372.375114] pci 0000:04:00.0:   bridge window [mem 0x00000000-0x000fffff]
[ 5372.375134] pci 0000:04:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[ 5372.375143] pci 0000:05:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 5372.375419] pci 0000:06:00.0: [10de:1f07] type 00 class 0x030000
[ 5372.375517] pci 0000:06:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[ 5372.375560] pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x0fffffff 64bit pref]
[ 5372.375600] pci 0000:06:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
[ 5372.375625] pci 0000:06:00.0: reg 0x24: [io  0x0000-0x007f]
[ 5372.375649] pci 0000:06:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[ 5372.375931] pci 0000:06:00.0: PME# supported from D0 D3hot
[ 5372.376110] pci 0000:06:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:02:01.0 (capable of 126.016 Gb/s with 8 GT/s x16 link)
[ 5372.376304] pci 0000:06:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[ 5372.376316] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[ 5372.376441] pci 0000:06:00.1: [10de:10f9] type 00 class 0x040300
[ 5372.376511] pci 0000:06:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[ 5372.377117] pci 0000:06:00.2: [10de:1ada] type 00 class 0x0c0330
[ 5372.377201] pci 0000:06:00.2: reg 0x10: [mem 0x00000000-0x0003ffff 64bit pref]
[ 5372.377262] pci 0000:06:00.2: reg 0x1c: [mem 0x00000000-0x0000ffff 64bit pref]
[ 5372.377515] pci 0000:06:00.2: PME# supported from D0 D3hot
[ 5372.377812] pci 0000:06:00.3: [10de:1adb] type 00 class 0x0c8000
[ 5372.377879] pci 0000:06:00.3: reg 0x10: [mem 0x00000000-0x00000fff]
[ 5372.378217] pci 0000:06:00.3: PME# supported from D0 D3hot
[ 5372.378753] pci 0000:05:01.0: PCI bridge to [bus 06-24]
[ 5372.378768] pci 0000:05:01.0:   bridge window [io  0x0000-0x0fff]
[ 5372.378781] pci 0000:05:01.0:   bridge window [mem 0x00000000-0x000fffff]
[ 5372.378795] pci 0000:05:01.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[ 5372.378800] pci_bus 0000:06: busn_res: [bus 06-24] end is updated to 06
[ 5372.378810] pci_bus 0000:05: busn_res: [bus 05-24] end is updated to 06
[ 5372.378837] pci 0000:04:00.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.378840] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.378844] pci 0000:04:00.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.378846] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.378849] pci 0000:04:00.0: BAR 13: assigned [io  0x2000-0x2fff]
[ 5372.378854] pci 0000:05:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.378856] pci 0000:05:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.378859] pci 0000:05:01.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.378862] pci 0000:05:01.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.378865] pci 0000:05:01.0: BAR 13: assigned [io  0x2000-0x2fff]
[ 5372.378871] pci 0000:06:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[ 5372.378874] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[ 5372.378877] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[ 5372.378880] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[ 5372.378882] pci 0000:06:00.0: BAR 0: no space for [mem size 0x01000000]
[ 5372.378885] pci 0000:06:00.0: BAR 0: failed to assign [mem size 0x01000000]
[ 5372.378887] pci 0000:06:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[ 5372.378890] pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
[ 5372.378893] pci 0000:06:00.2: BAR 0: no space for [mem size 0x00040000 64bit pref]
[ 5372.378895] pci 0000:06:00.2: BAR 0: failed to assign [mem size 0x00040000 64bit pref]
[ 5372.378899] pci 0000:06:00.2: BAR 3: no space for [mem size 0x00010000 64bit pref]
[ 5372.378901] pci 0000:06:00.2: BAR 3: failed to assign [mem size 0x00010000 64bit pref]
[ 5372.378905] pci 0000:06:00.1: BAR 0: no space for [mem size 0x00004000]
[ 5372.378907] pci 0000:06:00.1: BAR 0: failed to assign [mem size 0x00004000]
[ 5372.378910] pci 0000:06:00.3: BAR 0: no space for [mem size 0x00001000]
[ 5372.378913] pci 0000:06:00.3: BAR 0: failed to assign [mem size 0x00001000]
[ 5372.378916] pci 0000:06:00.0: BAR 5: assigned [io  0x2000-0x207f]
[ 5372.378928] pci 0000:05:01.0: PCI bridge to [bus 06]
[ 5372.378934] pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
[ 5372.378963] pci 0000:04:00.0: PCI bridge to [bus 05-06]
[ 5372.378968] pci 0000:04:00.0:   bridge window [io  0x2000-0x2fff]
[ 5372.378997] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[ 5372.379001] pcieport 0000:02:01.0:   bridge window [io  0x2000-0x3fff]
[ 5372.379008] pcieport 0000:02:01.0:   bridge window [mem 0xd1000000-0xd17fffff]
[ 5372.379014] pcieport 0000:02:01.0:   bridge window [mem 0xc1000000-0xd0ffffff 64bit pref]
[ 5372.379021] PCI: No. 2 try to assign unassigned res
[ 5372.379026] pcieport 0000:02:01.0: resource 14 [mem 0xd1000000-0xd17fffff] released
[ 5372.379028] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[ 5372.379037] pcieport 0000:02:01.0: resource 15 [mem 0xc1000000-0xd0ffffff 64bit pref] released
[ 5372.379039] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[ 5372.379064] pcieport 0000:02:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.379067] pcieport 0000:02:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.379070] pcieport 0000:02:01.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.379073] pcieport 0000:02:01.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.379077] pci 0000:04:00.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.379080] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.379083] pci 0000:04:00.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.379085] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.379089] pci 0000:05:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[ 5372.379092] pci 0000:05:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[ 5372.379094] pci 0000:05:01.0: BAR 14: no space for [mem size 0x01800000]
[ 5372.379096] pci 0000:05:01.0: BAR 14: failed to assign [mem size 0x01800000]
[ 5372.379101] pci 0000:06:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[ 5372.379104] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[ 5372.379106] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[ 5372.379109] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[ 5372.379112] pci 0000:06:00.0: BAR 0: no space for [mem size 0x01000000]
[ 5372.379114] pci 0000:06:00.0: BAR 0: failed to assign [mem size 0x01000000]
[ 5372.379120] pci 0000:06:00.2: BAR 0: no space for [mem size 0x00040000 64bit pref]
[ 5372.379122] pci 0000:06:00.2: BAR 0: failed to assign [mem size 0x00040000 64bit pref]
[ 5372.379125] pci 0000:06:00.2: BAR 3: no space for [mem size 0x00010000 64bit pref]
[ 5372.379128] pci 0000:06:00.2: BAR 3: failed to assign [mem size 0x00010000 64bit pref]
[ 5372.379130] pci 0000:06:00.1: BAR 0: no space for [mem size 0x00004000]
[ 5372.379132] pci 0000:06:00.1: BAR 0: failed to assign [mem size 0x00004000]
[ 5372.379135] pci 0000:06:00.3: BAR 0: no space for [mem size 0x00001000]
[ 5372.379137] pci 0000:06:00.3: BAR 0: failed to assign [mem size 0x00001000]
[ 5372.379140] pci 0000:05:01.0: PCI bridge to [bus 06]
[ 5372.379146] pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
[ 5372.379175] pci 0000:04:00.0: PCI bridge to [bus 05-06]
[ 5372.379180] pci 0000:04:00.0:   bridge window [io  0x2000-0x2fff]
[ 5372.379208] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[ 5372.379212] pcieport 0000:02:01.0:   bridge window [io  0x2000-0x3fff]
[ 5372.379279] pcieport 0000:04:00.0: enabling device (0000 -> 0001)
[ 5372.380308] pcieport 0000:05:01.0: enabling device (0000 -> 0001)
[ 5372.381376] pci 0000:06:00.1: D0 power state depends on 0000:06:00.0
[ 5372.381929] snd_hda_intel 0000:06:00.1: Disabling MSI
[ 5372.381944] snd_hda_intel 0000:06:00.1: Handle vga_switcheroo audio client
[ 5372.382014] pci 0000:06:00.2: D0 power state depends on 0000:06:00.0
[ 5372.384262] xhci_hcd 0000:06:00.2: init 0000:06:00.2 fail, -16
[ 5372.384381] xhci_hcd: probe of 0000:06:00.2 failed with error -16
[ 5372.384443] pci 0000:06:00.3: D0 power state depends on 0000:06:00.0
[ 5372.384493] snd_hda_intel 0000:06:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
[ 5372.384496] snd_hda_intel 0000:06:00.1: ioremap error
[ 5372.394293] nvidia-gpu 0000:06:00.3: pcim_iomap failed
[ 5372.394813] nvidia-gpu: probe of 0000:06:00.3 failed with error -12
[ 5372.455075] nouveau 0000:06:00.0: enabling device (0000 -> 0001)
[ 5372.455497] ------------[ cut here ]------------
[ 5372.455499] ioremap on RAM at 0x0000000000000000 - 0x0000000000101fff
[ 5372.455509] WARNING: CPU: 4 PID: 11823 at arch/x86/mm/ioremap.c:186 __ioremap_caller+0x2a7/0x2c0
[ 5372.455510] Modules linked in: nouveau(+) mxm_wmi wmi ttm i2c_nvidia_gpu rfcomm ccm cmac aufs overlay bnep nls_iso8859_1 snd_hda_codec_hdmi sof_pci_dev snd_sof_intel_hda_common snd_sof_intel_hda snd_sof_intel_byt snd_sof_intel_ipc snd_hda_codec_realtek snd_sof snd_hda_codec_generic snd_sof_xtensa_dsp ledtrig_audio snd_soc_skl snd_soc_hdac_hda snd_hda_ext_core snd_soc_skl_ipc snd_soc_sst_ipc snd_soc_sst_dsp snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core snd_compress ac97_bus snd_pcm_dmaengine snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm intel_rapl_msr snd_seq_midi snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device snd_timer joydev iwlmvm intel_rapl_common x86_pkg_temp_thermal intel_powerclamp mac80211 coretemp libarc4 snd btusb uvcvideo iwlwifi kvm_intel btrtl videobuf2_vmalloc btbcm videobuf2_memops btintel kvm videobuf2_v4l2 bluetooth irqbypass videobuf2_common intel_cstate rtsx_pci_ms videodev intel_rapl_perf input_leds ecdh_generic mc soundcore serio_raw memstick
[ 5372.455542]  ecc cfg80211 8250_dw intel_hid mac_hid sparse_keymap sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel rtsx_pci_sdmmc i915 aesni_intel aes_x86_64 crypto_simd video cryptd glue_helper i2c_algo_bit nvme drm_kms_helper psmouse syscopyarea sysfillrect nvme_core r8169 sysimgblt i2c_i801 fb_sys_fops rtsx_pci realtek drm thunderbolt ahci intel_lpss_pci libahci intel_lpss pinctrl_cannonlake pinctrl_intel
[ 5372.455563] CPU: 4 PID: 11823 Comm: systemd-udevd Not tainted 5.3.0-26-generic #28-Ubuntu
[ 5372.455564] Hardware name: System76 Darter Pro/Darter Pro, BIOS 2019-10-31_cca6ad0 10/30/2019
[ 5372.455568] RIP: 0010:__ioremap_caller+0x2a7/0x2c0
[ 5372.455570] Code: 0f b7 05 70 db 5b 01 48 09 c1 e9 98 fe ff ff 48 8d 55 c8 48 8d 75 b8 48 c7 c7 cd 32 73 a5 c6 05 99 c1 78 01 01 e8 b4 ab 01 00 <0f> 0b 45 31 ff e9 07 ff ff ff e8 7a a8 01 00 66 2e 0f 1f 84 00 00
[ 5372.455572] RSP: 0018:ffffbd29c2f5f888 EFLAGS: 00010282
[ 5372.455574] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
[ 5372.455575] RDX: 0000000000000007 RSI: 0000000000000082 RDI: ffff945f5e317440
[ 5372.455576] RBP: ffffbd29c2f5f8f0 R08: 00000000000003cd R09: 0000000000000004
[ 5372.455578] R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
[ 5372.455579] R13: 0000000000102000 R14: 0000000000000002 R15: ffffffffc1320800
[ 5372.455582] FS:  00007f3cadc53880(0000) GS:ffff945f5e300000(0000) knlGS:0000000000000000
[ 5372.455583] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 5372.455585] CR2: 00005587ac315ae8 CR3: 0000000c6be4a005 CR4: 00000000003606e0
[ 5372.455586] Call Trace:
[ 5372.455593]  ? _cond_resched+0x19/0x30
[ 5372.455598]  ? __kmalloc+0x180/0x270
[ 5372.455706]  ? nvkm_device_ctor+0x2d8/0x3640 [nouveau]
[ 5372.455714]  ioremap_nocache+0x1a/0x20
[ 5372.455814]  nvkm_device_ctor+0x2d8/0x3640 [nouveau]
[ 5372.455820]  ? do_pci_enable_device+0xd7/0x100
[ 5372.455894]  nvkm_device_pci_new+0x102/0x2d0 [nouveau]
[ 5372.455899]  ? _cond_resched+0x19/0x30
[ 5372.455980]  nouveau_drm_probe+0x5f/0x2e0 [nouveau]
[ 5372.455984]  local_pci_probe+0x48/0x80
[ 5372.455987]  pci_device_probe+0x10f/0x1b0
[ 5372.455991]  really_probe+0xfb/0x3a0
[ 5372.455994]  driver_probe_device+0x5f/0xe0
[ 5372.455997]  device_driver_attach+0x5d/0x70
[ 5372.456000]  __driver_attach+0x8f/0x150
[ 5372.456003]  ? device_driver_attach+0x70/0x70
[ 5372.456006]  bus_for_each_dev+0x7e/0xc0
[ 5372.456009]  driver_attach+0x1e/0x20
[ 5372.456011]  bus_add_driver+0x14f/0x1f0
[ 5372.456014]  driver_register+0x74/0xc0
[ 5372.456016]  ? 0xffffffffc13e1000
[ 5372.456018]  __pci_register_driver+0x57/0x60
[ 5372.456070]  nouveau_drm_init+0x191/0x1000 [nouveau]
[ 5372.456074]  do_one_initcall+0x4a/0x1fa
[ 5372.456077]  ? kfree+0x200/0x220
[ 5372.456080]  ? _cond_resched+0x19/0x30
[ 5372.456082]  ? kmem_cache_alloc_trace+0x163/0x230
[ 5372.456086]  do_init_module+0x62/0x250
[ 5372.456089]  load_module+0x10d4/0x1220
[ 5372.456093]  __do_sys_finit_module+0xbe/0x120
[ 5372.456096]  ? __do_sys_finit_module+0xbe/0x120
[ 5372.456100]  __x64_sys_finit_module+0x1a/0x20
[ 5372.456103]  do_syscall_64+0x5a/0x130
[ 5372.456105]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 5372.456107] RIP: 0033:0x7f3cae1c694d
[ 5372.456110] Code: 00 c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 13 e5 0c 00 f7 d8 64 89 01 48
[ 5372.456111] RSP: 002b:00007ffe064c48d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[ 5372.456114] RAX: ffffffffffffffda RBX: 000056371e9c38b0 RCX: 00007f3cae1c694d
[ 5372.456115] RDX: 0000000000000000 RSI: 00007f3cae0a3cad RDI: 0000000000000010
[ 5372.456116] RBP: 00007f3cae0a3cad R08: 0000000000000000 R09: 000056371e9c38b0
[ 5372.456117] R10: 0000000000000010 R11: 0000000000000246 R12: 0000000000000000
[ 5372.456118] R13: 000056371e9d31b0 R14: 0000000000020000 R15: 000056371e9c38b0
[ 5372.456121] ---[ end trace bf511b76f200d1ae ]---
[ 5372.456130] nouveau: probe of 0000:06:00.0 failed with error -12
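Reading the allocation failures above: even after "No. 2 try to assign unassigned res", the Thunderbolt root port 0000:02:01.0 cannot obtain larger windows, so nothing downstream (the enclosure bridges 04:00.0 and 05:01.0, then the GPU at 06:00.0) gets memory space. The sketch below does the arithmetic on four lines copied from this log, assuming the window sizes printed before the release reflect what the root port was given (presumably by the firmware). It shows the non-prefetchable window is 16 MiB short and the prefetchable one 128 MiB short; the GPU's 256 MiB BAR1 (reg 0x14) alone already uses the whole existing prefetchable window.

```python
"""Compare the root port's MMIO windows with what the enclosure requests."""
import re

# Lines copied from the dmesg output above (Thunderbolt root port 0000:02:01.0).
LOG = """\
pcieport 0000:02:01.0:   bridge window [mem 0xd1000000-0xd17fffff]
pcieport 0000:02:01.0:   bridge window [mem 0xc1000000-0xd0ffffff 64bit pref]
pcieport 0000:02:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
pcieport 0000:02:01.0: BAR 14: no space for [mem size 0x01800000]
"""

WINDOW = re.compile(r"bridge window \[mem (0x[0-9a-f]+)-(0x[0-9a-f]+)( 64bit pref)?\]")
REQUEST = re.compile(r"no space for \[mem size (0x[0-9a-f]+)( 64bit pref)?\]")
MIB = 1 << 20


def window_sizes(log):
    """Size in bytes of each bridge memory window, keyed 'pref' / 'nonpref'."""
    sizes = {}
    for start, end, pref in WINDOW.findall(log):
        sizes["pref" if pref else "nonpref"] = int(end, 16) - int(start, 16) + 1
    return sizes


def requested_sizes(log):
    """Size in bytes the bridge asked for but could not get, keyed the same way."""
    sizes = {}
    for size, pref in REQUEST.findall(log):
        sizes["pref" if pref else "nonpref"] = int(size, 16)
    return sizes


if __name__ == "__main__":
    have, need = window_sizes(LOG), requested_sizes(LOG)
    for kind in ("nonpref", "pref"):
        print(f"{kind}: window {have[kind] // MIB} MiB, requested "
              f"{need[kind] // MIB} MiB, short by {(need[kind] - have[kind]) // MIB} MiB")
```

The same shortfall appears in the later log in this thread, which points at how much hotplug memory space is reserved for the root port rather than at the NVIDIA driver itself.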

Note: I previously tried installing the NVIDIA drivers, blacklisting nouveau, etc., and the only effect I managed to get was an unbootable system: I couldn't get past the login screen anymore (I tried reversing the driver installation via recovery, but no luck). That's how I ended up switching from PopOS to Ubuntu, but the issue is the same on both.

Edit: here's the output of lspci | grep VGA:

00:02.0 VGA compatible controller: Intel Corporation Device 9b41 (rev 02)
06:00.0 VGA compatible controller: NVIDIA Corporation TU106 [GeForce RTX 2070 Rev. A] (rev a1)
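The "available PCIe bandwidth" lines in the dmesg output above can be reproduced by hand: per-lane transfer rate, minus line-encoding overhead (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s), times link width. The sketch below uses integer division, which matches the exact figures printed in this log; whether the reported 2.5 GT/s reflects real throughput over the Thunderbolt tunnel is not established by these logs, and it is a separate question from the BAR failures.

```python
"""Reproduce the 'available PCIe bandwidth' figures from the dmesg lines."""

# Per-lane payload rate in Mb/s after line encoding, keyed by GT/s.
ENCODED_MBPS = {
    2.5: 2500 * 8 // 10,     # 8b/10b
    5.0: 5000 * 8 // 10,     # 8b/10b
    8.0: 8000 * 128 // 130,  # 128b/130b
}


def link_bandwidth(gts, width):
    """Format usable bandwidth for a link of the given speed and lane count."""
    mbps = ENCODED_MBPS[gts] * width
    return f"{mbps // 1000}.{mbps % 1000:03d} Gb/s"


if __name__ == "__main__":
    print(link_bandwidth(2.5, 4))   # link as negotiated in the log
    print(link_bandwidth(8.0, 4))   # what the x4 enclosure bridge is capable of
    print(link_bandwidth(8.0, 16))  # what the RTX 2070 itself is capable of
```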
jamalex commented 4 years ago

I was happy that my new laptop would be using open source firmware, but this is preventing me from using the eGPU for the work I need it for, so at this point I'd readily flash a standard BIOS on it if that got the eGPU working. Is that a possibility? And is this issue still being looked into?

jamalex commented 4 years ago

I realized I hadn't installed system76-driver after installing Ubuntu 19.10. I have now done so, and also installed system76-driver-nvidia. The behavior is unchanged, but dmesg now shows what looks like the same output as in the original report (and the same goes for nvidia-smi and sudo modprobe nvidia):

[   25.666612] thunderbolt 0-1: new device found, vendor=0x8 device=0x38
[   25.666615] thunderbolt 0-1: Sonnet Technologies, Inc. eGFX Breakaway Box 550
[   25.762535] pcieport 0000:02:01.0: pciehp: Slot(1): Card present
[   25.762539] pcieport 0000:02:01.0: pciehp: Slot(1): Link Up
[   26.102817] pci 0000:04:00.0: [8086:1578] type 01 class 0x060400
[   26.102948] pci 0000:04:00.0: enabling Extended Tags
[   26.103165] pci 0000:04:00.0: supports D1 D2
[   26.103166] pci 0000:04:00.0: PME# supported from D0 D1 D2 D3hot D3cold
[   26.103280] pci 0000:04:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:02:01.0 (capable of 31.504 Gb/s with 8 GT/s x4 link)
[   26.103467] pcieport 0000:02:01.0: ASPM: current common clock configuration is broken, reconfiguring
[   26.114369] pci 0000:04:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[   26.114571] pci 0000:05:01.0: [8086:1578] type 01 class 0x060400
[   26.114715] pci 0000:05:01.0: enabling Extended Tags
[   26.114937] pci 0000:05:01.0: supports D1 D2
[   26.114939] pci 0000:05:01.0: PME# supported from D0 D1 D2 D3hot D3cold
[   26.115222] pci 0000:04:00.0: PCI bridge to [bus 05-24]
[   26.115237] pci 0000:04:00.0:   bridge window [io  0x0000-0x0fff]
[   26.115244] pci 0000:04:00.0:   bridge window [mem 0x00000000-0x000fffff]
[   26.115256] pci 0000:04:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[   26.115261] pci 0000:05:01.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[   26.115438] pci 0000:06:00.0: [10de:1f07] type 00 class 0x030000
[   26.115520] pci 0000:06:00.0: reg 0x10: [mem 0x00000000-0x00ffffff]
[   26.115553] pci 0000:06:00.0: reg 0x14: [mem 0x00000000-0x0fffffff 64bit pref]
[   26.115585] pci 0000:06:00.0: reg 0x1c: [mem 0x00000000-0x01ffffff 64bit pref]
[   26.115603] pci 0000:06:00.0: reg 0x24: [io  0x0000-0x007f]
[   26.115622] pci 0000:06:00.0: reg 0x30: [mem 0x00000000-0x0007ffff pref]
[   26.115852] pci 0000:06:00.0: PME# supported from D0 D3hot
[   26.116010] pci 0000:06:00.0: 8.000 Gb/s available PCIe bandwidth, limited by 2.5 GT/s x4 link at 0000:02:01.0 (capable of 126.016 Gb/s with 8 GT/s x16 link)
[   26.116060] pci 0000:06:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[   26.116065] i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=none:owns=io+mem
[   26.116131] pci 0000:06:00.1: [10de:10f9] type 00 class 0x040300
[   26.116184] pci 0000:06:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[   26.116599] pci 0000:06:00.2: [10de:1ada] type 00 class 0x0c0330
[   26.116665] pci 0000:06:00.2: reg 0x10: [mem 0x00000000-0x0003ffff 64bit pref]
[   26.116714] pci 0000:06:00.2: reg 0x1c: [mem 0x00000000-0x0000ffff 64bit pref]
[   26.116913] pci 0000:06:00.2: PME# supported from D0 D3hot
[   26.117071] pci 0000:06:00.3: [10de:1adb] type 00 class 0x0c8000
[   26.117123] pci 0000:06:00.3: reg 0x10: [mem 0x00000000-0x00000fff]
[   26.117400] pci 0000:06:00.3: PME# supported from D0 D3hot
[   26.117836] pci 0000:05:01.0: PCI bridge to [bus 06-24]
[   26.117851] pci 0000:05:01.0:   bridge window [io  0x0000-0x0fff]
[   26.117860] pci 0000:05:01.0:   bridge window [mem 0x00000000-0x000fffff]
[   26.117874] pci 0000:05:01.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[   26.117879] pci_bus 0000:06: busn_res: [bus 06-24] end is updated to 06
[   26.117888] pci_bus 0000:05: busn_res: [bus 05-24] end is updated to 06
[   26.117914] pci 0000:04:00.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.117917] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.117921] pci 0000:04:00.0: BAR 14: no space for [mem size 0x01800000]
[   26.117923] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.117928] pci 0000:04:00.0: BAR 13: assigned [io  0x2000-0x2fff]
[   26.117933] pci 0000:05:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.117935] pci 0000:05:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.117938] pci 0000:05:01.0: BAR 14: no space for [mem size 0x01800000]
[   26.117940] pci 0000:05:01.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.117943] pci 0000:05:01.0: BAR 13: assigned [io  0x2000-0x2fff]
[   26.117951] pci 0000:06:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[   26.117954] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[   26.117957] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[   26.117959] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[   26.117962] pci 0000:06:00.0: BAR 0: no space for [mem size 0x01000000]
[   26.117964] pci 0000:06:00.0: BAR 0: failed to assign [mem size 0x01000000]
[   26.117967] pci 0000:06:00.0: BAR 6: no space for [mem size 0x00080000 pref]
[   26.117969] pci 0000:06:00.0: BAR 6: failed to assign [mem size 0x00080000 pref]
[   26.117972] pci 0000:06:00.2: BAR 0: no space for [mem size 0x00040000 64bit pref]
[   26.117974] pci 0000:06:00.2: BAR 0: failed to assign [mem size 0x00040000 64bit pref]
[   26.117978] pci 0000:06:00.2: BAR 3: no space for [mem size 0x00010000 64bit pref]
[   26.117980] pci 0000:06:00.2: BAR 3: failed to assign [mem size 0x00010000 64bit pref]
[   26.117983] pci 0000:06:00.1: BAR 0: no space for [mem size 0x00004000]
[   26.117985] pci 0000:06:00.1: BAR 0: failed to assign [mem size 0x00004000]
[   26.117988] pci 0000:06:00.3: BAR 0: no space for [mem size 0x00001000]
[   26.117990] pci 0000:06:00.3: BAR 0: failed to assign [mem size 0x00001000]
[   26.117993] pci 0000:06:00.0: BAR 5: assigned [io  0x2000-0x207f]
[   26.118006] pci 0000:05:01.0: PCI bridge to [bus 06]
[   26.118011] pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
[   26.118040] pci 0000:04:00.0: PCI bridge to [bus 05-06]
[   26.118046] pci 0000:04:00.0:   bridge window [io  0x2000-0x2fff]
[   26.118074] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[   26.118078] pcieport 0000:02:01.0:   bridge window [io  0x2000-0x3fff]
[   26.118086] pcieport 0000:02:01.0:   bridge window [mem 0xd1000000-0xd17fffff]
[   26.118092] pcieport 0000:02:01.0:   bridge window [mem 0xc1000000-0xd0ffffff 64bit pref]
[   26.118099] PCI: No. 2 try to assign unassigned res
[   26.118104] pcieport 0000:02:01.0: resource 14 [mem 0xd1000000-0xd17fffff] released
[   26.118106] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[   26.118116] pcieport 0000:02:01.0: resource 15 [mem 0xc1000000-0xd0ffffff 64bit pref] released
[   26.118118] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[   26.118141] pcieport 0000:02:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.118144] pcieport 0000:02:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.118147] pcieport 0000:02:01.0: BAR 14: no space for [mem size 0x01800000]
[   26.118149] pcieport 0000:02:01.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.118153] pci 0000:04:00.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.118156] pci 0000:04:00.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.118158] pci 0000:04:00.0: BAR 14: no space for [mem size 0x01800000]
[   26.118160] pci 0000:04:00.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.118164] pci 0000:05:01.0: BAR 15: no space for [mem size 0x18000000 64bit pref]
[   26.118167] pci 0000:05:01.0: BAR 15: failed to assign [mem size 0x18000000 64bit pref]
[   26.118169] pci 0000:05:01.0: BAR 14: no space for [mem size 0x01800000]
[   26.118171] pci 0000:05:01.0: BAR 14: failed to assign [mem size 0x01800000]
[   26.118176] pci 0000:06:00.0: BAR 1: no space for [mem size 0x10000000 64bit pref]
[   26.118179] pci 0000:06:00.0: BAR 1: failed to assign [mem size 0x10000000 64bit pref]
[   26.118182] pci 0000:06:00.0: BAR 3: no space for [mem size 0x02000000 64bit pref]
[   26.118184] pci 0000:06:00.0: BAR 3: failed to assign [mem size 0x02000000 64bit pref]
[   26.118186] pci 0000:06:00.0: BAR 0: no space for [mem size 0x01000000]
[   26.118188] pci 0000:06:00.0: BAR 0: failed to assign [mem size 0x01000000]
[   26.118190] pci 0000:06:00.2: BAR 0: no space for [mem size 0x00040000 64bit pref]
[   26.118193] pci 0000:06:00.2: BAR 0: failed to assign [mem size 0x00040000 64bit pref]
[   26.118195] pci 0000:06:00.2: BAR 3: no space for [mem size 0x00010000 64bit pref]
[   26.118197] pci 0000:06:00.2: BAR 3: failed to assign [mem size 0x00010000 64bit pref]
[   26.118200] pci 0000:06:00.1: BAR 0: no space for [mem size 0x00004000]
[   26.118202] pci 0000:06:00.1: BAR 0: failed to assign [mem size 0x00004000]
[   26.118204] pci 0000:06:00.3: BAR 0: no space for [mem size 0x00001000]
[   26.118207] pci 0000:06:00.3: BAR 0: failed to assign [mem size 0x00001000]
[   26.118210] pci 0000:05:01.0: PCI bridge to [bus 06]
[   26.118216] pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
[   26.118246] pci 0000:04:00.0: PCI bridge to [bus 05-06]
[   26.118251] pci 0000:04:00.0:   bridge window [io  0x2000-0x2fff]
[   26.118315] pcieport 0000:02:01.0: PCI bridge to [bus 04-24]
[   26.118322] pcieport 0000:02:01.0:   bridge window [io  0x2000-0x3fff]
[   26.118391] pcieport 0000:04:00.0: enabling device (0000 -> 0001)
[   26.119436] pcieport 0000:05:01.0: enabling device (0000 -> 0001)
[   26.120527] pci 0000:06:00.1: D0 power state depends on 0000:06:00.0
[   26.121204] snd_hda_intel 0000:06:00.1: Disabling MSI
[   26.121218] snd_hda_intel 0000:06:00.1: Handle vga_switcheroo audio client
[   26.121297] pci 0000:06:00.2: D0 power state depends on 0000:06:00.0
[   26.123177] xhci_hcd 0000:06:00.2: init 0000:06:00.2 fail, -16
[   26.123286] xhci_hcd: probe of 0000:06:00.2 failed with error -16
[   26.123336] pci 0000:06:00.3: D0 power state depends on 0000:06:00.0
[   26.123373] snd_hda_intel 0000:06:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
[   26.123374] snd_hda_intel 0000:06:00.1: ioremap error
[   26.125087] nvidia-gpu 0000:06:00.3: pcim_iomap failed
[   26.125723] IPMI message handler: version 39.2
[   26.125862] nvidia-gpu: probe of 0000:06:00.3 failed with error -12
[   26.129362] ipmi device interface
[   26.691416] nvidia: module license 'NVIDIA' taints kernel.
[   26.691418] Disabling lock debugging due to kernel taint
[   26.717409] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   26.721524] nvidia 0000:06:00.0: enabling device (0000 -> 0001)
[   26.722190] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   26.722191] NVRM: The system BIOS may have misconfigured your GPU.
[   26.722198] nvidia: probe of 0000:06:00.0 failed with error -1
[   26.722246] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   26.722247] NVRM: None of the NVIDIA devices were initialized.
[   26.726291] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   27.581741] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   27.582647] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   27.582650] NVRM: The system BIOS may have misconfigured your GPU.
[   27.582657] nvidia: probe of 0000:06:00.0 failed with error -1
[   27.582722] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   27.582723] NVRM: None of the NVIDIA devices were initialized.
[   27.583023] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   28.108225] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   28.109029] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   28.109032] NVRM: The system BIOS may have misconfigured your GPU.
[   28.109040] nvidia: probe of 0000:06:00.0 failed with error -1
[   28.109101] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   28.109102] NVRM: None of the NVIDIA devices were initialized.
[   28.109976] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   28.656604] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   28.657531] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   28.657533] NVRM: The system BIOS may have misconfigured your GPU.
[   28.657541] nvidia: probe of 0000:06:00.0 failed with error -1
[   28.657618] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   28.657619] NVRM: None of the NVIDIA devices were initialized.
[   28.658195] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   29.229447] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   29.231824] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   29.231825] NVRM: The system BIOS may have misconfigured your GPU.
[   29.231832] nvidia: probe of 0000:06:00.0 failed with error -1
[   29.231867] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   29.231868] NVRM: None of the NVIDIA devices were initialized.
[   29.232192] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   29.962784] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   29.963413] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   29.963415] NVRM: The system BIOS may have misconfigured your GPU.
[   29.963422] nvidia: probe of 0000:06:00.0 failed with error -1
[   29.963463] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   29.963464] NVRM: None of the NVIDIA devices were initialized.
[   29.965017] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   30.509769] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   30.510971] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   30.510973] NVRM: The system BIOS may have misconfigured your GPU.
[   30.510979] nvidia: probe of 0000:06:00.0 failed with error -1
[   30.511014] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   30.511015] NVRM: None of the NVIDIA devices were initialized.
[   30.512141] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   31.064048] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   31.064708] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   31.064711] NVRM: The system BIOS may have misconfigured your GPU.
[   31.064718] nvidia: probe of 0000:06:00.0 failed with error -1
[   31.064765] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   31.064766] NVRM: None of the NVIDIA devices were initialized.
[   31.065704] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   31.936015] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   31.937372] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   31.937373] NVRM: The system BIOS may have misconfigured your GPU.
[   31.937382] nvidia: probe of 0000:06:00.0 failed with error -1
[   31.937422] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   31.937423] NVRM: None of the NVIDIA devices were initialized.
[   31.938639] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   32.164701] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   32.166055] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   32.166057] NVRM: The system BIOS may have misconfigured your GPU.
[   32.166064] nvidia: probe of 0000:06:00.0 failed with error -1
[   32.166104] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   32.166105] NVRM: None of the NVIDIA devices were initialized.
[   32.166628] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
[   32.818358] nvidia-nvlink: Nvlink Core is being initialized, major device number 237
[   32.819779] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
               NVRM: BAR0 is 0M @ 0x0 (PCI:0000:06:00.0)
[   32.819781] NVRM: The system BIOS may have misconfigured your GPU.
[   32.819788] nvidia: probe of 0000:06:00.0 failed with error -1
[   32.819830] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   32.819831] NVRM: None of the NVIDIA devices were initialized.
[   32.820123] nvidia-nvlink: Unregistered the Nvlink Core, major device number 237
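The NVRM lines say BAR0 of 0000:06:00.0 was left at `0M @ 0x0`. In other words, the GPU never received an address for its first memory BAR, so the driver had nothing to map. One way to confirm this without relying on the driver is to read the device's sysfs `resource` file. Each line in that file holds the start, end and flags of one resource as hex numbers. The sketch below is a minimal example that assumes the kernel leaves a BAR it could not place as an all-zero entry, which is what the message suggests.

```python
from pathlib import Path


def parse_resource(text: str) -> list:
    """Parse a PCI sysfs `resource` file into (start, end, flags) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each field looks like 0x00000000a0000000; int(..., 16) accepts the 0x prefix.
        start, end, flags = (int(field, 16) for field in line.split())
        entries.append((start, end, flags))
    return entries


def bar0_unassigned(text: str) -> bool:
    """True when the first resource (BAR0) has neither a start nor an end address."""
    entries = parse_resource(text)
    if not entries:
        return True
    start, end, _flags = entries[0]
    return start == 0 and end == 0


if __name__ == "__main__":
    # Device address taken from the log above; adjust it for your own system.
    resource = Path("/sys/bus/pci/devices/0000:06:00.0/resource")
    if resource.exists():
        print("BAR0 unassigned:", bar0_unassigned(resource.read_text()))
    else:
        print("device not present:", resource)
```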
jackpot51 commented 4 years ago

We are working on new firmware to address this issue

jackpot51 commented 4 years ago

New firmware will be released within the next week that will allow using NVIDIA GPUs for CUDA and PCIe passthrough. Thunderbolt security is enabled by default, so using NVIDIA GPUs for graphics requires custom configuration, because the graphics stack loads before boltd authorizes the eGPU.
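The Thunderbolt driver exposes two sysfs attributes that show where a connected enclosure stands in that ordering. The first is the domain's security level in `/sys/bus/thunderbolt/devices/domainN/security`. The second is a per-device `authorized` flag, which reads 0 until the device is approved. The sketch below collects both and assumes that sysfs layout. `boltctl list` from bolt reports the same information.

```python
from __future__ import annotations

from pathlib import Path

TB_ROOT = Path("/sys/bus/thunderbolt/devices")


def read_attr(path: Path) -> str | None:
    """Read a sysfs attribute, returning None if it is missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def thunderbolt_status(root: Path = TB_ROOT) -> dict:
    """Map domain names to their security level and devices to their authorization."""
    status: dict = {}
    if not root.is_dir():
        return status
    for entry in sorted(root.iterdir()):
        if entry.name.startswith("domain"):
            status[entry.name] = {"security": read_attr(entry / "security")}
        elif (entry / "authorized").exists():
            status[entry.name] = {
                "authorized": read_attr(entry / "authorized"),
                "device_name": read_attr(entry / "device_name"),
            }
    return status


if __name__ == "__main__":
    for name, attrs in thunderbolt_status().items():
        print(name, attrs)
```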

vthg2themax commented 4 years ago

Does this disable Thunderbolt security and make me more vulnerable to untrusted Thunderbolt devices? Or will configuration options be available in coreboot?

On January 16, 2020 4:18:08 PM Jeremy Soller notifications@github.com wrote:

New firmware will be released within the next week that will allow using NVIDIA GPUs for CUDA and PCIE passthrough. Thunderbolt security is enabled by default and this makes using NVIDIA GPUs for graphics require custom configuration because the graphics stack loads before boltd authorizes the eGPU.

tkolo commented 4 years ago

Is the fix NVIDIA specific?

jackpot51 commented 4 years ago

The only available option is to have Thunderbolt security enabled. The fix was to increase hotplug memory for the Thunderbolt bridge and should not be specific to NVIDIA, but the test was to ensure that the RTX 2080 Ti would be correctly initialized by the NVIDIA driver.
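The reservation described here should be visible in the "bridge window" lines that the PCI core prints at boot. The excerpt above only shows I/O windows, for example 0x2000-0x2fff on 0000:05:01.0. The sketch below pulls window sizes out of a saved dmesg so that two firmware versions can be compared. The regular expression is an assumption based on the line format above, and the `mem` example line is illustrative rather than taken from this system.

```python
import re

# Matches PCI core messages such as
#   pci 0000:05:01.0:   bridge window [io  0x2000-0x2fff]
#   pcieport 0000:00:1c.0:   bridge window [mem 0x6000000000-0x601fffffff 64bit pref]  (illustrative)
WINDOW_RE = re.compile(
    r"(?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]):\s+bridge window "
    r"\[(?P<kind>io|mem)\s+0x(?P<start>[0-9a-f]+)-0x(?P<end>[0-9a-f]+)(?P<attrs>[^\]]*)\]"
)


def bridge_windows(log: str) -> list:
    """Return (device, kind, size_in_bytes, attributes) for each bridge window line."""
    windows = []
    for m in WINDOW_RE.finditer(log):
        # Window bounds are inclusive, so the size is end - start + 1.
        size = int(m["end"], 16) - int(m["start"], 16) + 1
        windows.append((m["dev"], m["kind"], size, m["attrs"].strip()))
    return windows
```

To use it, pass the text of a saved `sudo dmesg` to `bridge_windows`. The `mem` entries on the ports below the Thunderbolt controller are the ones a hotplug memory change would be expected to grow.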

vthg2themax commented 4 years ago

Apologies for not fully understanding the response, but it sounds like Thunderbolt security will remain the same, and a lower-level change to hotplug memory will solve this issue, right? Thanks again, by the way!

On January 16, 2020 6:51:17 PM Jeremy Soller notifications@github.com wrote:

The only available option is to have Thunderbolt security enabled. The fix was to increase hotplug memory for the Thunderbolt bridge and should not be specific to NVIDIA.

jackpot51 commented 4 years ago

That's right

tkolo commented 4 years ago

This now works perfectly for me after the BIOS update. Thanks guys!

zlgonzalez commented 4 years ago

Hi, can someone tell me which specific eGPU models work now that I've updated the BIOS on my Darter Pro?

Thanks, Ron

vthg2themax commented 4 years ago

I have updated the firmware on my coreboot Galago Pro and still cannot get my AMD RX 5700 XT in my Sonnet 650 eGPU working. I can add logs if needed, or continue troubleshooting only with support if you prefer.

tkolo commented 4 years ago

I have a Darter Pro (darp6) with a Sonnet 650 and tested it with a borrowed AMD R9 290X. I've now ordered an AMD RX 5700 XT; we'll see how it works.

vthg2themax commented 4 years ago

I also popped in another hard drive, installed Windows 10, installed the drivers, and rebooted. In Device Manager the device came up with Code 12, which leads me to believe there could still be another issue with the firmware.

vthg2themax commented 4 years ago

I swapped the GPU for an AMD RX 560 and it also gives me a Code 12 error saying there are not enough free resources for the device. I do some web programming for my main job, so if you think this type of problem is something I could contribute to, please let me know where to start. I really believe in the idea of coreboot and want to see it match the functionality of the stock BIOS the device originally shipped with.

tkolo commented 4 years ago

I'm sorry if this is a stupid question, but did you also test the eGPU with a different laptop? Perhaps the eGPU itself is what's broken.

vthg2themax commented 4 years ago

Not a stupid question at all! Unfortunately, this laptop was my first with eGPU capability, so I do not currently have another to test with. I guess this is why being a trailblazer is so difficult. I will have to see if I can get ahold of another one to use to help troubleshoot this, but it will be a while I imagine.

jackpot51 commented 4 years ago

We have AMD RX 5700 XTs to test with, so I will try one as well. The fix was to increase the amount of hotplug memory reserved for the Thunderbolt bridge. I found the amount required to run the RTX 2080 Ti, but perhaps AMD GPUs need even more hotplug memory reserved.
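A simplified model helps when reasoning about how much reservation is "enough". PCI memory BARs are power-of-two sized and must be aligned to their own size, and bridge memory windows are programmed in 1 MiB units. The sketch below computes the smallest window that holds a given set of BARs. It is a simplification: it ignores that 64-bit prefetchable BARs usually go into a separate prefetchable window, and that every bridge inside the enclosure needs its own window as well. The BAR sizes in the usage note are hypothetical.

```python
MIB = 1 << 20


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment


def required_window(bar_sizes: list, granularity: int = MIB) -> int:
    """Smallest bridge window (in bytes) that holds every BAR at its natural alignment."""
    offset = 0
    # Placing the largest BARs first keeps every later BAR naturally aligned,
    # so align_up() below never has to insert padding.
    for size in sorted(bar_sizes, reverse=True):
        if size <= 0 or size & (size - 1):
            raise ValueError(f"BAR size {size:#x} is not a power of two")
        offset = align_up(offset, size)
        offset += size
    # Bridge memory windows are programmed in 1 MiB units.
    return align_up(offset, granularity)
```

For example, a card with a 256 MiB BAR and a 32 MiB BAR (hypothetical figures) would need `required_window([256 * MIB, 32 * MIB])`, or 288 MiB. That figure does not include any space for the enclosure's other functions.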

We expect to fully support all modern GPUs over Thunderbolt with coreboot.

jackpot51 commented 4 years ago

I have changed the title to remove NVIDIA since that portion is solved. We do not need any customers to investigate AMD graphics card issues as we will investigate them internally.

vthg2themax commented 4 years ago

Answering on a Sunday? Wow! Thank you for your dedication! I just figured I would offer to help, because this is a public project with positive implications for personal freedom on an end user's device.

tkolo commented 4 years ago

My Radeon 5700 XT arrived today and indeed it doesn't work. You mentioned you'll test it as well, so I'll just wait.

tkolo commented 4 years ago

Actually, never mind: loading the amdgpu driver with vm_update_mode=3 fixed the issue :)

jackpot51 commented 4 years ago

I wonder why that would be required

jackpot51 commented 4 years ago
parm:           vm_update_mode:VM update using CPU (0 = never (default except for large BAR(LB)), 1 = Graphics only, 2 = Compute only (default for LB), 3 = Both (int)
vthg2themax commented 4 years ago

Is that a setting in pop os that needs to be modified?

jackpot51 commented 4 years ago

Perhaps, I still have more investigation to do

tkolo commented 4 years ago

My initial source for this was https://wiki.archlinux.org/index.php/AMDGPU#Freezes_with_%22[drm]_IP_block:gmc_v8_0_is_hung!%22_kernel_error and, although I didn't get exactly this error, what I had also mentioned IP blocks, so I decided it was worth a try.

vthg2themax commented 4 years ago

The note about using the CPU to do some of the work instead of the GPU is definitely a downside to this workaround, but at least it works. I just wish I knew how to help with this; I'm not very familiar with SDMA or hotplug memory programming. Thanks for posting that, tkolo!

vthg2themax commented 4 years ago

So, I apologize for being such a novice about this, but since we are using Pop!_OS, we would need to use the kernelstub command for this, correct?

ahoneybun commented 4 years ago

Correct.

sudo kernelstub -a "option"

tkolo commented 4 years ago

Unfortunately, since kernel 5.5 I'm unable to get this model to work at all. There's a NULL pointer dereference somewhere deep in amdgpu's code that I'm not sure how to work around. Fortunately for me, I'm not using amdgpu anymore anyway. On 5.4 it worked, but I had a lot of issues with reverse PRIME mode, so instead I opted for VFIO and passthrough to a Windows VM, which works perfectly. I'd report it to the kernel developers, but truth be told I don't even know where or how.
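VFIO works at the granularity of IOMMU groups, so the usual first step for passthrough is to find the group the eGPU landed in and check what else shares it. The sketch below reads the standard `/sys/kernel/iommu_groups/<N>/devices/` layout. That directory is empty unless an IOMMU is enabled (for example with `intel_iommu=on` on Intel systems).

```python
from pathlib import Path


def iommu_groups(root: Path = Path("/sys/kernel/iommu_groups")) -> dict:
    """Map each IOMMU group number to the sorted PCI addresses it contains."""
    groups = {}
    if not root.is_dir():
        return groups
    for group in root.iterdir():
        devices = group / "devices"
        if group.name.isdigit() and devices.is_dir():
            groups[int(group.name)] = sorted(dev.name for dev in devices.iterdir())
    return dict(sorted(groups.items()))


def group_of(address: str, groups: dict):
    """Return the group number containing a PCI address such as 0000:06:00.0, or None."""
    for number, members in groups.items():
        if address in members:
            return number
    return None


if __name__ == "__main__":
    groups = iommu_groups()
    number = group_of("0000:06:00.0", groups)
    print("eGPU group:", number, groups.get(number, []))
```

In the log earlier in this thread, the card exposes four functions, 06:00.0 through 06:00.3. If those functions share a group, they generally all have to be handed to vfio-pci together.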

vthg2themax commented 4 years ago

Thanks for the update, tkolo! Unfortunately for me, I am still stuck with this GPU. Is there a guide for building my own firmware and flashing it to test changes? I would like to see whether doubling the hotplug memory reservation, whatever that change was, does anything. I really don't want to have to get another GPU, as AMD has non-proprietary drivers, which I think is a win for the open source community.

vthg2themax commented 4 years ago

I tried following some guides to hook up the eGPU to a Windows VM. Would you be willing to share some tips on how you did it with this laptop and the coreboot firmware?

tkolo commented 4 years ago

Took me longer than expected, but there you go https://gist.github.com/tkolo/b1f24130efec706bc66205badbca85cc

vthg2themax commented 4 years ago

Thanks tkolo, I will test it out and report back on my results!

vthg2themax commented 4 years ago

My initial source for this was https://wiki.archlinux.org/index.php/AMDGPU#Freezes_with_%22[drm]_IP_block:gmc_v8_0_is_hung!%22_kernel_error and, although I didn't get exactly this error, what I had also mentioned IP blocks, so I decided it was worth a try.

I ran sudo kernelstub -a "amdgpu.vm_update_mode=3" and rebooted, and I was still getting the error message in the dmesg log on Pop!_OS 19.10. Is that what you were getting even when it worked?
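One way to tell whether the kernelstub change actually reached the driver is to compare the kernel command line with the value the loaded module reports. The sketch below does that comparison. It assumes amdgpu exposes the parameter as a readable file under `/sys/module/amdgpu/parameters/`; module parameters only appear there when the driver exports them with read permission.

```python
from __future__ import annotations

from pathlib import Path


def cmdline_param(cmdline: str, name: str) -> str | None:
    """Return the value of the last `name=value` token on a kernel command line."""
    value = None
    for token in cmdline.split():
        key, sep, val = token.partition("=")
        if sep and key == name:
            value = val
    return value


def module_param(module: str, param: str) -> str | None:
    """Read a loaded module's parameter from sysfs, or None if it is not exposed."""
    path = Path("/sys/module") / module / "parameters" / param
    try:
        return path.read_text().strip()
    except OSError:
        return None


if __name__ == "__main__":
    proc = Path("/proc/cmdline")
    cmdline = proc.read_text() if proc.exists() else ""
    print("command line:", cmdline_param(cmdline, "amdgpu.vm_update_mode"))
    print("loaded driver:", module_param("amdgpu", "vm_update_mode"))
```

If the two values disagree, the option did not reach the driver instance that is currently running.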

vthg2themax commented 4 years ago

I am still working through your excellent guide @tkolo , and am posting revisions here: https://gist.github.com/vthg2themax/893319aec795ff6b242d19c8acb07410 Please let me know if I was supposed to help edit your gist file instead. I am not certain of the original intent.

tkolo commented 4 years ago

I only wrote it as gist because I feel comfortable with markdown and it was the first markdown-enabled paste service that came to my mind. Feel free to maintain it though if you want to re-use it somewhere else.

EDIT: Actually, thinking about it, it's kind of a pity that neither Pop!_OS nor System76 has a wiki for such stuff (at least I couldn't find one).

vthg2themax commented 4 years ago

I shall leave it out there and maintain it as best I can. If it would be of use to anyone else on the web, it will be worth preserving. System76 / Pop!_OS seem like good people, but I suppose they did not anticipate this type of usage when they started working on coreboot? Either way, perhaps @jackpot51 could shed some light on whether the in-progress guide could be put somewhere on their site, or whether they are already working on something like this.

tkolo commented 4 years ago

Also, answering your earlier question (sorry, I forgot about it): neither putting vm_update_mode in the kernel parameters nor in modprobe.d worked for me. I always had to manually unload amdgpu with rmmod and load it again with modprobe amdgpu vm_update_mode=3. I'm not sure whether that behavior was due to my misconfiguration or some deeper driver issue, and I don't really care since I switched to VFIO.

vthg2themax commented 4 years ago

@jackpot51 Can you comment on how things are going with this?

jackpot51 commented 4 years ago

I have not figured out why AMD GPUs are not working while NVIDIA GPUs are. It doesn't appear that increasing the reserved memory matters.

vthg2themax commented 4 years ago

Do you know who I could ask to try to get this working? Or is there anything I could look into to help figure this out?

jackpot51 commented 4 years ago

There may be a maintainer listed for the amdgpu module who would be able to provide more information on the error

vthg2themax commented 4 years ago

Thank you!

On March 17, 2020 9:34:55 AM Jeremy Soller notifications@github.com wrote:

There may be a maintainer listed for the amdgpu module who would be able to provide more information on the error

vthg2themax commented 4 years ago

I contacted the amazingly helpful and prompt folks behind the amdgpu driver, and they had me test the card in a standard desktop and with a newer kernel to see any differences. The card worked fine in a standard desktop; however, with the newer kernel it still had an error. The error message specifically mentions corruption, and the helpful hint they gave is the following: 'The discovery data is a region of VRAM that is populated at device power up by a micro-controller on the GPU. It contains information about the device that the driver needs to load properly.' So I think this issue will need to be solved in coreboot. I will read more about how to get set up with coreboot, but I wanted to know if there is anything special I need to do to debug and deploy coreboot onto my laptop. I would like to make this firmware better. @jackpot51