openSUSE / SUSEPrime

Provide nvidia-prime like package for openSUSE

Broken PCI BusID parsing on machines with several domains #88

Closed. Kalanyr closed this issue 1 year ago.

Kalanyr commented 1 year ago

When using the BusID X:X:X syntax, SUSEPrime generates an X conf file with BusID 0:0:2, when the correct BusID is 0:2:0 (i.e. the standard one for Intel integrated GPUs).

In general, SUSEPrime and X.org both behave pretty strangely with the Optimus setup on this laptop, but I figure: one problem at a time.

sndirsch commented 1 year ago

Hmm. Could you please provide the output of /sbin/lspci and the generated X conf file? 0:0:2 looks pretty common to me. Are you sure changing it to 0:2:0 fixes the issue?

Kalanyr commented 1 year ago

lspci output: https://www.pastebin.com/y2DNpp2U

Generated 90-intel.conf https://www.pastebin.com/LxVjkJZC

Source xorg-intel.conf https://www.pastebin.com/N4w8HDKN

Changing the BusID to 0:2:0 does fix the initial error, but X.org then does all sorts of weird stuff, including bringing up the NVIDIA card (1:0:0), and finally dies with a pixmap error.

sndirsch commented 1 year ago

0000:00:02.0 VGA compatible controller: Intel Corporation Alder Lake-HX GT1 [UHD Graphics 770] (rev 0c) --> PCI:0:0:2

There is nothing wrong with that. The config file also looks fine.

sndirsch commented 1 year ago

I guess that with the wrong PCI ID you're trying to configure, the X server tries to use only the NVIDIA card and then fails for some reason. I would need the log file /var/log/Xorg.0.log for both cases.

Kalanyr commented 1 year ago

Am I misreading something then?

0000:01:00.0 VGA compatible controller: NVIDIA Corporation GA103M [GeForce RTX 3080 Ti Mobile] (rev a1)

This maps to 1:0:0 (correctly).

If the Intel GPU maps to 0:0:2, shouldn't the NVIDIA card map to 0:1:0?

sndirsch commented 1 year ago

Hmm. Seems the lspci output format changed, which broke parsing. :-( Are you using Tumbleweed? You're right. It should be PCI:0:2:0 and PCI:1:0:0.

sndirsch commented 1 year ago

line=$(/sbin/lspci | grep "VGA compatible controller: Intel")
echo $line | cut -f 1 -d ' ' | sed -e 's/\./:/g;s/:/ /g' | awk -Wposix '{printf("PCI:%d:%d:%d\n","0x" $1, "0x" $2, "0x" $3 )}'

With "0000:00:02.0 VGA compatible controller: Intel Corporation Alder Lake-HX GT1 [UHD Graphics 770] (rev 0c)" this results in PCI:0:0:2, instead of the PCI:0:2:0 it gives with "00:02.0 VGA compatible controller: Intel Corporation ..." (the format I see on Leap 15.3/15.4).
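The field shift can be reproduced without lspci by feeding sample slot strings from this thread through the same cut/sed steps. This is a minimal sketch, not the SUSEPrime script: the function name old_busid is hypothetical, and the awk hex conversion is swapped for POSIX shell arithmetic ($((0x..))) so the example does not depend on which awk is installed.

```shell
#!/bin/sh
# Hypothetical helper mimicking the original parsing: take the first field
# of the lspci line, turn '.' and ':' into spaces, then use fields 1-3 as
# bus, device and function (awk replaced by shell arithmetic).
old_busid() {
    # "00:02.0" -> "00 02 0";  "0000:00:02.0" -> "0000 00 02 0"
    fields=$(echo "$1" | cut -f 1 -d ' ' | sed -e 's/\./:/g;s/:/ /g')
    set -- $fields   # intentional word splitting into $1, $2, $3, ...
    # lspci prints hex; the BusID values are printed in decimal.
    printf 'PCI:%d:%d:%d\n' "$((0x$1))" "$((0x$2))" "$((0x$3))"
}

old_busid "00:02.0 VGA compatible controller: Intel Corporation"       # PCI:0:2:0
old_busid "0000:00:02.0 VGA compatible controller: Intel Corporation"  # PCI:0:0:2
```

When a domain prefix is present, the domain lands in the bus position and the real bus and device numbers move one field to the right, which matches the PCI:0:0:2 seen on Kalanyr's machine.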

Kalanyr commented 1 year ago

Yes, I'm using Tumbleweed.

bubbleguuum commented 1 year ago

Looking into it, Kalanyr's PC has two PCI domains (0000 and 10000), so lspci prefixes each device with its domain:

0000:00:00.0 Host bridge: Intel Corporation Device 4637 (rev 02)
...
10000:e0:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02)
....

Most PCs, I suppose, have only one domain (domain 0), and in that case lspci omits it from its output (quoting the lspci man page: "By default, lspci suppresses them on machines which have only domain 0").

So this is a parsing bug that only shows up when there is more than one domain.

These modifications should fix it:

line=$(lspci -D | grep "$lspci_line" | head -1)
...
card_busid=$(echo $line | cut -f 1 -d ' ' | sed -e 's/\./:/g;s/:/ /g' | awk -Wposix '{printf("PCI:%d:%d:%d\n","0x" $2, "0x" $3, "0x" $4 )}')

This adds the -D option to lspci so that it always outputs the domain, and shifts the printf arguments by one ($2, $3, $4) to account for the new domain field in the lspci output.
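Applied to the same sample lines, the fix can be sketched like this: with lspci -D every slot starts with a domain, so bus, device and function always sit in fields 2-4. As before, new_busid is a hypothetical name and shell arithmetic stands in for the awk -Wposix step; the domain field itself is dropped, as in the proposed fix.

```shell
#!/bin/sh
# Hypothetical helper mimicking the proposed fix: input is assumed to come
# from `lspci -D`, so field 1 is always the domain.
new_busid() {
    # "0000:00:02.0" -> "0000 00 02 0";  "10000:e0:06.0" -> "10000 e0 06 0"
    fields=$(echo "$1" | cut -f 1 -d ' ' | sed -e 's/\./:/g;s/:/ /g')
    set -- $fields   # intentional word splitting
    # Skip the domain ($1); bus/device/function are $2, $3, $4 (hex).
    printf 'PCI:%d:%d:%d\n' "$((0x$2))" "$((0x$3))" "$((0x$4))"
}

new_busid "0000:00:02.0 VGA compatible controller: Intel Corporation"   # PCI:0:2:0
new_busid "0000:01:00.0 VGA compatible controller: NVIDIA Corporation"  # PCI:1:0:0
new_busid "10000:e0:06.0 PCI bridge: Intel Corporation"                 # PCI:224:6:0
```

The last line also shows why the hex-to-decimal step matters: bus e0 in lspci output becomes 224 in the BusID.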

sndirsch commented 1 year ago

@bubbleguuum Thanks. Good catch. I didn't know this domain trick. I would have used $(NF-2), $(NF-1) and $NF, but I like the -D option much better.
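The alternative mentioned here, counting fields from the end ($(NF-2), $(NF-1), $NF), works whether or not lspci prints a domain, so it does not need -D. Below is a minimal sketch of that idea in plain POSIX shell, using parameter expansion instead of awk; lspci_slot_to_busid is a hypothetical name, not part of SUSEPrime.

```shell
#!/bin/sh
# Hypothetical helper: take bus, device and function from the END of the
# slot field, so an optional leading domain is simply ignored.
lspci_slot_to_busid() {
    slot=${1%% *}       # first field, e.g. "0000:00:02.0" or "00:02.0"
    func=${slot##*.}    # after the last '.'            -> "0"
    rest=${slot%.*}     # drop ".func"                  -> "0000:00:02"
    dev=${rest##*:}     # last ':'-separated part       -> "02"
    rest=${rest%:*}     # drop ":dev"                   -> "0000:00"
    bus=${rest##*:}     # last part again (or all of it) -> "00"
    # lspci prints hex; the BusID values are printed in decimal.
    printf 'PCI:%d:%d:%d\n' "$((0x$bus))" "$((0x$dev))" "$((0x$func))"
}

lspci_slot_to_busid "00:02.0 VGA compatible controller: Intel Corporation"       # PCI:0:2:0
lspci_slot_to_busid "0000:00:02.0 VGA compatible controller: Intel Corporation"  # PCI:0:2:0
lspci_slot_to_busid "10000:e0:06.0 PCI bridge: Intel Corporation"                # PCI:224:6:0
```

It could be fed the same grep result as the original script, e.g. lspci_slot_to_busid "$(/sbin/lspci | grep 'VGA compatible controller: Intel' | head -1)". The trade-off is the one discussed here: -D fixes the field positions, while counting from the end tolerates either output format.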

sndirsch commented 1 year ago

I think I have seen several domains on big server machines before, but not so much on Laptops. ;-)

sndirsch commented 1 year ago

Now fixed in git, and I made a new release, 0.8.9, with this fix. I also updated the suse-prime package and submitted it for Tumbleweed.