openSUSE / SUSEPrime

Provide nvidia-prime like package for openSUSE

Setting prime-select to intel does not stop X bringing the Nvidia card up (even with bbswitch configured) #90

Closed Kalanyr closed 1 year ago

Kalanyr commented 1 year ago

On my PC, setting prime-select to intel results in X still bringing the Nvidia card up and then crashing with a pixmap failure. Log here: pastebin.com/9GVjExn9

I have bbswitch installed. get-current shows that, and indicates the Nvidia driver is on after X comes up.

Interestingly, if I configure offload-set to intel and use offload, everything works correctly.

Perhaps the pure intel setting needs to blacklist the nvidia card somehow?

(Intel2 does work, in the sense that X starts, but there are a lot of graphical glitches.)

sndirsch commented 1 year ago

> On my PC, setting prime-select to intel results in X still bringing the Nvidia card up and then crashing with a pixmap failure. Log here: pastebin.com/9GVjExn9

That's not an X server crash. It seems the modeset driver is being tried for both GPUs. For the nvidia GPU it then tries to initialize GLAMOR (2D acceleration via OpenGL) via the "nouveau" GL driver, which probably fails because the nvidia driver is loaded.

> I have bbswitch installed. get-current shows that, and indicates the Nvidia driver is on after X comes up.

Ok. The message about bbswitch being installed or not is a bit misleading, since you need it only when you want to completely disable the NVIDIA GPU, which you don't want in nvidia or offload mode.

> Interestingly, if I configure offload-set to intel and use offload, everything works correctly.

Ok. Maybe just use this mode.

> Perhaps the pure intel setting needs to blacklist the nvidia card somehow?

Which is done if you install the bbswitch package.

> (Intel2 does work, in the sense that X starts, but there are a lot of graphical glitches.)

Intel2 uses the "i915" driver instead of "modeset". Not recommended any longer.

Kalanyr commented 1 year ago

I do have the bbswitch package installed though and the suse-prime config stuff (modprobe etc) to enable it is all in the right place.

Just doesn't seem to work when prime-select intel is used.

I have no idea why modesetting decides to try to bring the Nvidia card up.

Getting the intel setting to work is 50% academic interest and 50% wanting to have a minimal power config for me.

sndirsch commented 1 year ago

> I do have the bbswitch package installed though and the suse-prime config stuff (modprobe etc) to enable it is all in the right place. Just doesn't seem to work when prime-select intel is used.

Hmm ...

> I have no idea why modesetting decides to try to bring the Nvidia card up.

Yeah, me neither. Oh, I was wrong. GLAMOR could even be initialized on the nvidia GPU. The pixmap error comes later:

[...]
(II) modeset(G0): [DRI2] VDPAU driver: nouveau
[ 142.749] (EE) modeset(G0): Failed to create pixmap

No idea what this means.

> Getting the intel setting to work is 50% academic interest and 50% wanting to have a minimal power config for me.

Ok.
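
For anyone triaging a similar log, a minimal sketch of pulling the (EE) lines out of an Xorg log may help. It is an illustration only and not part of SUSEPrime: the function name `find_errors` and the regular expression are assumptions made for this example, and the sample text reuses the two log lines quoted above.

```python
import re

# Xorg log lines look like "[   142.749] (EE) modeset(G0): Failed to create pixmap".
# The timestamp prefix is treated as optional because some lines omit it.
LOG_LINE = re.compile(
    r"^(?:\[\s*(?P<time>\d+\.\d+)\]\s*)?"   # optional "[ 142.749]" timestamp
    r"\((?P<level>[A-Z!?=+*-]{2})\)\s*"     # two-character marker such as (II) or (EE)
    r"(?P<message>.*)$"
)


def find_errors(log_text, levels=("EE",)):
    """Return (timestamp, message) tuples for log lines at the given levels."""
    hits = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") in levels:
            time = match.group("time")
            hits.append((float(time) if time else None, match.group("message")))
    return hits


# Sample taken from the excerpt quoted in this thread.
SAMPLE = """\
(II) modeset(G0): [DRI2] VDPAU driver: nouveau
[   142.749] (EE) modeset(G0): Failed to create pixmap
"""

for when, message in find_errors(SAMPLE):
    print(when, message)
```

Passing `levels=("EE", "WW")` would include warnings as well, which can be useful when the error line alone does not explain the failure.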

Kalanyr commented 1 year ago

I've done a little more testing on this. Adding "AutoAddGPU" "false" to the ServerLayout successfully stops X bringing the Nvidia card up, which allows the X session to start. Intel then ends up as the server GLX vendor and nvidia-smi reports the Nvidia card as off. xrandr only shows the iGPU as a provider.

However, lsmod shows all the Nvidia modules and bbswitch are loaded, and the offload environment variables appear to work. vkcube reports running on the dedicated GPU as well.

My best guess then is that the blacklisting just isn't working. I'll need to poke further and see if I can work out why.
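
The check described above (are the nvidia modules loaded, and what does bbswitch report) can be sketched roughly as follows. This is a hedged illustration, not SUSEPrime code: the function names are made up for this example, the sample module list is hypothetical, and it assumes bbswitch reports its state as a line like `0000:01:00.0 ON` in /proc/acpi/bbswitch.

```python
def loaded_modules(modules_text):
    """Return module names from /proc/modules-style text (name is the first field)."""
    return {line.split()[0] for line in modules_text.splitlines() if line.strip()}


def nvidia_modules(modules_text):
    """Return the loaded modules whose name starts with 'nvidia', sorted."""
    return sorted(m for m in loaded_modules(modules_text) if m.startswith("nvidia"))


def bbswitch_state(bbswitch_text):
    """Return 'ON' or 'OFF' from text like '0000:01:00.0 ON', or None if unparseable."""
    fields = bbswitch_text.split()
    if fields and fields[-1] in ("ON", "OFF"):
        return fields[-1]
    return None


def describe(modules_text, bbswitch_text):
    """Summarise whether the dedicated GPU looks disabled."""
    modules = nvidia_modules(modules_text)
    state = bbswitch_state(bbswitch_text)
    if not modules and state == "OFF":
        return "dGPU looks disabled: no nvidia modules, bbswitch reports OFF"
    return "dGPU still active: nvidia modules=%s, bbswitch=%s" % (modules, state)


# Hypothetical sample data; sizes, use counts and the PCI address are made up.
SAMPLE_MODULES = """\
nvidia_drm 69632 4 - Live 0x0000000000000000
nvidia 40000000 100 nvidia_drm, Live 0x0000000000000000
bbswitch 16384 0 - Live 0x0000000000000000
i915 3000000 20 - Live 0x0000000000000000
"""
SAMPLE_BBSWITCH = "0000:01:00.0 ON\n"

print(describe(SAMPLE_MODULES, SAMPLE_BBSWITCH))
```

On a real system the two inputs would come from reading /proc/modules and /proc/acpi/bbswitch; the latter only exists while the bbswitch module is loaded.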

sndirsch commented 1 year ago

No idea. It seems the unloading of the nvidia kernel modules in "intel" mode fails for some reason. I need to test all these modes again. Maybe it got broken at some point ...

sndirsch commented 1 year ago

Just noticed that my Dell Precision 5520 lost (!?!) its NVIDIA GPU, and there's no way to re-enable it in the BIOS. According to Google results this appears to be a known issue. I have another Optimus laptop for testing, but I'm afraid I won't get it back for weeks or even months. :-( That's frustrating ...

Kalanyr commented 1 year ago

Bummer.

I'm happy to help with testing if you'd like.

sndirsch commented 1 year ago

Finally I got my Optimus machine back, and indeed I could reproduce the issue. The problem was that prime-select was already being run during installation, in %post of nvidia-glG06. First it tried "prime-select nvidia", and if this failed it ran "prime-select intel". Unfortunately "prime-select nvidia" almost always failed, due to Secure Boot being enabled or the driver getting updated.

I changed this now, so this no longer happens. But it also means you need to run "prime-select nvidia" manually after reboot.

Fri Nov 25 12:46:13 UTC 2022 - Stefan Dirsch <sndirsch@suse.com>

- %post of nvidia-glG06
  * 'prime-select <nvidia|offload>'
    Don't try to run it during driver update or in secureboot since
    it will fail anyway when executing 
    'nvidia-xconfig --query-gpu-info'. This tool is driver version
    specific and needs the appropriate driver kernel modules loaded,
    which is not possible during driver update (old modules still
    loaded) and in secureboot mode (modules can't be loaded without
    the signing key registered). (boo#1205642)

This is fixed with the nvidia packages in NVIDIA repos since yesterday morning. Closing as fixed.

sndirsch commented 1 year ago

But please feel free to verify by updating your nvidia packages! :-)

Kalanyr commented 1 year ago

Thank you. I'm busy this weekend but will check on Tuesday.

Kalanyr commented 1 year ago

I don't have time to dig into this right now, but unfortunately my brief test shows that setting prime-select next-boot to intel and rebooting still brings the Nvidia card up, which causes X to try to do stuff with it and then fall over.

sndirsch commented 1 year ago

I haven't tested prime-select next-boot intel.

sndirsch commented 1 year ago

Also, are you testing prime-select intel? For reproduction: do you have bbswitch installed or not?

Kalanyr commented 1 year ago

I have bbswitch installed, and my prime-select boot is set to nvidia.

I tested prime-select intel followed by logout (which immediately fell over). Then I tried prime-select next-boot intel, which also fell over. I didn't try prime-select boot intel, since given the failure of the logout I suspected it would fall over too, but now that I know they use separate logic I'll give it a try next time I get a chance.

sndirsch commented 1 year ago

Ok. Once you can reproduce it, please let me know exactly how it is reproducible, i.e. in which order to run the prime-select commands.

Kalanyr commented 1 year ago

I'm going to walk through what happens after a dup and an attempt to switch to using only the intel driver.

It's currently 10:30 AM (UTC+10) on 2022/12/14. The mux is set to Optimus mode (so both the Intel and Nvidia cards are visible at the hardware level; the other mux setting is Nvidia only). My current setup has prime-select boot set to nvidia. I'm running openSUSE Tumbleweed and have the Nvidia repository included and configured.

I have just performed a dup and am about to restart. Historically this works fine, but it is worth doing as a baseline.

Kalanyr commented 1 year ago

Restarted and everything works as expected.

Output of get-current:
Driver configured: nvidia
NVIDIA modules are loaded

Now using
prime-select intel
prime-select boot intel

I expect the logout changeover to fail (based on past experience) and will restart afterwards if that is the case, which should load only the intel drivers and have bbswitch active. In the pretty likely scenario that this also fails, my next update will be after I swap back to nvidia and reboot. Here we go. I'll make copies of the get-current output and the X log and upload those in case of failures.

And logging out.

Kalanyr commented 1 year ago

As expected, logging out failed. (X appears to freak out about having two GPUs/displays visible and no directive for which one to use; see my earlier note on "AutoAddGPU" "false" allowing X to start while both cards remain active.)

Output of get-current: suseprime-intel-logout.txt
Xorg log: intel-logout-Xorg.0.log

Restarting also failed (same problem).

Output of get-current: suseprime-intel-restart.txt
Xorg log: intel-restart-Xorg.0.log

Setting boot to nvidia and restarting worked.

So in summary: for whatever reason, setting prime-select to intel doesn't stop the Nvidia card being powered on or the modules being loaded. X then sees both GPUs, brings both up and freaks out.

sndirsch commented 1 year ago

Reopening. I'll try to have a look into that this week.

sndirsch commented 1 year ago

I couldn't reproduce this issue, but I didn't see any regression in nvidia, intel and offload modes after adding

Section "ServerFlags"
  Option   "AutoAddGPU" "false"
EndSection

to xorg-amd.conf, xorg-intel-intel.conf and xorg-intel.conf in the /usr/share/prime directory either.

So I can just add this section.
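
For readers who want to try the same change on their own config before updating, here is a rough sketch of appending the section only when no AutoAddGPU option is already set. It is an assumption-laden illustration and not how the SUSEPrime release applies the fix: the helper names are invented for this example, the sample Device section is hypothetical, and the code works on config text in memory rather than touching any file.

```python
import re

# The section quoted in this thread.
SERVER_FLAGS = 'Section "ServerFlags"\n  Option   "AutoAddGPU" "false"\nEndSection\n'


def has_auto_add_gpu(conf_text):
    """True if an uncommented AutoAddGPU option already appears in the config text."""
    pattern = r'^\s*Option\s+"AutoAddGPU"'
    return re.search(pattern, conf_text, flags=re.MULTILINE | re.IGNORECASE) is not None


def ensure_auto_add_gpu_off(conf_text):
    """Append the ServerFlags section unless AutoAddGPU is already configured.

    An existing AutoAddGPU line is left untouched, whatever its value.
    """
    if has_auto_add_gpu(conf_text):
        return conf_text
    if not conf_text.strip():
        return SERVER_FLAGS
    return conf_text.rstrip("\n") + "\n\n" + SERVER_FLAGS


# Hypothetical existing config content, used only to demonstrate the helper.
EXISTING = 'Section "Device"\n  Identifier "intel"\nEndSection\n'

print(ensure_auto_add_gpu_off(EXISTING))
```

Running the helper twice on its own output returns the text unchanged, so it is safe to apply repeatedly.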

Kalanyr commented 1 year ago

MSI does appear to have done something non-standard with the GPU setup on this model (NVIDIA Experience doesn't correctly detect the display resolution in Hybrid mode either), so it's not particularly surprising that this isn't generally reproducible.

As fixes go, it's definitely better than the crash, even if it does negate a chunk of the benefit of running in Intel-only mode, since the NVIDIA GPU will remain active and drawing power.

Thank you for investigating this.

sndirsch commented 1 year ago

Just made a new release, 0.8.13, with the fix applied. Also updated the suse-prime package in X11:XOrg and created a submit request for factory/Tumbleweed. Closing.