openSUSE / SUSEPrime

Provide nvidia-prime like package for openSUSE

Implement and document PRIME Render Offload and DynamicPowerManagement #30

Closed sndirsch closed 4 years ago

sndirsch commented 4 years ago

09-nvidia-blacklist.conf -> 09-nvidia-modprobe-bbswitch-G04.conf

Added new files for using DynamicPowerManagement option of 435.xx driver

Add "AllowNVIDIAGPUScreens" option to intel X configs needed for NVIDIA's PRIME Render Offload feature.

Add instructions on how to use PRIME Render Offload and DynamicPowerManagement with the current 435.xx driver.

prime-select.sh: No longer run update-alternatives with current 435.xx driver. NVIDIA's glx extension is loaded on demand by their X driver meanwhile.
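For illustration, the offload usage these instructions cover comes down to two environment variables NVIDIA documents for the 435.xx series. A minimal sketch; the `nv_offload` helper name is an assumption, not part of suse-prime:

```shell
# Minimal sketch of PRIME Render Offload usage with the 435.xx driver.
# The two variables are NVIDIA's documented offload switches; the helper
# name nv_offload is an assumption, not part of suse-prime.
nv_offload() {
    __NV_PRIME_RENDER_OFFLOAD=1 __GLX_VENDOR_LIBRARY_NAME=nvidia "$@"
}

# Example (requires the NVIDIA driver and a running X session):
#   nv_offload glxinfo | grep "OpenGL renderer"
```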

Rush commented 4 years ago

Interesting changes. Is SUSEPrime still relevant with the 435.x driver? Shouldn't GPU offloading make more sense to use the default?

sndirsch commented 4 years ago

@Rush Good question. I haven't tried PRIME Render Offload yet. It also isn't supported by the 390.xx legacy driver, which suse-prime also supports. And you would probably need to set the NV_ environment variables globally to completely replace the old suse-prime.

flukejones commented 4 years ago

My comments will follow in a few separate posts as I test things.

09-nvidia-blacklist.conf -> 09-nvidia-modprobe-bbswitch-G04.conf

Not sure if this needs to be renamed. Anyone running 435.x drivers and a gfx card that doesn't support fully powering off may still want to use bbswitch support to power it off when using the intel driver. Blacklisting the nvidia modules enables this.

Maybe a section in the Readme about this could help also?
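For reference, a modprobe blacklist of this kind would look roughly like the following (assumed content; the packaged 09-nvidia-modprobe-bbswitch-G04.conf is authoritative):

```
# Assumed sketch of a bbswitch-oriented blacklist; verify against the
# packaged 09-nvidia-modprobe-bbswitch-G04.conf:
blacklist nvidia
blacklist nvidia_drm
blacklist nvidia_modeset
blacklist nvidia_uvm
```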

flukejones commented 4 years ago

90-nvidia-dracut-G05.conf kernel modules must not be in initrd any longer; otherwise udev rules below don't apply for some reason

I needed to remove /etc/dracut.conf.d/50-nvidia-default.conf also, or the udev rules wouldn't take effect. I thought higher-numbered conf files overrode lower-numbered ones?

flukejones commented 4 years ago

In the 90-intel.conf xorg config I needed to add the following:

Section "Device"
  Identifier "nvidia"
  Driver "nvidia"
  BusID "PCI:1:0:0"
EndSection

to get offload working. Without this the Nvidia driver wouldn't load. Note that the addition was only needed when using this more explicit configuration from suse-prime. If I use a conf with no explicit Device/Screen settings, such as the following:

# Replace contents of 90-intel.conf with
Section "ServerLayout"
    Identifier "layout"
    Option "AllowNVIDIAGPUScreens"
EndSection

then the additional Device section for Nvidia is not required.

Not sure how we would want to handle this. It might be that for both the G04 and G05 drivers the config with only the ServerLayout section would suffice, if xorg already has all the required auto-detect settings?

sndirsch commented 4 years ago

My comments will follow in a few separate posts as I test things.

09-nvidia-blacklist.conf -> 09-nvidia-modprobe-bbswitch-G04.conf

Not sure if this needs to be renamed. Anyone running 435.x drivers and a gfx card that doesn't support fully powering off may still want to use bbswitch support to power it off when using the intel driver. Blacklisting the nvidia modules enables this.

Maybe a section in the Readme about this could help also?

It's really just a filename renaming. I did this since I added 3 more files and wanted to give the filenames a better structure. And this file is still mentioned in the Readme in the section where it is explained how to use bbswitch.

flukejones commented 4 years ago

Final test being switching between graphics. This worked perfectly fine with the above modifications.

sndirsch commented 4 years ago

90-nvidia-dracut-G05.conf kernel modules must not be in initrd any longer; otherwise udev rules below don't apply for some reason

I needed to remove /etc/dracut.conf.d/50-nvidia-default.conf also, or the udev rules wouldn't take effect. I thought higher-numbered conf files overrode lower-numbered ones?

Well, values of options are overridden, that's true. But in 50-nvidia.conf there is add_drivers+="..." whereas in 90-nvidia-dracut-G05.conf there is omit_drivers+="...". I had hoped that this would result in not having the kernel modules in the initrd, but apparently I was proven wrong. :-(
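Whatever the merge semantics of the two drop-ins, a quick way to verify what actually landed in the generated image is to grep the `lsinitrd` listing. A small helper sketch; the function name is an assumption for illustration:

```shell
# Helper sketch: feed an `lsinitrd` listing on stdin and report whether any
# nvidia kernel module made it into the initrd. The function name is an
# assumption for illustration.
initrd_has_nvidia() {
    if grep -q 'nvidia.*\.ko'; then
        echo "nvidia modules present in initrd"
    else
        echo "no nvidia modules in initrd"
    fi
}

# Usage:
#   lsinitrd | initrd_has_nvidia
```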

flukejones commented 4 years ago

Well, values of options are overridden, that's true. But in 50-nvidia.conf there is add_drivers+="..." whereas in 90-nvidia-dracut-G05.conf there is omit_drivers+="...". I had hoped that this would result in not having the kernel modules in the initrd, but apparently I was proven wrong. :-(

Perhaps those lines can be removed in the driver packaging? I've never had issues without them. And looking at how rpmfusion does its driver packaging, it excludes the drivers by default.

sndirsch commented 4 years ago

In the 90-intel.conf xorg config I needed to add the following:

Section "Device"
  Identifier "nvidia"
  Driver "nvidia"
  BusID "PCI:1:0:0"
EndSection

to get offload working. Without this the Nvidia driver wouldn't load. Note that the addition was only needed when using this more explicit configuration from suse-prime. If I use a conf with no explicit Device/Screen settings, such as the following:

# Replace contents of 90-intel.conf with
Section "ServerLayout"
    Identifier "layout"
    Option "AllowNVIDIAGPUScreens"
EndSection

then the additional Device section for Nvidia is not required.

Not sure how we would want to handle this. It might be that for both the G04 and G05 drivers the config with only the ServerLayout section would suffice, if xorg already has all the required auto-detect settings?

Hmm. That's bad. Maybe we should introduce an extra PRIME Render Offload option for prime-select.sh with a special xorg.conf.
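Such an extra mode could, for example, just install the minimal ServerLayout-only snippet quoted above. A rough sketch; the function name and target path are assumptions, not suse-prime's actual implementation:

```shell
# Rough sketch of what a prime-select.sh "offload" mode could do: install a
# minimal xorg.conf.d snippet that only enables NVIDIA GPU screens and lets
# the server autodetect devices. Function name and path are assumptions.
set_offload_mode() {
    conf_dir="${1:-/etc/X11/xorg.conf.d}"
    cat > "$conf_dir/90-intel.conf" <<'EOF'
Section "ServerLayout"
    Identifier "layout"
    Option "AllowNVIDIAGPUScreens"
EndSection
EOF
}
```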

sndirsch commented 4 years ago

Final test being switching between graphics. This worked perfectly fine with the above modifications.

Sounds promising at least. Thanks for the thorough testing!

sndirsch commented 4 years ago

Well, values of options are overridden, that's true. But in 50-nvidia.conf there is add_drivers+="..." whereas in 90-nvidia-dracut-G05.conf there is omit_drivers+="...". I had hoped that this would result in not having the kernel modules in the initrd, but apparently I was proven wrong. :-(

Perhaps those lines can be removed in the driver packaging? I've never had issues without them. And looking at how rpmfusion does its driver packaging, it excludes the drivers by default.

Ok. I need to think about it.

simopil commented 4 years ago

I'm going to test the nvidia PRIME offload feature with some performance tests compared to suse-prime. The Nvidia feature sounds awesome.

sndirsch commented 4 years ago

I'm going to test the nvidia PRIME offload feature with some performance tests compared to suse-prime. The Nvidia feature sounds awesome.

Thanks! That would be great!

sndirsch commented 4 years ago

@Rush @Luke-Nukem @simopil I've added another commit 'Fix NVIDIA PRIME Render Offload ' to this open pull request to address the open issues, i.e. existing 50-nvidia-default.conf from nvidia driver packages and missing nvidia driver snippet in intel X configs for NVIDIA PRIME Render Offload mode. Maybe you could give it another try. Would be great!

simopil commented 4 years ago

PRIME offloading in the nvidia drivers doesn't seem to work correctly in my config. The Nvidia card is on all the time.

bubbleguuum commented 4 years ago

From what I understand, it must be on all the time with PRIME offloading, since it must be enabled in xorg.conf and thus the nvidia driver loaded. PRIME offloading is for users who want to use NVIDIA rendering for certain programs (without having to restart Xorg), with the Intel driver used for everything else. It is not a replacement for saving power with bbswitch (unless I missed something), which still requires restarting Xorg when switching drivers. So this is a different use case.

sndirsch commented 4 years ago

From what I understand, it must be on all the time with PRIME offloading, since it must be enabled in xorg.conf and thus the nvidia driver loaded. PRIME offloading is for users who want to use NVIDIA rendering for certain programs (without having to restart Xorg), with the Intel driver used for everything else. It is not a replacement for saving power with bbswitch (unless I missed something), which still requires restarting Xorg when switching drivers. So this is a different use case.

Yes, that's why you usually want to use it together with the DynamicPowerManagement option of the 435.xx driver ...
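For reference, enabling that option amounts to a modprobe.d snippet like the following (0x02 selects fine-grained power control per NVIDIA's 435.xx documentation; the file name is an assumption):

```
# e.g. /etc/modprobe.d/09-nvidia-pm-G05.conf (file name assumed)
options nvidia NVreg_DynamicPowerManagement=0x02
```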

sndirsch commented 4 years ago

I've also updated our suse-prime beta package in X11:XOrg with the patches of this pull request.

https://build.opensuse.org/package/show/X11:XOrg/suse-prime-beta


Tue Oct 1 14:16:26 UTC 2019 - Stefan Dirsch sndirsch@suse.com

I hope I can test this tomorrow on my Dell Optimus notebook ...

flukejones commented 4 years ago

From what I understand, it must be on all the time with PRIME offloading, since it must be enabled in xorg.conf and thus the nvidia driver loaded. PRIME offloading is for users who want to use NVIDIA rendering for certain programs (without having to restart Xorg), with the Intel driver used for everything else. It is not a replacement for saving power with bbswitch (unless I missed something), which still requires restarting Xorg when switching drivers. So this is a different use case.

Yes, that's why you usually want to use it together with the DynamicPowerManagement option of the 435.xx driver ...

It might be worth noting which graphics architectures support full power management in the readme so that users can decide if to use bbswitch or not.

Rush commented 4 years ago

Could the script detect the supported architectures? I bet most users won't have the time/knowledge to do their own research.

flukejones commented 4 years ago

Could the script detect the supported architectures? I bet most users won't have the time/knowledge to do their own research.

I think we probably can, these are the conditions required for the newer power management:

This feature requires system hardware as well as ACPI support (ACPI "_PR0" and "_PR3" methods are needed to control PCIe power). The necessary hardware and ACPI support was first added in Intel Coffeelake chipset series. Hence, this feature is supported from Intel Coffeelake chipset series.

This feature requires a Turing or newer GPU.

lspci can get the arch I think, grep something from the following?

01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2060 Mobile] (rev a1)

Perhaps NVIDIA Corporation TU1 which aligns with the model numbers.
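A detection sketch along those lines, matching the "TU1" prefix against the lspci description (the function name is mine; newer GPU generations would need their prefixes added):

```shell
# Sketch: classify an lspci description line as Turing (TU1xx) or not.
# Matching "NVIDIA Corporation TU1" mirrors the suggestion above; later
# GPU generations would need additional prefixes.
is_turing() {
    case "$1" in
        *"NVIDIA Corporation TU1"*) echo yes ;;
        *) echo no ;;
    esac
}

# Usage:
#   is_turing "$(lspci | grep -i 'nvidia')"
```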

simopil commented 4 years ago

The best way for < Turing would be a "simil-bumblebee" system that uses the official nvidia offloading.

sndirsch commented 4 years ago

Could the script detect the supported architectures? I bet most users won't have the time/knowledge to do their own research.

I think we probably can, these are the conditions required for the newer power management:

This feature requires system hardware as well as ACPI support (ACPI "_PR0" and "_PR3" methods are needed to control PCIe power). The necessary hardware and ACPI support was first added in Intel Coffeelake chipset series. Hence, this feature is supported from Intel Coffeelake chipset series. This feature requires a Turing or newer GPU.

lspci can get the arch I think, grep something from the following?

01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2060 Mobile] (rev a1)

Perhaps NVIDIA Corporation TU1 which aligns with the model numbers.

Thanks. According to documentation this is only supported on notebooks with Turing GPUs or newer, but AFAICS there are no Turing GPUs available yet for laptops?!? LOL.

sndirsch commented 4 years ago

The best way for < Turing would be a "simil-bumblebee" system that uses the official nvidia offloading.

Not sure what you mean with "simil-bumblebee" here. Could you elaborate?

bubbleguuum commented 4 years ago

Thanks. According to documentation this is only supported on notebooks with Turing GPUs or newer, but AFAICS there are no Turing GPUs available yet for laptops?!? LOL.

There are many laptops with RTX 20x0 models (and equivalent Quadro), thus Turing models

simopil commented 4 years ago

Bumblebee can turn the nvidia gpu off and on without restarting the display manager, with the nvidia proprietary driver. I don't know how it works, but it unloads and loads the nvidia modules during offload.

sndirsch commented 4 years ago

I've also updated our suse-prime beta package in X11:XOrg with the patches of this pull request.

https://build.opensuse.org/package/show/X11:XOrg/suse-prime-beta

Tue Oct 1 14:16:26 UTC 2019 - Stefan Dirsch sndirsch@suse.com

* 0001-Implement-and-document-PRIME-Render-Offload-and-Dyna.patch
  0002-Fix-NVIDIA-PRIME-Render-Offload.patch

  * adds support for NVIDIA PRIME Render Offload of 435.xx/G05 driver

* add new config files (modprobe.d/dracut.d/udev.d) to package;
  regenerate initrd during installation and also during update of
  nvidia G05 KMP

I hope I can test this tomorrow on my Dell Optimus notebook ...

Works for me so far. In old suse-prime "nvidia" mode, but also in "intel" mode using PRIME Render Offload. I can't say whether DynamicPowerManagement does anything. At least it doesn't hurt having it enabled. This is on a Dell Precision 5510 with

Intel Corporation HD Graphics 530 (rev 06) [8086:191b] (still Skylake)
NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2) [10de:13b1]

sndirsch commented 4 years ago

Thanks. According to documentation this is only supported on notebooks with Turing GPUs or newer, but AFAICS there are no Turing GPUs available yet for laptops?!? LOL.

There are many laptops with RTX 20x0 models (and equivalent Quadro), thus Turing models

Indeed, there are GeForce RTX 2060/2070/2080 [MAX-Q], Quadro RTX 4000/5000/6000/8000, Quadro T1000/T2000, and Quadro RTX 3000/4000/5000 available as laptop GPUs.

--> TU102, TU104, TU106, TU117

sndirsch commented 4 years ago

Bumblebee can turn the nvidia gpu off and on without restarting the display manager, with the nvidia proprietary driver. I don't know how it works, but it unloads and loads the nvidia modules during offload.

Ok. Thanks!

Rush commented 4 years ago

Bumblebee can turn the nvidia gpu off and on without restarting the display manager, with the nvidia proprietary driver. I don't know how it works, but it unloads and loads the nvidia modules during offload.

I think it's because bumblebee renders things in a second invisible X server.

flukejones commented 4 years ago

Could the script detect the supported architectures? I bet most users won't have the time/knowledge to do their own research.

I think we probably can, these are the conditions required for the newer power management:

This feature requires system hardware as well as ACPI support (ACPI "_PR0" and "_PR3" methods are needed to control PCIe power). The necessary hardware and ACPI support was first added in Intel Coffeelake chipset series. Hence, this feature is supported from Intel Coffeelake chipset series. This feature requires a Turing or newer GPU.

lspci can get the arch I think, grep something from the following?

01:00.0 VGA compatible controller: NVIDIA Corporation TU106M [GeForce RTX 2060 Mobile] (rev a1)

Perhaps NVIDIA Corporation TU1 which aligns with the model numbers.

Thanks. According to documentation this is only supported on notebooks with Turing GPUs or newer, but AFAICS there are no Turing GPUs available yet for laptops?!? LOL.

I'm running one right now, an RTX 2060. It's mind-blowing!

flukejones commented 4 years ago

Bumblebee can turn the nvidia gpu off and on without restarting the display manager, with the nvidia proprietary driver. I don't know how it works, but it unloads and loads the nvidia modules during offload.

Although bumblebee works, it's not such a good solution, as it is something like a wrapper which intercepts opengl (and vulkan now too?) calls intended for the actually running GPU (say, Intel) and redirects them to the Nvidia GPU. There's a fair amount of overhead. In any case, you can uninstall suse-prime and install bbswitch + bumblebee.

simopil commented 4 years ago

I know all the bumblebee performance limitations. So bumblebee can unload the nvidia module because it executes in a parallel X session? I'm searching for a way to unload the nvidia modules without restarting X when the intel gpu is rendering.

flukejones commented 4 years ago

I've also updated our suse-prime beta package in X11:XOrg with the patches of this pull request.

https://build.opensuse.org/package/show/X11:XOrg/suse-prime-beta

Tue Oct 1 14:16:26 UTC 2019 - Stefan Dirsch sndirsch@suse.com

* 0001-Implement-and-document-PRIME-Render-Offload-and-Dyna.patch
  0002-Fix-NVIDIA-PRIME-Render-Offload.patch

  * adds support for NVIDIA PRIME Render Offload of 435.xx/G05 driver

* add new config files (modprobe.d/dracut.d/udev.d) to package;
  regenerate initrd during installation and also during update of
  nvidia G05 KMP

I hope I can test this tomorrow on my Dell Optimus notebook ...

Works for me so far. In old suse-prime "nvidia" mode, but also in "intel" mode using PRIME Render Offload. I can't say whether DynamicPowerManagement does anything. At least it doesn't hurt having it enabled. This is on a Dell Precision 5510 with

Intel Corporation HD Graphics 530 (rev 06) [8086:191b] (still Skylake)
NVIDIA Corporation GM107GLM [Quadro M1000M] (rev a2) [10de:13b1]

You can check by running nvidia-smi, and also by looking at the power draw.

On

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   26C    P8     3W /  N/A |    115MiB /  5934MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2000      G   /usr/bin/X                                    32MiB |
|    0      2468      G   /usr/bin/X                                    80MiB |
+-----------------------------------------------------------------------------+

Off

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 435.21       Driver Version: 435.21       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   27C    P3    N/A /  N/A |    115MiB /  5934MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      2000      G   /usr/bin/X                                    32MiB |
|    0      2468      G   /usr/bin/X                                    80MiB |
+-----------------------------------------------------------------------------+
flukejones commented 4 years ago

I know all the bumblebee performance limitations. So bumblebee can unload the nvidia module because it executes in a parallel X session? I'm searching for a way to unload the nvidia modules without restarting X when the intel gpu is rendering.

My advice is to use bbswitch + bumblebee then. Bumblebee is packaged for openSUSE. It can unload the drivers because it uses primus or virtualgl to run a virtual X server; it then copies the buffer over, I think.

simopil commented 4 years ago

Bumblebee on my system seems to have the same performance as nvidia offloading: ~400 fps (glxspheres) versus 1300+ fps in direct rendering with suse-prime. Nvidia prime offloading also shows tearing in games without vsync (no tearing in direct rendering).

flukejones commented 4 years ago

Bumblebee on my system seems to have the same performance as nvidia offloading: ~400 fps (glxspheres) versus 1300+ fps in direct rendering with suse-prime. Nvidia prime offloading also shows tearing in games without vsync (no tearing in direct rendering).

Direct rendering was likely capped to vsync. You need vsync to reduce/eliminate tearing, but that is a discussion for elsewhere. Please also note that glxgears is absolutely not representative of actual gaming performance and was probably limited by your CPU. There are already numerous discussions on Bumblebee performance, such as this, this, and this.

simopil commented 4 years ago

I did a benchmark with glmark2, from which I assume that nvidia offloading has poorer performance than direct rendering. Here are the results: direct rendering = 2688, Nvidia with offloading = 1001.

Rush commented 4 years ago

I have the same results; I think something is wrong with glmark2. Steam and Witcher 3 are running just fine with PRIME offloading (and `prime-select`). YEAH! It's probably a little bit slower than when switching to Nvidia with prime-select, but I haven't done concrete benchmarks yet.

sndirsch commented 4 years ago

@Rush How can a performance test be wrong? ;-) Anyway, with issue #34 we have a dedicated performance ticket open. You might want to join that one for more discussion.

Rush commented 4 years ago

@sndirsch Synthetic benchmarks can be wrong on many counts. glxgears is famous for being the wrong benchmark.