mviereck / x11docker

Run GUI applications and desktops in docker and podman containers. Focus on security.
MIT License

Run with --gpu option and fail to get a glx context (remote server with NVIDIA GPU) #337

Closed johncadengo closed 3 years ago

johncadengo commented 3 years ago

I am trying to follow an example found here: https://github.com/mviereck/x11docker/issues/198

I ran this:

x11docker --gpu x11docker/check glxinfo | grep renderer

And I have nvidia drivers installed on my host. I get the following error:

docker.io/x11docker/check:latest
libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
X Error of failed request:  GLXBadContext
  Major opcode of failed request:  149 (GLX)
  Minor opcode of failed request:  6 (X_GLXIsDirect)
  Serial number of failed request:  26
  Current serial number in output stream:  25

Am I missing something? I was hoping the --gpu option would suffice for passing the GPU through to my x11docker container, but that doesn't appear to be the case. So I ran the test above, and I see the same error that I was getting in the context of another container (x11docker/lxde-wine) when trying to run an OpenGL application.

johncadengo commented 3 years ago

FYI, my NVIDIA Driver version:

$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module  460.67  Thu Mar 11 00:11:45 UTC 2021
GCC version:  gcc version 9.3.0 (Ubuntu 9.3.0-17ubuntu1~20.04) 
mviereck commented 3 years ago

With option --gpu and a proprietary NVIDIA driver, x11docker should show you some messages about how to set up the driver. Can you show me the output of x11docker?

Possibilities to use an NVIDIA GPU are also described in https://github.com/mviereck/x11docker/wiki/NVIDIA-driver-support-for-docker-container

johncadengo commented 3 years ago

So here is the full output of using the example above:

$ x11docker --gpu x11docker/check glxinfo | grep renderer
x11docker note: Your system uses closed source NVIDIA driver.
  GPU support will work only with options --hostdisplay and --xorg.
  Consider to use free open source nouveau driver instead.

x11docker note: Using X server option --hostdisplay

x11docker WARNING: Option --gpu degrades container isolation.
  Container gains access to GPU hardware.
  This allows reading host window content (palinopsia leak)
  and GPU rootkits (compare proof of concept: jellyfish).

x11docker note: Option --gpu: With X over IP the host network stack must
  be shared to allow GPU access. Enabling option --network=host.

x11docker note: Option --gpu: To allow GPU acceleration with --hostdisplay,
  x11docker will allow trusted cookies.

x11docker note: To allow protection against X security leaks 
  while using --gpu with NVIDIA, please use option --xorg.

x11docker WARNING: Option --hostdisplay with trusted cookies provides
      QUITE BAD CONTAINER ISOLATION !
  Keylogging and controlling host applications is possible! 
  Clipboard sharing is enabled (option --cliboard).
  It is recommended to use another X server option like --nxagent or --xpra.

x11docker WARNING: Option --network=host severly degrades 
  container isolation. Network namespacing is disabled. 
  Container shares host network stack. 
  Spying on network traffic may be possible. 
  Access to host X server localhost:11.0 may be possible 
  through abstract unix socket.

x11docker note: Option --gpu: You are using the closed source NVIDIA driver.
  GPU acceleration will only work if you have installed the very same driver
  version in image. That makes images less portable.
  It is recommended to use free open source nouveau driver on host instead.
  Ask NVIDIA corporation to at least publish their closed source API,
  or even better to actively support open source driver nouveau.

x11docker note: Option --gpu: x11docker can try to automatically install NVIDIA driver
  version 460.67 in container on every container startup.
  Drawbacks: Container startup is a bit slower and its security will be reduced.

  You can look here for a driver installer:
    https://www.nvidia.com/Download/index.aspx
    https://http.download.nvidia.com/
  A direct download URL is probably:
    https://http.download.nvidia.com/XFree86/Linux-x86_64/460.67/NVIDIA-Linux-x86_64-460.67.run
  If you got a driver, store it at one of the following locations:
    /home/john/.local/share/x11docker/
    /usr/local/share/x11docker/

  Be aware that the version number must match exactly the version on host.
  The file name must begin with 'NVIDIA', contain the version number 460.67
  and end with suffix '.run'.

x11docker WARNING: Sharing device file: /dev/dri

x11docker WARNING: Sharing device file: /dev/nvidia-caps

x11docker WARNING: Sharing device file: /dev/nvidia-modeset

x11docker WARNING: Sharing device file: /dev/nvidia-uvm

x11docker WARNING: Sharing device file: /dev/nvidia-uvm-tools

x11docker WARNING: Sharing device file: /dev/nvidia0

x11docker WARNING: Sharing device file: /dev/nvidiactl

x11docker WARNING: Sharing device file: /dev/vga_arbiter

libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
X Error of failed request:  GLXBadContext
  Major opcode of failed request:  149 (GLX)
  Minor opcode of failed request:  6 (X_GLXIsDirect)
  Serial number of failed request:  26
  Current serial number in output stream:  25

Once I read the note, suggesting to use --hostdisplay or --xorg, I tried adding --xorg to it:

Full output:

$ x11docker --gpu --xorg x11docker/check glxinfo | grep renderer
x11docker WARNING: Option --gpu degrades container isolation.
  Container gains access to GPU hardware.
  This allows reading host window content (palinopsia leak)
  and GPU rootkits (compare proof of concept: jellyfish).

x11docker WARNING: Although x11docker starts Xorg as unprivileged user,
  most system setups wrap Xorg to give it root permissions (setuid).
  Evil containers may try to abuse this.
  Other x11docker X server options like --xephyr are more secure at this point.

x11docker WARNING: x11docker can run Xorg on another tty (option --xorg),
  but you won't see it in your SSH session.
  Rather install e.g. Xephyr on ssh server and use option --xephyr.

x11docker note: Option --gpu: You are using the closed source NVIDIA driver.
  GPU acceleration will only work if you have installed the very same driver
  version in image. That makes images less portable.
  It is recommended to use free open source nouveau driver on host instead.
  Ask NVIDIA corporation to at least publish their closed source API,
  or even better to actively support open source driver nouveau.

x11docker note: Option --gpu: x11docker can try to automatically install NVIDIA driver
  version 460.67 in container on every container startup.
  Drawbacks: Container startup is a bit slower and its security will be reduced.

  You can look here for a driver installer:
    https://www.nvidia.com/Download/index.aspx
    https://http.download.nvidia.com/
  A direct download URL is probably:
    https://http.download.nvidia.com/XFree86/Linux-x86_64/460.67/NVIDIA-Linux-x86_64-460.67.run
  If you got a driver, store it at one of the following locations:
    /home/john/.local/share/x11docker/
    /usr/local/share/x11docker/

  Be aware that the version number must match exactly the version on host.
  The file name must begin with 'NVIDIA', contain the version number 460.67
  and end with suffix '.run'.

x11docker note: Could not check for a free tty below or equal to 12.
  Would need to use command fgconsole for a better check.
  Possibilities:
  1.) Run x11docker as root.
  2.) Add user to group tty (not recommended, may be insecure).
  3.) Use display manager gdm3.
  4.) Run x11docker directly from console.

x11docker note: To access X on tty13, use command 'chvt 13'

x11docker WARNING: On debian 9, switching often between multiple X servers can
  cause a crash of one X server. This bug may be debian specific and is probably
  some sort of race condition. If you know more about this or it occurs on
  other systems, too, please report at https://github.com/mviereck/x11docker.

  You can avoid this issue with switching to a black tty before switching to X.

x11docker WARNING: Sharing device file: /dev/dri

x11docker WARNING: Sharing device file: /dev/nvidia-caps

x11docker WARNING: Sharing device file: /dev/nvidia-modeset

x11docker WARNING: Sharing device file: /dev/nvidia-uvm

x11docker WARNING: Sharing device file: /dev/nvidia-uvm-tools

x11docker WARNING: Sharing device file: /dev/nvidia0

x11docker WARNING: Sharing device file: /dev/nvidiactl

x11docker WARNING: Sharing device file: /dev/vga_arbiter

x11docker note: Option --wm: Did not find window manager image 
      x11docker/openbox 
  to provide a containerized window manager. Please run: 
      docker pull x11docker/openbox 
  If you want to use a host window manager instead and avoid this warning, 
  use option                         --wm=host  or  --wm=COMMAND 
  or provide a local image with e.g. --wm=x11docker/fvwm 
  To run without a window manager:   --wm=none  or  --desktop 
  Fallback: Will try to run a host window manager: mutter

x11docker note: Option --wm: Starting host window manager: mutter

    GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer, 
    GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, GLX_MESA_query_renderer, 
Extended renderer info (GLX_MESA_query_renderer):
OpenGL renderer string: llvmpipe (LLVM 7.0, 256 bits)

This seems to be better than using --hostdisplay in my case. However, does this mean Nvidia is not loading up?

Is there a way for me to test if Nvidia is loading up? I tried running this:

$ x11docker --gpu --xorg x11docker/check glxgears

But I wasn't able to see any output through X11 forwarding, and I wasn't able to confirm by running nvidia-smi that it was running on the NVIDIA GPU instead of my CPU. Here's the text output of that command:

1874 frames in 5.0 seconds = 374.775 FPS
2152 frames in 5.0 seconds = 430.371 FPS
2069 frames in 5.0 seconds = 413.719 FPS
2086 frames in 5.0 seconds = 417.123 FPS
2173 frames in 5.0 seconds = 433.999 FPS
2165 frames in 5.0 seconds = 432.924 FPS
johncadengo commented 3 years ago

Also, I tried this, which I presume uses --hostdisplay by default:

$ x11docker --gpu x11docker/check glxgears

And here's the output:

libGL error: No matching fbConfigs or visuals found
libGL error: failed to load driver: swrast
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  149 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  31
  Current serial number in output stream:  33
mviereck commented 3 years ago

This is the note you should regard:

x11docker note: Option --gpu: x11docker can try to automatically install NVIDIA driver
  version 460.67 in container on every container startup.
  Drawbacks: Container startup is a bit slower and its security will be reduced.

  You can look here for a driver installer:
    https://www.nvidia.com/Download/index.aspx
    https://http.download.nvidia.com/
  A direct download URL is probably:
    https://http.download.nvidia.com/XFree86/Linux-x86_64/460.67/NVIDIA-Linux-x86_64-460.67.run
  If you got a driver, store it at one of the following locations:
    /home/john/.local/share/x11docker/
    /usr/local/share/x11docker/

  Be aware that the version number must match exactly the version on host.
  The file name must begin with 'NVIDIA', contain the version number 460.67
  and end with suffix '.run'.

The NVIDIA driver is installed on host, but is needed in the container, too. Currently you are running the container without NVIDIA drivers.

You can either try the suggestion from the note or look at the alternatives explained in the wiki.


Likely unrelated: I am surprised about this message in your --hostdisplay output:

x11docker note: Option --gpu: With X over IP the host network stack must
  be shared to allow GPU access. Enabling option --network=host.

Can you show me echo $DISPLAY from host? Does it contain an IP address? What does ls /tmp/.X11-unix show?

johncadengo commented 3 years ago

Ok, neither of those directories exists, so I am creating one with $ sudo mkdir /usr/local/share/x11docker/ -p. Does it require any special file permissions, for either the folder or the file?

ls -ag /usr/local/share/x11docker 
total 173536
drwxr-xr-x 2 root      4096 Mar 30 13:32 .
drwxr-xr-x 9 root      4096 Mar 30 13:31 ..
-rwxr-xr-x 1 root 177691692 Mar 30 13:32 NVIDIA-Linux-x86_64-460.67.run

Is it the case that if xpra is installed on the host, x11docker uses it by default? I think originally when I ran x11docker it would default to Xephyr, but now it seems to default to xpra. I have been having some issues with xpra vs. Xephyr, since my client is a Mac and xpra on Macs doesn't seem to work as well as other options (even noVNC is more performant on my Mac than xpra).

mviereck commented 3 years ago

The file permissions are good; x11docker just needs read access, and that is granted to everyone here.

Now that the driver file is provided, x11docker should automatically install it in the container if you use option --gpu.

Is it the case that if xpra is installed on the host, x11docker uses it by default? I think originally when I ran x11docker it would default to Xephyr, but now it seems to default to xpra.

With the proprietary NVIDIA driver and option --gpu, x11docker can only run --hostdisplay or --xorg. It should not default to --xpra or --xephyr.

johncadengo commented 3 years ago

Ok, the new output is:

$ x11docker --gpu --xorg x11docker/check glxinfo | grep renderer
x11docker WARNING: Option --gpu degrades container isolation.
  Container gains access to GPU hardware.
  This allows reading host window content (palinopsia leak)
  and GPU rootkits (compare proof of concept: jellyfish).

x11docker WARNING: Although x11docker starts Xorg as unprivileged user,
  most system setups wrap Xorg to give it root permissions (setuid).
  Evil containers may try to abuse this.
  Other x11docker X server options like --xephyr are more secure at this point.

x11docker WARNING: x11docker can run Xorg on another tty (option --xorg),
  but you won't see it in your SSH session.
  Rather install e.g. Xephyr on ssh server and use option --xephyr.

x11docker note: Could not check for a free tty below or equal to 12.
  Would need to use command fgconsole for a better check.
  Possibilities:
  1.) Run x11docker as root.
  2.) Add user to group tty (not recommended, may be insecure).
  3.) Use display manager gdm3.
  4.) Run x11docker directly from console.

x11docker note: To access X on tty13, use command 'chvt 13'

x11docker WARNING: On debian 9, switching often between multiple X servers can
  cause a crash of one X server. This bug may be debian specific and is probably
  some sort of race condition. If you know more about this or it occurs on
  other systems, too, please report at https://github.com/mviereck/x11docker.

  You can avoid this issue with switching to a black tty before switching to X.

x11docker WARNING: Sharing device file: /dev/dri

x11docker WARNING: Sharing device file: /dev/nvidia-caps

x11docker WARNING: Sharing device file: /dev/nvidia-modeset

x11docker WARNING: Sharing device file: /dev/nvidia-uvm

x11docker WARNING: Sharing device file: /dev/nvidia-uvm-tools

x11docker WARNING: Sharing device file: /dev/nvidia0

x11docker WARNING: Sharing device file: /dev/nvidiactl

x11docker WARNING: Sharing device file: /dev/vga_arbiter

x11docker note: Option --wm: Did not find window manager image 
      x11docker/openbox 
  to provide a containerized window manager. Please run: 
      docker pull x11docker/openbox 
  If you want to use a host window manager instead and avoid this warning, 
  use option                         --wm=host  or  --wm=COMMAND 
  or provide a local image with e.g. --wm=x11docker/fvwm 
  To run without a window manager:   --wm=none  or  --desktop 
  Fallback: Will try to run a host window manager: mutter

x11docker note: Option --wm: Starting host window manager: mutter

x11docker note: Installing NVIDIA driver 460.67 in container.

    GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer, 
    GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, GLX_MESA_query_renderer, 
Extended renderer info (GLX_MESA_query_renderer):
OpenGL renderer string: llvmpipe (LLVM 7.0, 256 bits)

So it shows that it is installing the driver; however, it still isn't using NVIDIA for OpenGL. I also ran this just to see, and I can see it is running on my CPU:

$ x11docker --gpu --xorg x11docker/check glxgears

...

x11docker note: Installing NVIDIA driver 460.67 in container.

1921 frames in 5.0 seconds = 383.613 FPS
2125 frames in 5.0 seconds = 424.801 FPS
2234 frames in 5.0 seconds = 446.661 FPS
2167 frames in 5.0 seconds = 433.394 FPS
2229 frames in 5.0 seconds = 445.693 FPS

Here is what you asked for:

$ echo $DISPLAY
localhost:11.0
$ ls /tmp/.X11-unix
X30

I am running an x11docker on screen 30 right now, but it was just me testing something. Do I need to close all instances of x11docker?

mviereck commented 3 years ago

x11docker note: Installing NVIDIA driver 460.67 in container.

That looks good so far.

OpenGL renderer string: llvmpipe (LLVM 7.0, 256 bits)

So it shows that it is installing the driver; however, it still isn't using NVIDIA for OpenGL. I also ran this just to see, and I can see it is running on my CPU:

Indeed, it runs on CPU. llvmpipe also indicates software rendering. I am not sure what is going wrong. I wonder if there is an x11docker bug or if glxinfo and glxgears just do not work with NVIDIA at all.

Try:
x11docker --gpu x11docker/check glxspheres64
This is similar to glxgears, but more up to date.

I am running an x11docker on screen 30 right now, but it was just me testing something. Do I need to close all instances of x11docker?

No, that is not needed.

$ echo $DISPLAY
localhost:11.0
$ ls /tmp/.X11-unix
X30

It seems display :11 is running over tcp instead of over a unix socket. That is a quite unusual setup. Is that a custom setup of yours?

johncadengo commented 3 years ago

As for the echo $DISPLAY, I don't believe I did anything intentional to customize that setup. Do you know how to get it to run the more conventional way, over a unix socket instead of tcp?

Here is the output of x11docker --gpu x11docker/check glxspheres64

x11docker note: Installing NVIDIA driver 460.67 in container.

Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
Visual ID of window: 0xab
X Error of failed request:  BadValue (integer parameter out of range for operation)
  Major opcode of failed request:  149 (GLX)
  Minor opcode of failed request:  3 (X_GLXCreateContext)
  Value in failed request:  0x0
  Serial number of failed request:  27
  Current serial number in output stream:  28

When I add the --xorg flag, I get this instead:

$ x11docker --gpu --xorg x11docker/check glxspheres64
x11docker note: Installing NVIDIA driver 460.67 in container.

Polygons in scene: 62464 (61 spheres * 1024 polys/spheres)
Visual ID of window: 0x3d1
Context is Direct
OpenGL Renderer: llvmpipe (LLVM 7.0, 256 bits)

Which looks like it's still running on CPU.

mviereck commented 3 years ago

As for the echo $DISPLAY, I don't believe I did anything intentional to customize that setup. Do you know how to get it to run the more conventional way, over a unix socket instead of tcp?

What system do you run? Which desktop? Which display manager?

When I add the --xorg flag, I get this instead:

For further test runs it makes sense to always use --xorg to exclude possible issues caused by the tcp setup of your regular display.

Which looks like it's still running on CPU.

Yes. Can you show me ~/.cache/x11docker/x11docker.log at www.pastebin.com after terminating x11docker --gpu --xorg x11docker/check glxspheres64? Maybe I'll find a hint in the log.

johncadengo commented 3 years ago

I'm running Ubuntu 20.04. I originally installed the Server Edition and used the apt package manager to install Ubuntu-Desktop and various other packages to get a desktop running on it. Originally it had no video output, but I acquired a GPU recently and wanted to be able to output a desktop. The display manager is the default one that comes with Ubuntu-Desktop, GDM3.

I am gathering the logs and will send you the link to them, thank you.

johncadengo commented 3 years ago

@mviereck Here you go: https://pastebin.com/NFwQ6Xzm

mviereck commented 3 years ago

Some points I found so far, will look further tomorrow:

Xorg seems to load the NVIDIA driver from host (lines 1647-1652), but unloads it later for unknown reasons (lines 2081-2088):

(II) Loading /usr/lib/xorg/modules/drivers/nvidia_drv.so
(II) Module nvidia: vendor="NVIDIA Corporation"
    compiled for 1.6.99.901, module version = 1.0.0
(II) NVIDIA dlloader X Driver  460.67  Thu Mar 11 00:09:07 UTC 2021
(II) NVIDIA Unified Driver for all Supported NVIDIA GPUs
(EE) No devices detected.
[...]
(II) Loading /usr/lib/xorg/modules/libfb.so
(II) Module fb: vendor="X.Org Foundation"
    compiled for 1.20.9, module version = 1.0.0
(II) Unloading nvidia
(II) Unloading modesetting
(II) Unloading fbdev
(II) Unloading fbdevhw
(II) Unloading vesa

Can you successfully use the NVIDIA GPU with host applications?


I found this warning; I missed it above:

x11docker WARNING: x11docker can run Xorg on another tty (option --xorg),
  but you won't see it in your SSH session.
  Rather install e.g. Xephyr on ssh server and use option --xephyr.

Did you run all the commands over SSH? That also explains the tcp setup.

Using a headless NVIDIA GPU over SSH is a quite special task. I wasn't aware that this is the situation we are speaking of. Somewhere there is a ticket with a successful setup; I will look for it tomorrow.

mviereck commented 3 years ago

Have a look at https://github.com/mviereck/x11docker/issues/199

johncadengo commented 3 years ago

@mviereck 😮

So I made some progress thanks to what you mentioned here (https://github.com/mviereck/x11docker/issues/199#issuecomment-557927571). I needed to add a virtual display to my Xorg.conf file.

SubSection     "Display"
        Virtual     1920 1080
EndSubSection

Now, from within the container, I'm able to run glxinfo | grep renderer

john@99687caaec6d:~$ glxinfo | grep render
direct rendering: Yes
OpenGL renderer string: Quadro P4000/PCIe/SSE2
    GL_ARB_conditional_render_inverted, GL_ARB_conservative_depth, 
    GL_NVX_conditional_render, GL_NVX_gpu_memory_info, GL_NVX_nvenc_interop, 
    GL_NV_command_list, GL_NV_compute_program5, GL_NV_conditional_render, 
    GL_NV_parameter_buffer_object2, GL_NV_path_rendering, 
    GL_NV_path_rendering_shared_edge, GL_NV_pixel_data_range, 
    GL_NV_stereo_view_rendering, GL_NV_texgen_reflection, 
    GL_ARB_compute_variable_group_size, GL_ARB_conditional_render_inverted, 
    GL_NVX_conditional_render, GL_NVX_gpu_memory_info, GL_NVX_nvenc_interop, 
    GL_NV_command_list, GL_NV_compute_program5, GL_NV_conditional_render, 
    GL_NV_parameter_buffer_object2, GL_NV_path_rendering, 
    GL_NV_path_rendering_shared_edge, GL_NV_pixel_data_range, 
    GL_NV_stereo_view_rendering, GL_NV_texgen_reflection, 
    GL_EXT_multisample_compatibility, GL_EXT_multisampled_render_to_texture, 
    GL_EXT_multisampled_render_to_texture2, 
    GL_EXT_raster_multisample, GL_EXT_render_snorm, GL_EXT_robustness, 
    GL_NV_clip_space_w_scaling, GL_NV_conditional_render, 
    GL_NV_packed_float_linear, GL_NV_path_rendering, 
    GL_NV_path_rendering_shared_edge, GL_NV_pixel_buffer_object, 
    GL_NV_shadow_samplers_cube, GL_NV_stereo_view_rendering, 
    GL_OES_fbo_render_mipmap, GL_OES_geometry_point_size, 
    GL_OVR_multiview_multisampled_render_to_texture

And when I run glxgears I get about 75fps, which I'm assuming is because it caps it when on the GPU. I also ran nvidia-smi from the host and I can see the program running:

Tue Mar 30 18:44:41 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:42:00.0  On |                  N/A |
| 49%   45C    P8    10W / 105W |     83MiB /  8119MiB |     17%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   2383792      G   /usr/lib/xorg/Xorg                 77MiB |
|    0   N/A  N/A   2387054      G   glxgears                            2MiB |
+-----------------------------------------------------------------------------+

If I wanted to document this for others, where should I add this to the wiki? Really appreciate your help on this, so it'd be great if I could contribute as well.

johncadengo commented 3 years ago

So I'll pick this up again tomorrow, but I wanted to say: when I run x11docker --gpu --xorg x11docker/check glxinfo | grep renderer I still get this as a result, even after adding the virtual display:

x11docker note: Installing NVIDIA driver 460.67 in container.

    GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer, 
    GLX_EXT_visual_rating, GLX_MESA_copy_sub_buffer, GLX_MESA_query_renderer, 
Extended renderer info (GLX_MESA_query_renderer):
OpenGL renderer string: llvmpipe (LLVM 7.0, 256 bits)

So there is a difference between running the line above and running this script below that I want to nail down so I can fully understand it:

#! /bin/bash
read Xenv < <(x11docker --display=30 --desktop --size 1920x1080 --pulseaudio --gpu --xorg --xtest --showenv --sudouser x11docker/lxde-wine)
env $Xenv xpra start-desktop :30 --use-display --start-via-proxy=no --daemon=no --bandwidth-limit=0 --system-tray=no --speaker=disabled --microphone=disabled

I was using this script above in my comment where I was able to run glxinfo and glxgears on the GPU from within the container.

mviereck commented 3 years ago

And when I run glxgears I get about 75fps, which I'm assuming is because it caps it when on the GPU. I also ran nvidia-smi from the host and I can see the program running:

Great! Success so far.

glxgears runs at the monitor refresh rate, I mostly get about 60 fps. You can disable this with env vblank_mode=0 glxgears and should get the full power of the GPU.

So there is a difference between running the line above and running this script below that I want to nail down so I can fully understand it:

Surprising. Good that you found this! I can only imagine that --sudouser makes a difference although it should not. --xtest should already be enabled by default. Please try:

x11docker --gpu --xorg --sudouser x11docker/check glxinfo | grep renderer

If that works (output of Quadro P4000/PCIe/SSE2 instead of llvmpipe), please also check:

x11docker --gpu --xorg --cap-default x11docker/check glxinfo | grep renderer
x11docker --gpu --xorg --newprivileges=yes x11docker/check glxinfo | grep renderer

If I wanted to document this for others, where should I add this to the wiki? Really appreciate your help on this, so it'd be great if I could contribute as well.

That would be nice! I'd say this should be an article on its own, covering all GPU types (and also no GPU, using Xvfb), with a title like "How to set up a remote server" and a chapter within it for the additional NVIDIA setup. The greatest difference compared to other GPUs would be the virtual monitor entry in xorg.conf and a link to the general NVIDIA GPU setup article. I'll try a setup with a remote headless Intel GPU for comparison. Edit: The more I look at the wiki, the less sure I am how best to integrate this. It should be part of the "Remote access" section, but would be useful in all of ssh/xpra/html5/vnc.

johncadengo commented 3 years ago

Thanks for all the help!

At the end of this, I'll take a look at the wiki and run by you the various pages I think could be updated about it. And I'll draft up an article dedicated to this special setup, with specifics on xorg.conf and NVIDIA GPUs.

This is pretty great progress, but here's the issue I'm running into. I'm actually able to get it to report the GPU with this command, x11docker --gpu --xorg x11docker/check glxinfo | grep renderer, with no --sudouser or special access. I realized that earlier it wasn't working because my script was running.

#! /bin/bash
read Xenv < <(x11docker --display=30 --desktop --size 1920x1080 --pulseaudio --gpu --xorg --xtest --showenv --sudouser x11docker/lxde-wine)
env $Xenv xpra start-desktop :30 --use-display --start-via-proxy=no --daemon=no --bandwidth-limit=0 --system-tray=no --speaker=disabled --microphone=disabled

If I'm running the above script, then I can't get glxinfo in a separate container to report the GPU. Even when adding --sudouser or --cap-default or --newprivileges=yes. None of those options will report the GPU. However, if I end the script, then I get glxinfo with the vanilla command, no special sudo user or privileges needed. So, I guess my issue here is I hoped I could have multiple containers running with access to the GPU and OpenGL drivers, but right now I am limited to having at most 1.

Still, this is pretty great progress. Any ideas on how I can get multiple containers with GPU access going?

johncadengo commented 3 years ago

Oh, and I tried running glxgears with vblank_mode=0 as an env variable and it did not work. I found an answer here that states that with the closed-source driver, you have to run this instead:

$ __GL_SYNC_TO_VBLANK=0 glxgears
62543 frames in 5.0 seconds = 12508.548 FPS
63985 frames in 5.0 seconds = 12796.983 FPS

And I can see the FPS go to the full capacity of the GPU.

johncadengo commented 3 years ago

Also, I'm not sure how this happened, but the size is not being set when I run the script again. I thought it might be due to the changed xorg.conf settings, so I commented out the lines I added for the virtual display, but I still get this:

x11docker note: Will try to set native resolution 1920x1080. 
  If that looks ugly, use --scale=1 to enforce a fake scaled resolution.

x11docker note: Resolution 1920x1080 not found in xrandr.

x11docker note: Panning 1920x1080. If virtual screen is greater than  
  maximal screen size, you can move virtual screen with mouse at screen edges. 
  You can force the virtual screen to match your monitor with option --scale=1

x11docker note: Panning failed, trying to scale instead.

x11docker note: Setting desired resolution 1920x1080 failed. 
  Fallback: Will use detected (normalx(normal instead.
mviereck commented 3 years ago

None of those options will report the GPU. However, if I end the script, then I get glxinfo with the vanilla command, no special sudo user or privileges needed. So, I guess my issue here is I hoped I could have multiple containers running with access to the GPU and OpenGL drivers, but right now I am limited to having at most 1.

Oh, now I see the reason: Only one Xorg at a time can use the GPU. Additional Xorg sessions can only use software rendering or have to wait for the GPU to be released.

If I run multiple Xorg with glxgears on a headful system, i.e. with a monitor, and switch between the ttys, only the currently visible Xorg gets GPU access. On the other Xorgs glxgears is frozen until I switch back to its tty.

So with this core setup only one container can use the GPU.

Still, this is pretty great progress. Any ideas on how I can get multiple containers with GPU access going?

One possibility: Run only one Xorg with --xorg --gpu --xonly --showenv and without a container. On this Xorg session you can run multiple containers with --hostdisplay. Drawback: The containers can see and access each other over the X11 protocol, i.e. several X security leaks can be abused.

With MESA drivers we would have more secure possible setups. The closed source NVIDIA driver restricts x11docker to --hostdisplay and --xorg. With nouveau driver we could set up isolated accelerated X servers.

Another possibility I am not familiar with: docker provides an option --gpus that works for NVIDIA only and allows some special setup to split GPU resources. That might allow multiple accelerated Xorg instances. Though, I am not sure if that works or how to set it up correctly.


Also, I'm not sure how this happened, but the size is not being set when I'm running the script again. I thought it might be due to the changed xorg.conf settings, so I went and commented out the lines I added for the virtual display, and I still get this:

Not sure what happened here. Maybe you ran multiple Xorg and only one can use the one virtual display? Maybe you need to specify more than one virtual display for more than one Xorg (if NVIDIA allows that at all).

Not sure about the correct syntax. Maybe:

SubSection     "Display"
        Virtual-1     1920 1080
        Virtual-2     1920 1080
EndSubSection

or:

SubSection     "Display"
        Virtual-1     1920 1080
EndSubSection
SubSection     "Display"
        Virtual-2     1920 1080
EndSubSection

Just found this for two virtual displays on an intel GPU: https://askubuntu.com/a/1062889


This output is odd:

  Fallback: Will use detected (normalx(normal instead.

Can you show me the output of xrandr in container?

mviereck commented 3 years ago

I ran some tests with a remote headless server with an intel GPU. GPU acceleration works well with the fallback framebuffer setup. It has no vsync to 60 fps, i.e. the GPU is running hot with glxgears.

Additional experiments with Intel virtual heads and also with evdi-dkms did not yield any advantage.

I wonder if NVIDIA really needs the Virtual entry in xorg.conf or if it can be omitted. If it would work with the framebuffer setup, less user setup would be needed.


Could you please update (--update-master) and try an unusual setup?

read Xenv < <(x11docker --display=30 --desktop --size 1920x1080 --gpu --xoverip --iglx --xorg --showenv --sudouser x11docker/lxde-wine glxgears)
env $Xenv xpra start-desktop :30 --use-display --start-via-proxy=no --daemon=no --bandwidth-limit=0 --system-tray=no --speaker=disabled --microphone=disabled

I am interested whether you get a visible (after xpra attach) and accelerated glxgears. iGLX (option --iglx) is an indirect rendering feature that has been broken in Xorg for a long time. It allows GPU usage over TCP (option --xoverip). Previously this setup crashed Xorg. Meanwhile it does not crash here, and the terminal output shows working acceleration, but the window is black. I've read that this would work entirely with the closed NVIDIA driver. The free MESA drivers still need a fix in their GLX implementation.

johncadengo commented 3 years ago

I tried the iglx option and I can confirm that it works. In fact, it rendered more smoothly on the xpra client, which was unexpected. I thought it might've been a fluke, so I undid the option and tried it without iglx to confirm, and indeed it rendered less smoothly. What does it do? Is there any reason why the client side rendering would be different with that option?

Also, I experimented with another container image and I was able to get multiple containers spun up, each with their own xorg servers running on the GPU. I don't know if any of what's being done in there can apply here, but thought I'd link it to you: https://github.com/ehfd/docker-nvidia-glx-desktop

I will try omitting the Virtual option again and see if I can get it to work. It would be great to be able to run a remote headless server with my setup (like many people, I am unfortunately locked into the closed-source NVIDIA drivers, and as much as I hate it and as big a headache as it gives me, I don't have the option of switching with my current setup). I'm wondering if I can achieve the same setup you have with the Intel GPU but with my GPU.

johncadengo commented 3 years ago

Just FYI, to show the multiple Xorgs running concurrently with the container referenced above:

nvidia-smi
Mon Apr  5 14:52:36 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.67       Driver Version: 460.67       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P4000        Off  | 00000000:42:00.0 Off |                  N/A |
| 51%   52C    P0    30W / 105W |    362MiB /  8119MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1545030      G   /usr/lib/xorg/Xorg                127MiB |
|    0   N/A  N/A   1545222      G   /usr/lib/xorg/Xorg                127MiB |
|    0   N/A  N/A   2645579      G   /usr/lib/xorg/Xorg                103MiB |
+-----------------------------------------------------------------------------+
johncadengo commented 3 years ago

Also, the Fallback: Will use detected (normalx(normal instead. bug has gone away. I'm assuming it had to do with a misconfiguration on my part, because this disappeared after reinstalling xorg.

mviereck commented 3 years ago

I tried the iglx option and I can confirm that it works.

Great!

In fact, it rendered more smoothly on the xpra client, which was unexpected. I thought it might've been a fluke, so I undid the option and tried it without iglx to confirm, and indeed it rendered less smoothly.

That is surprising. I'd expect it to be rather less performant.

What does it do? Is there any reason why the client side rendering would be different with that option?

By default, applications use direct rendering, i.e. they communicate directly with the GPU. With x11docker option --iglx (in Xorg: +iglx and environment variable LIBGL_ALWAYS_INDIRECT=1) indirect rendering is enabled. The application does not talk to the GPU directly but sends its GLX rendering requests to the X server, which forwards them to the GPU. This can be of interest in setups where X is not accessed directly over a unix socket but over TCP (x11docker option --xoverip).

For x11docker this is of interest in some special use cases. For example, with --runtime=kata-runtime, where the container runs in QEMU with a virtual kernel, x11docker sets up a TCP connection for X access. Direct rendering does not work in this case.

In a nutshell:

Also, I experimented with another container image and I was able to get multiple containers spun up, each with their own xorg servers running on the GPU. I don't know if any of what's being done in there can apply here, but thought I'd link it to you: https://github.com/ehfd/docker-nvidia-glx-desktop

That sounds interesting. I had a look and did not understand the full setup so far. From what I see:

Just FYI, to show the multiple Xorgs running concurrently with the container referenced above:

Can you confirm that acceleration works on all of them simultaneously, not only on one of them at a time? I could imagine that this works if each Xorg uses a different part of the GPU.

johncadengo commented 3 years ago

Thanks for the info about iglx. That's quite interesting. I'll do some more testing to confirm that behavior.

I'll also confirm that acceleration works on all of the Xorgs simultaneously.

I wanted to let you know that I saw this in the news today, might be interesting to test within x11docker: Xwayland: Support hardware accelerated rendering with the proprietary NVIDIA driver

mviereck commented 3 years ago

I'll also confirm that acceleration works on all of the Xorgs simultaneously.

I might have to dig deeper into how this is possible. Currently I don't have the headspace for that investigation. If you find out more about this, please tell me!

I wanted to let you know that I saw this in the news today, might be interesting to test within x11docker: Xwayland: Support hardware accelerated rendering with the proprietary NVIDIA driver

That is good news! So this might work within this year once NVIDIA catches up. Thank you for telling me.

johncadengo commented 3 years ago

@mviereck FYI according to the author,

https://github.com/ehfd/docker-nvidia-glx-desktop/issues/8#issuecomment-819957414

when all containers are in privileged mode AND share the same TTY, multiple Xorg instances work.
mviereck commented 3 years ago

when all containers are in privileged mode AND share the same TTY, multiple Xorg instances work.

That is quite interesting, thank you for the investigation! x11docker should not need privileged mode. The essential key is sharing the same TTY. x11docker normally avoids this, because with real monitors multiple Xorg instances on the same TTY do not make sense and would disturb each other.

You can specify the TTY with x11docker option --vt N, with N being a number. Try to run multiple x11docker instances with the same TTY, e.g. --vt 20, and check if GPU acceleration works on all of them.

One possible issue comes to mind: If one Xorg terminates, it might release the TTY and thus crash the other Xorg instances.

Edit: Just checked with an Intel GPU, but it failed. Message of the second Xorg:

[107758.071] (EE) intel(0): Failed to claim DRM device.
ehfd commented 3 years ago

I experimented in dind with two NVIDIA GPUs without privileged mode, but Xorg doesn't manage to start because of an error opening the TTY. I'm trying to rewrite my repo to use Xwayland; NVIDIA merged EGLStreams into Xorg Xwayland a week ago. Do you know if TTY is required for EGLStreams on Wayland compositors? I have two repos for deploying NVIDIA accelerated desktops on Kubernetes, for which x11docker is out of scope.

mviereck commented 3 years ago

Do you know if TTY is required for EGLStreams on Wayland compositors?

I am not sure, but I assume yes.

However, it is possible to run multiple Xwayland instances on one Wayland compositor. Also it is possible to run nested Wayland compositors in Xorg or in Wayland. This would allow segregated accelerated X sessions for different containers.

ehfd commented 3 years ago

Do you know if TTY is required for EGLStreams on Wayland compositors?

I am not sure, but I assume yes.

However, it is possible to run multiple Xwayland instances on one Wayland compositor. Also it is possible to run nested Wayland compositors in Xorg or in Wayland. This would allow segregated accelerated X sessions for different containers.

This would surely be a more plausible solution than the current one.

https://github.com/ehfd/docker-nvidia-egl-desktop

We have this now but this only supports OpenGL acceleration. Vulkan doesn't run at all.

https://github.com/ehfd/docker-nvidia-glx-desktop

This is improved by symlinking /dev/ptmx to /dev/tty7 and using the -novtswitch -sharevts options in Xorg.

mviereck commented 3 years ago

Your setups look interesting. Though, I cannot test them myself because I don't have NVIDIA hardware.

This would surely mean a more plausible solution than now.

With open source MESA drivers this already works well.