Provide access to hardware acceleration in a docker container

porst17 commented 9 years ago

This is related to #5. Without having hardware acceleration in the container, we won't be able to run some of our target applications on the platform. For 3D acceleration, it seems that the container needs access to the dri module of the respective OpenGL implementation (VirtualBox, Intel, NVidia, AMD, ...) and it also appears that this library has to be identical to the host's.

But maybe there are other options I am not aware of. http://www.flockport.com/run-gui-apps-in-lxc-containers/ claims that an approach based on lxc is working, which doesn't require to put these libraries into the containers. But it has to be tested if this is actually true and how this can be utilized for Docker or if I am overlooking something here.

Having to include the hardware specific driver in a specific version into each app container doesn't make the containers very portable and this totally contradicts the Docker philosophy.

Testing for hardware acceleration can be done via

glxinfo | grep "\(\(renderer\|vendor\|version\) string\)\|direct rendering"

With Mesa software rendering this gives

direct rendering: Yes
server glx vendor string: SGI
server glx version string: 1.4
client glx vendor string: Mesa Project and SGI
client glx version string: 1.4
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.4, 128 bits)
OpenGL version string: 2.1 Mesa 10.1.0
OpenGL shading language version string: 1.30

while for the VirtualBox guest additions it gives

direct rendering: Yes
server glx vendor string: Chromium
server glx version string: 1.3 Chromium
client glx vendor string: Chromium
client glx version string: 1.3 Chromium
OpenGL vendor string: Humper
OpenGL renderer string: Chromium
OpenGL version string: 2.1 Chromium 1.9
OpenGL shading language version string: 1.20

which is what we are looking for if we are running inside a VirtualBox VM.

malex984 commented 9 years ago

I confirm, on my host linux (running inside Virtual Box VM) i get the second output, after the following warnings/errors:

libGL error: pci id for fd 4: 80ee:beef, driver (null)
OpenGL Warning: glFlushVertexArrayRangeNV not found in mesa table
OpenGL Warning: glVertexArrayRangeNV not found in mesa table
OpenGL Warning: glCombinerInputNV not found in mesa table
OpenGL Warning: glCombinerOutputNV not found in mesa table
OpenGL Warning: glCombinerParameterfNV not found in mesa table
OpenGL Warning: glCombinerParameterfvNV not found in mesa table
OpenGL Warning: glCombinerParameteriNV not found in mesa table
OpenGL Warning: glCombinerParameterivNV not found in mesa table
OpenGL Warning: glFinalCombinerInputNV not found in mesa table
OpenGL Warning: glGetCombinerInputParameterfvNV not found in mesa table
OpenGL Warning: glGetCombinerInputParameterivNV not found in mesa table
OpenGL Warning: glGetCombinerOutputParameterfvNV not found in mesa table
OpenGL Warning: glGetCombinerOutputParameterivNV not found in mesa table
OpenGL Warning: glGetFinalCombinerInputParameterfvNV not found in mesa table
OpenGL Warning: glGetFinalCombinerInputParameterivNV not found in mesa table
OpenGL Warning: glDeleteFencesNV not found in mesa table
OpenGL Warning: glFinishFenceNV not found in mesa table
OpenGL Warning: glGenFencesNV not found in mesa table
OpenGL Warning: glGetFenceivNV not found in mesa table
OpenGL Warning: glIsFenceNV not found in mesa table
OpenGL Warning: glSetFenceNV not found in mesa table
OpenGL Warning: glTestFenceNV not found in mesa table
libGL error: core dri or dri2 extension not found
libGL error: failed to load driver: vboxvideo

Do you also get them?

Besides: the user has to belong to the video group to be able to access hardware, as otherwise one would get the following:

libGL error: failed to open drm device: Permission denied
libGL error: failed to load driver: vboxvideo

BUT: GpuTest stopped working after adding user to video group: one can see a window with artifacts being created and immediately destroyed (it is the same when GpuTest we being run with root permissions)... strange :-(

porst17 commented 9 years ago

Yes, I also see the warnings. Most of it can be ignored if everything else works fine (see https://www.virtualbox.org/ticket/12746#comment:16).

Maybe GPUTest won't work with the VirtualBox driver, but I am just guessing. I also tried glmark2 but still no luck. Maybe VirtualBox 3D support is even more basic than I expected.

malex984 commented 9 years ago

Maybe GPUTest won't work with the VirtualBox driver, but I am just guessing. I also tried glmark2 but still no luck. Maybe VirtualBox 3D support is even more basic than I expected.

but GpuTests were working untill i added my user to the video group... i will look into GpuTest for verbose or debug output...

porst17 commented 9 years ago

I think glmark2 was also working on my box before I added my user to the video group yesterday. I removed my user from the group again, but glmark2 still doesn't work for some reason. Strange.

malex984 commented 9 years ago

Please run id for your user and relogin if necessary since removing user from video group with gpassed -d user video worked for me only after logout & login.

Both glxinfo and again GpuTest run and again output those errors:

libGL error: failed to open drm device: Permission denied
libGL error: failed to load driver: vboxvideo

Besides simpler GpuTest actually do show some graphics: e.g. triangle and plot3d. Windowed mode is fine only slow. FullScreen has artifacts and is very slow on my old laptop and very limited VM resources.

malex984 commented 9 years ago

Besides glmark2 gave me the following (not in video group):

libGL error: failed to open drm device: Permission denied
libGL error: failed to load driver: vboxvideo
** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
** Failed to set swap interval. Results may be bounded above by refresh rate.
=======================================================
    glmark2 2012.08
=======================================================
    OpenGL Information
    GL_VENDOR:     VMware, Inc.
    GL_RENDERER:   Gallium 0.4 on llvmpipe (LLVM 3.4, 128 bits)
    GL_VERSION:    2.1 Mesa 10.1.3
=======================================================
** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
** Failed to set swap interval. Results may be bounded above by refresh rate.
[build] use-vbo=false: FPS: 29 FrameTime: 34.483 ms
** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
** Failed to set swap interval. Results may be bounded above by refresh rate.
[build] use-vbo=true: FPS: 30 FrameTime: 33.333 ms
=======================================================
                                  glmark2 Score: 29 
=======================================================

malex984 commented 9 years ago

but in the same time glxinfo indicates Chromium:

libGL error: core dri or dri2 extension not found
libGL error: failed to load driver: vboxvideo
direct rendering: Yes
server glx vendor string: Chromium
server glx version string: 1.3 Chromium
client glx vendor string: Chromium
client glx version string: 1.3 Chromium
OpenGL vendor string: Humper
OpenGL renderer string: Chromium
OpenGL version string: 2.1 Chromium 1.9
OpenGL shading language version string: 1.20

Could it be that different renderers are being used for different GPU related functionality... ?

malex984 commented 9 years ago

And i confirm that after going into video glmark2 stopped working with the following message:

libGL error: pci id for fd 4: 80ee:beef, driver (null)
OpenGL Warning: glFlushVertexArrayRangeNV not found in mesa table
...
OpenGL Warning: glTestFenceNV not found in mesa table
libGL error: core dri or dri2 extension not found
libGL error: failed to load driver: vboxvideo
OpenGL Warning: glXChooseVisual: ignoring attribute 0x22
OpenGL Warning: glXChooseFBConfig returning NULL, due to attrib=0x2, next=0x1
Error: glXChooseFBConfig() failed
Error: Error: Could not get a valid XVisualInfo!
Error: Error: Couldn't create X Window!
Error: main: Could not initialize canvas

ps: same effect if running as root...

malex984 commented 9 years ago

Just in case: after detaching video camera from VM i only have the following video-owned devides:

crw-rw----+ 1 root video 226, 0 Apr 17 10:11 /dev/dri/card0
crw-rw----  1 root video  29, 0 Apr 17 10:11 /dev/fb0

malex984 commented 9 years ago

i am going to experiment on a real linux machine with GPU now...

porst17 commented 9 years ago

Sounds like a good idea.

BTW: I tried to log out and in but now I can't log into any graphical session again. Something seems to be broken for now. Will test on monday.

malex984 commented 9 years ago

Setup: Archlinux & X11 & nvidia GPU: all GPU-related apps can run. glxinfo:

direct rendering: Yes
server glx vendor string: NVIDIA Corporation
server glx version string: 1.4
client glx vendor string: NVIDIA Corporation
client glx version string: 1.4
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce 9300 GS/PCIe/SSE2
OpenGL core profile version string: 3.3.0 NVIDIA 340.76
OpenGL core profile shading language version string: 3.30 NVIDIA via Cg compiler
OpenGL version string: 3.3.0 NVIDIA 340.76
OpenGL shading language version string: 3.30 NVIDIA via Cg compiler
OpenGL ES profile version string: OpenGL ES 2.0 NVIDIA 340.76 340.76
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 1.00

Test: main/run.sh appchoo "rxvt-unicode -fn xft:terminus:pixelsize=12 -e bash" and run glxinfo or glmark2 from shell (maybe also set DISPLAY by hand as it will share /tmp with the host).

current Docker appchoo is now an experimental GPU testing image with (libgles1-mesa, libgles2-mesa, libegl1-mesa-drivers, libgl1-mesa-dri, mesa-vdpau-drivers, nux-tools, mesa-utils, glmark2),

Unfortunately glxinfo, glmark2 or any other OpenGL-related app. output something like the following:

libGL error: failed to load driver: swrast
** GLX does not support GLX_EXT_swap_control or GLX_MESA_swap_control!
X Error of failed request:  BadValue (integer parameter out of range for operation)
X Error of failed request:  GLXBadContext
  Major opcode of failed request:  154 (GLX)
  Minor opcode of failed request:  24 (X_GLXCreateNewContext)
  Value in failed request:  0x0
  Serial number of failed request:  ..
  Current serial number in output stream:  ..

UPDATE (20 Apr 2015)

after installing libGL from nvidia, mostly following (save for cuda)

http://stackoverflow.com/questions/25185405/using-gpu-from-a-docker-container
https://registry.hub.docker.com/u/tleyden5iwx/ubuntu-cuda/dockerfile/ i managed to get the same glxinfo and glmark2 output inside client docker container as on the host linux running X11 server (using nvidia renderer). Also glmark2 and GpuTest benchmarks seem the same!

Therefore: X11 client application in a docker container needs matching renderer (LIBGL?) but there seems to be no penalty in using GPU.

Next step: X11 Server in a separate docker container with all the GPU-sprcific stuff (incl. kernel modules...) IMHO this may be quite difficult on my linux host due to linux kernel being 3.14.38-1-lts, whereas AFAIK ubuntu is not yet that far :-( Maybe i should try to do that with vboxvideo (i.e. LIbGL from VirtualBox X11 Guest Additions in client docker container) beforehand...

malex984 commented 9 years ago

It seems that the X11 client application is required to access the same libGL library as the X11 renderer...

For further experiments i am building a docker image with Xorg & VBox Guess Additions & various testing apps.

malex984 commented 9 years ago

It may be quite helpful to set LIBGL_DEBUG=verbose before running glxinfo, glxgears, glmark2 (or any other OpenGL using app.)...

malex984 commented 9 years ago

Please notice the UPDATE to my pre-pre-previous comment about Archlinux & X11 & nvidia GPU.

porst17 commented 9 years ago

Regarding X server in a separate container: Does this mean that we need to put the libraries and drivers on the host, the X server container and each app container?

Did you try the lxc approach or the libcontainer approach (the last one being preferred since Docker might drop lxc soon).

malex984 commented 9 years ago

This does not mean anything yet! Due to X11 running on the host i was able to run the same tests and compare results in order to measure exactly the performance penalty due to running application inside a docker-container.

My docker uses native (libcontainer) execution manager. I followed http://stackoverflow.com/a/26568684, according to whom:

the correct way to do this is avoid the lxc execution context as Docker has dropped LXC as the default execution context as of docker 0.9.

Now it seems to me that Nvidia driver package actually provide 4 main parts:

linux kernel modules which actually create necessary GPU-related devices /dev/* (they will belong to video group).
LIBGL library - specialized OpenGL implementation, which makes best use of those devices.
X11 renderer module for X11 to make best use of LIBGL (from 2.).
tools/utils for controlling GPU card and set its preferences

Please correct me if i am wrong!

It seems that VirtualBox X11 Guest Additions also follow this schema - this i am going to verify next.

If this schema turns out to be common for GPU vendors - we will be forced to add specialized LIBGL to application images even if we require X11 to be run on our host with all installed HW-specific drivers...

TODO for me for later: find out whether there exists some standard specification for specialized LIBGL implementations (Mesa/Nvidia/VB X11 GA), such as filelist? permissions?

malex984 commented 9 years ago

Also http://en.wikipedia.org/wiki/Mesa_(computer_graphics) seems to support my understanding of the graphic stack and involved parts.

malex984 commented 9 years ago

Ok, i managed to get almost our setting (ArchLInux & Nvidia GPU), experimental branch poc1_nvidia_test:

host Linux with necessary kernel modules for GPU devices (from Nvidia)
:x11nv: docker service container running X11 with special LIBGL (from Nvidia)
:appchoo docker application container with that same special LIBGL running GpuTest / glmark2 using that X11 service DOES use the GPU for no visible performance penalty!

If necessary i can measure actual benchmarks here (glmark2 & GpuTest ).

According to Hans installing linux kernel modules from inside a docker container is going against the main purpose of docker and is probably not guaranteed to work for everybody in all possible cases...

@porst17 Can we assume that hardware drivers are already installed and loaded on the host linux?

porst17 commented 9 years ago

Great! Marvelous! ;-)

I think, we can assume that the drivers are already installed. But we have to find an efficient (i.e. almost maintenance free) approach to put the required libraries into the containers. I think nobody would want a container for each app and for each possible combination of hardware drivers on the host.

Any ideas?

malex984 commented 9 years ago

I think, we can assume that the drivers are already installed.

in that case we can only do anything in runtime, say during fine-tuning before the showtime, since only at this point our software can determine the following:

linux kernel version
HW drivers (e.g. kernel modules from VirtualBox/Nvidia/ATI/mesa) and their versions in order to determine matching system libraries (e.g. corresponding LIBGL) which are to be added to applications running on THIS host.
preferred settings for applications (e.g. printer/network setup, shared mounted disks, usage of the host X11 or containerized X11 with added LIBGL)

Clearly such per-host fine-tuning (pre-configuration) can only deal with supported HW/options, that is, for which we have prepared generic solutions (e.g shell script to run inside some container in order to add/change it - see below). Such alterations are too host specific! I don't think we can provide images for all possible host-configurations.

Therefore i think about the following approach:

create a single host specific container which can detect any supported HW/features/options during per-host pre-configuration/fine-tuning and thus can provide alterations for individual application images in order to auto-customize them for specific host (e.g. by generating or choosing shell scripts that are to perform alterations)
customization finishes with an equivalent of docker build: . docker create original application image . perform docker exec auto-customization on it according to fine-tuning, e.g. by running previously generated scripts or by directly executing some commands in it. . docker commit the resulting customized image for the actual use on this host.

Clearly there are many ways to implement such an approach. I find myself thinking about it as git cherry-picking patches (which e.g. update corresponding Dockerfile's) determined during per-host pre-configuration and running something like my setup.sh...

Just a rough idea. Sorry if it was too vague!

malex984 commented 9 years ago

According to my testing https://github.com/malex984/dockapp/wiki/BM it seems possible to install host-specific LIBGL libraries (including X11-client drivers) into app. images...

TODO for me: to do that for a single image/container and try share / reuse those custom things in other containers via volume sharing and some magic :)

malex984 commented 9 years ago

This seems to be done now. Or should we also experiment with GPU's from other vendors? Intel & ATI?

porst17 commented 9 years ago

I think we should at least test this with Intel. I have no AMD card available. If your CPU is Intel, then you also have an integrated Intel GPU.

malex984 commented 6 years ago

It is DONE. NVidia GPUs are currently supported using nvidia-docker-plugin (being tested all the time on all towers). Intel GPUs require OGL.tgz (with custom built of recent MESA), which is to be installed separately via puppet at Supernova (tested recently on micro PC running Info-screen application).

porst17 commented 6 years ago

I assume the nvidia-docker-plugin works with most container images (amd64 architecture), but what about the OGL.tgz? It is probably specific to a certain ubuntu version and preinstalled libraries, no? Is it documented how to rebuild OGL.tgz for other base images and how to tell wether it is actually working or not?

hilbert / hilbert-docker-images

Provide access to hardware acceleration in a docker container #9