mviereck / x11docker

Run GUI applications and desktops in docker and podman containers. Focus on security.
MIT License

Check if nvidia driver install can be automated --gpu #41

Closed 01e9 closed 6 years ago

01e9 commented 6 years ago

The official LearnOpenGL examples project has just accepted a PR that uses x11docker for building the examples: https://github.com/JoeyDeVries/LearnOpenGL/pull/107

As you can see, after starting the IDE in docker you have to docker exec into the temporary container to install video drivers. It's not possible to hardcode the video driver in the Dockerfile because the image will be used on different systems and distros.

It would be nice if, after container start, the video driver were installed automagically on any distro. We would have to know whether it's nvidia/amd and the major driver version (e.g. 390).

This task looks too complicated for me, which is why I am consulting you.

If there is nothing you can do just close this issue.

mviereck commented 6 years ago

This is interesting, and I'd like to find a solution. I don't have nvidia hardware and cannot test myself. I need your help for advanced solutions.

I'm glad that it worked with option --gpu and the matching nvidia driver in the image. I designed x11docker so that this should work, but I did not know for sure before you reported it here.

An easy solution (at least easy for me :D) would be an option like --runasroot "command" that allows you to install the nvidia driver without a separate docker exec.


A more advanced solution would be to autodetect the host nvidia version and install it automatically in the container. This should show the nvidia driver version on the host; can you confirm that?

cat /proc/driver/nvidia/version | head -n1 | awk '{ print $8 }'

The difficult point is to support multiple possible systems in the container, each with its own package manager and different package names. I would like to support images of debian, ubuntu, arch, centos, fedora, alpine, void, maybe more. Also the system in the image may not have a matching nvidia driver version at all in its repository. And the driver installation would take some time on every start of a container.
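For illustration, a per-distro dispatch would have to look something like this sketch (the package names are my assumptions and vary between releases; /etc/os-release is assumed to exist in the image):

#! /bin/bash
# hypothetical per-distro driver installation inside the container;
# package names are guesses and differ between releases
. /etc/os-release
case "$ID" in
  debian|ubuntu) apt-get update && apt-get install -y nvidia-driver ;;
  arch)          pacman -Sy --noconfirm nvidia-utils ;;
  fedora|centos) dnf install -y akmod-nvidia ;;
  *)             echo "unsupported distro: $ID" >&2 ; exit 1 ;;
esac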


There is a project https://github.com/NVIDIA/nvidia-docker that addresses this and somehow uses the libraries from the host. It looks complicated. I stopped considering it when I saw that it changes /etc/docker/daemon.json on the host. It provides its own base images to build from. I am not sure whether they work with GPUs other than nvidia.


A cheeky approach could be to share all nvidia library files from the host with the container. Maybe LD_PRELOAD must be set. If we know for sure which files are installed by the nvidia driver package, this may just work and could be easy. Though, the files may have different locations on different systems; in that case multiple checks for different host and image systems would be needed.

01e9 commented 6 years ago

$ cat /proc/driver/nvidia/version | head -n1 | awk '{ print $8 }'
390.48

Also the system in the image may not have a matching nvidia driver version at all in its repository.

I have this problem on my home laptop: I had some glitches with the official driver from the OS repository and had to install a custom built one, and that one is not available inside the docker container.

mviereck commented 6 years ago

An attempt to simply share nvidia files from the host (quite a lot of them, though):

for Line in $(find /etc/nvidia /etc/alternatives /etc/modprobe.d /usr/lib /usr/share/glvnd /usr/bin | grep nvidia) ; do
  Nvidiashare="-v $Line:$Line:ro $Nvidiashare"
done

x11docker --gpu --stdout -- "$Nvidiashare" x11docker/lxde glxgears

I hope it covers all needed files; maybe some of them are not needed. If it works, glxgears should show a framerate matching your monitor.

01e9 commented 6 years ago

glxgears works.

I just found out that the IDE and 3D examples work when I remove --hostdisplay

https://github.com/JoeyDeVries/LearnOpenGL/blob/cdceb36d254e5e2d4f9b4dc28e99ca16e0be7a6c/docker/ide.sh#L22

The only downsides are awkward window size and inadequate mouse capture.

https://streamable.com/xwl5q

I will try without --hostdisplay on my home laptop; if it works, great. We will only have to fix the IDE size (make it full screen) and mouse capture.

01e9 commented 6 years ago

It works without the nvidia share

x11docker --gpu --stdout x11docker/lxde glxgears

mviereck commented 6 years ago

It works without the nvidia share

The difference should be the shown frame rate. With --gpu it shows a frame rate of about 60 FPS (same as the monitor); without --gpu it uses software rendering and shows a much higher frame rate. What do you get with/without --gpu and with/without the nvidia share?

I just found out that the IDE and 3D examples work when I remove --hostdisplay

Can you please run with --hostdisplay --verbose and post the output on pastebin?

The only downsides are awkward window size and inadequate mouse capture.

x11docker found weston and Xwayland on your host and uses them (option --weston-xwayland). It runs in desktop mode instead of seamless mode. It could not find a host window manager to use inside the weston window. What is your desktop environment?

With a window manager (you can specify one with option --wm command) and options --weston-xwayland --fullscreen you will get a fullscreen IDE. About mouse capture there is not much I can do; weston within X is a bit special in mouse handling.
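For example, something like this sketch should give a fullscreen setup (yourIDEimage is a placeholder; openbox stands for any window manager installed on the host):

x11docker --gpu --weston-xwayland --fullscreen --wm openbox yourIDEimage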

01e9 commented 6 years ago

My OS is Solus with the Budgie window manager (--wm budgie-wm).


--fullscreen makes the weston window full screen but not the IDE inside that window.

--wm budgie-wm only makes the mouse look the same as on the host, but the IDE still doesn't have title bar buttons: minimize, maximize, close.


Without --gpu and without --hostdisplay the FPS is low: ~60 for glxgears and lower for the examples compiled in the IDE.


Without --gpu and with the nvidia share, glxgears FPS is also low, ~60:

300 frames in 5.0 seconds = 59.993 FPS

With --gpu, and with or without the nvidia share, glxgears FPS is good:

6865 frames in 5.0 seconds = 1372.935 FPS

I think the nvidia share has no effect.


The output of --hostdisplay --verbose

mviereck commented 6 years ago

Thank you!

Normally glxgears shows a frame rate of about 60 FPS only if GPU access is successful. A frame rate like 1372.935 FPS indicates that GPU access fails and software rendering is used (CPU instead of GPU). Though, I am confused that you get 60 FPS without --gpu. Maybe weston/Xwayland does some magic here.

The output of glxinfo | grep -E 'OpenGL|render|vendor' will be more useful. Please show me

x11docker --silent --stdout --stderr --gpu  \
    --  x11docker/lxde glxinfo | grep -E 'OpenGL|render|vendor'

and the same with $Nvidiashare, supplemented with some libgl* files:

for Line in $(find /etc/nvidia /etc/alternatives /etc/modprobe.d \
        /usr/lib /usr/share/glvnd /usr/bin | grep nvidia) \
        $(find /usr/lib /etc/alternatives | grep libgl | grep -v nvidia) ; do
  Nvidiashare="-v $Line:$Line:ro $Nvidiashare"
done
x11docker --silent --stdout --stderr --gpu  \
    --  "$Nvidiashare" \
    x11docker/lxde glxinfo | grep -E 'OpenGL|render|vendor'

If GPU access fails, the output will contain llvmpipe:

OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.9, 128 bits)

On success it will show something with nvidia in it. In my case with AMD hardware I get:

OpenGL renderer string: Gallium 0.4 on AMD MULLINS (DRM 2.49.0 / 4.9.0-6-amd64, LLVM 3.9.1)

My OS is Solus with the Budgie window manager (--wm budgie-wm).

oh, ok. Budgie is a fork of gnome and inherits some of its fatal bugs. dmesg shows some segfaults if I run budgie-wm outside of the budgie environment. You can install another window manager like xfwm4 or openbox and x11docker will find and use it.

01e9 commented 6 years ago

This doesn't work on my home laptop

x11docker --gpu --stdout x11docker/lxde glxgears

https://pastebin.com/raw/rKBxfU5E

After installing weston it fails too

https://pastebin.com/raw/TSwQceCC

I don't have the xpra and Xwayland packages on Solus. But it's the same on my other PC, with the results posted in the previous comment. The only difference is that on my home laptop I have installed a custom built driver

$ cat /proc/driver/nvidia/version | head -n1 | awk '{ print $8 }'
387.34

It also fails with --hostdisplay https://pastebin.com/raw/fR9SHqAT

I don't want to switch from custom built driver to the official one from repositories because on my laptop it doesn't work well.

Though, the IDE works fine when started by this script.


Everything in this comment is about my home laptop; tomorrow I will be at the other PC again (the one from the previous comment).

$ x11docker --silent --stdout --stderr --gpu  \
>     --  x11docker/lxde glxinfo | grep -E 'OpenGL|render|vendor'
direct rendering: Yes
server glx vendor string: SGI
client glx vendor string: Mesa Project and SGI
    GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer, 
    GLX_MESA_multithread_makecurrent, GLX_MESA_query_renderer, 
Extended renderer info (GLX_MESA_query_renderer):
OpenGL vendor string: VMware, Inc.
OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.9, 256 bits)
OpenGL core profile version string: 3.3 (Core Profile) Mesa 13.0.6
OpenGL core profile shading language version string: 3.30
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
    GL_ARB_conditional_render_inverted, GL_ARB_conservative_depth, 
    GL_NV_conditional_render, GL_NV_depth_clamp, GL_NV_packed_depth_stencil, 
OpenGL version string: 3.0 Mesa 13.0.6
OpenGL shading language version string: 1.30
OpenGL context flags: (none)
OpenGL extensions:
    GL_ARB_conditional_render_inverted, GL_ARB_conservative_depth, 
    GL_NV_conditional_render, GL_NV_depth_clamp, GL_NV_fog_distance, 
OpenGL ES profile version string: OpenGL ES 3.0 Mesa 13.0.6
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.00
OpenGL ES profile extensions:
    GL_OES_element_index_uint, GL_OES_fbo_render_mipmap,

I get empty output from this

for Line in $(find /etc/nvidia /etc/alternatives /etc/modprobe.d \
        /usr/lib /usr/share/glvnd /usr/bin | grep nvidia) \
        $(find /usr/lib /etc/alternatives | grep libgl | grep -v nvidia) ; do
  Nvidiashare="-v $Line:$Line:ro $Nvidiashare"
done
x11docker --silent --stdout --stderr --gpu  \
    --  "$Nvidiashare" \
    x11docker/lxde glxinfo | grep -E 'OpenGL|render|vendor'

Only a window that opens and closes in a blink.

Even with | grep -E 'OpenGL|render|vendor' removed I get empty output.

Here is some debug output that may help:

$ echo "$Nvidiashare"
-v /usr/bin/nvidia-modprobe:/usr/bin/nvidia-modprobe:ro -v /usr/bin/nvidia-modprobe:/usr/bin/nvidia-modprobe:ro -v /usr/bin/nvidia-persistenced:/usr/bin/nvidia-persistenced:ro -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro -v /usr/bin/nvidia-settings:/usr/bin/nvidia-settings:ro -v /usr/bin/nvidia-cuda-mps-server:/usr/bin/nvidia-cuda-mps-server:ro -v /usr/bin/nvidia-bug-report.sh:/usr/bin/nvidia-bug-report.sh:ro -v /usr/bin/nvidia-xconfig:/usr/bin/nvidia-xconfig:ro -v /usr/bin/nvidia-debugdump:/usr/bin/nvidia-debugdump:ro -v /usr/bin/nvidia-cuda-mps-control:/usr/bin/nvidia-cuda-mps-control:ro -v /usr/share/glvnd/egl_vendor.d/10_nvidia_wayland.json:/usr/share/glvnd/egl_vendor.d/10_nvidia_wayland.json:ro -v /usr/share/glvnd/egl_vendor.d/10_nvidia.json:/usr/share/glvnd/egl_vendor.d/10_nvidia.json:ro

You can install another window manager like xfwm4 or openbox and x11docker will find and use it.

I can install it alongside budgie and it will not affect anything? I like budgie and don't want to switch.


I will try to compile the driver as advised here and check whether glxgears runs after that.

01e9 commented 6 years ago

After following the build process from that comment and installing the built driver I get the same failures described in the previous comment.

mviereck commented 6 years ago

$Nvidiashare contains -v /usr/bin/nvidia-modprobe:/usr/bin/nvidia-modprobe:ro twice; that may cause the error. Removing option --silent should show an x11docker error message in that case.

I have changed the code a bit; I hope this will run:

#! /bin/bash
for Line in $(find /etc/nvidia /etc/alternatives /etc/modprobe.d \
        /usr/lib /usr/share/glvnd /usr/bin | grep nvidia) \
        $(find /usr/lib /etc/alternatives | grep libgl | grep -v nvidia) ; do
  Nvidiashare="--volume=$Line:$Line:ro
$Nvidiashare"
done
Nvidiashare=$(echo "$Nvidiashare" | sort | uniq | grep -v '^$')
x11docker --silent --stdout --stderr --gpu  \
    --  $Nvidiashare \
    x11docker/lxde glxinfo | grep -E 'OpenGL|render|vendor'

I can install it alongside budgie and it will not affect anything? I like budgie and don't want to switch.

Yes, that is no problem.

01e9 commented 6 years ago

It fails

Here is the output of

#! /bin/bash
for Line in $(find /etc/nvidia /etc/alternatives /etc/modprobe.d \
        /usr/lib /usr/share/glvnd /usr/bin | grep nvidia) \
        $(find /usr/lib /etc/alternatives | grep libgl | grep -v nvidia) ; do
  Nvidiashare="--volume=$Line:$Line:ro
$Nvidiashare"
done
Nvidiashare=$(echo "$Nvidiashare" | sort | uniq | grep -v '^$')
x11docker --verbose --stdout --stderr --gpu  \
    --  $Nvidiashare \
    x11docker/lxde glxinfo

https://pastebin.com/raw/StXc4D9x

01e9 commented 6 years ago

I installed openbox and here is the output of

x11docker --gpu --stdout --verbose x11docker/lxde glxgears

https://pastebin.com/raw/HPBNvsCT


I did xhost + and now, after running

x11docker --gpu --stdout x11docker/lxde glxgears

I get a black window and this output in the console:

12979 frames in 5.0 seconds = 2595.649 FPS
13474 frames in 5.0 seconds = 2694.744 FPS
13229 frames in 5.0 seconds = 2645.705 FPS
13248 frames in 5.0 seconds = 2649.430 FPS
13047 frames in 5.0 seconds = 2609.339 FPS
13343 frames in 5.0 seconds = 2668.460 FPS
13162 frames in 5.0 seconds = 2632.350 FPS
13376 frames in 5.0 seconds = 2675.055 FPS
12917 frames in 5.0 seconds = 2583.374 FPS

The black window does not respond to the close button; I stopped it with Ctrl+C in the terminal.

Here is the output of

x11docker --gpu --stdout --verbose x11docker/lxde glxgears

https://pastebin.com/raw/KhaYn7zJ

01e9 commented 6 years ago

I think the driver is glitchy. Better to continue the investigation on my other computer, where the official driver from the repository is installed.

mviereck commented 6 years ago

glxgears won't help us much.

glxinfo runs, but still no GPU access. The logfile shows usage of llvmpipe:

OpenGL renderer string: Gallium 0.4 on llvmpipe (LLVM 3.9, 256 bits)

Maybe some nvidia files from the host are missing, maybe even files without "nvidia" in their names. You could run find | grep nvidia and look for possibly missing files. I already omitted some probably useless files in /var/lib/dpkg, /usr/src and /usr/share. Could you have a look whether something important is missing from $Nvidiashare?
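For example, a broad sweep over the host filesystem (slow, but it catches files in unexpected locations):

# search the whole root filesystem case-insensitively, staying on one filesystem
find / -xdev -iname '*nvidia*' 2>/dev/null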

mviereck commented 6 years ago

I will have a look at https://github.com/NVIDIA/nvidia-docker again; maybe I can find a useful hint.

mviereck commented 6 years ago

I've read some parts of the nvidia-docker wiki and conclude that the approach of sharing host files will fail, or work only in special cases, but not as a general solution.

It's rather worth investigating whether x11docker could support nvidia-docker. Images would have to be built upon nvidia docker images.

I will check whether those images could support other GPUs and open source drivers as well. On a first try it seems possible: I installed mesa-utils in nvidia/cuda, it seems to do no harm, and glxgears runs well using my AMD GPU.

Do you want to give it a try, installing nvidia-docker on your system?

mviereck commented 6 years ago

I admit I hold a grudge against NVIDIA corporation.

For many years NVIDIA corp. has refused to publish an API for their GPU hardware that could be used by open source projects like nouveau (see also wikipedia:nouveau).

The nouveau developers do great work in reverse engineering, but are thwarted by NVIDIA corp. in several ways.

Personally, I recommend not buying NVIDIA hardware at all. For comparison, AMD actively supports the development of open source drivers.

If we manage to get x11docker running with nvidia-docker, it means we use closed source proprietary drivers and are pinned to their docker images as a base. The NVIDIA docker images are not automated builds but are uploaded after a build elsewhere.

Personally, I only use automated builds where I can immediately check the Dockerfile. I don't trust closed builds.

All open source drivers harmonize well. Installing mesa-utils in the docker image and sharing /dev/dri is all we need to do. That is great!

It costs a lot of extra effort to support closed source NVIDIA drivers. At the same time we are pinned down to using their closed builds of docker images. That is [censored].

My personal recommendation: if someone already has NVIDIA hardware, try whether it works well with the free nouveau driver. If not, throw away the NVIDIA GPU and buy something useful with open documentation.

01e9 commented 6 years ago

Do you want to give it a try, installing nvidia-docker on your system?

I can't find in their docs how to install it on Solus. Hunting down binaries somewhere in their repositories, copying them onto my system and messing manually with the docker configuration: I will not do that.

Installing mesa-utils in the docker image and sharing /dev/dri is all we need to do. That is great!

This is advanced for me. So you will post a solution soon?



mviereck commented 6 years ago

If you install the free nouveau driver on your system instead of the closed source nvidia-driver-XXX, everything should work out of the box.

Be careful with replacing it, as the nvidia driver digs deep into the system. Look for documentation or ask in a forum whether more is needed than just installing nouveau from your repository.

01e9 commented 6 years ago

Thank you!

mviereck commented 6 years ago

Once again, what's the deal with "sharing /dev/dri"?

/dev/dri on the host contains the device files representing the GPU hardware. With the docker run option --device /dev/dri the container gets access to the GPU.

x11docker option --gpu does exactly this. It also shares /dev/vga_arbiter, as that may be needed for some multi-monitor setups.

Additionally, x11docker --gpu shares the files /dev/nvidia* if they exist, to support closed source nvidia drivers in the image (which must exactly match the driver version on the host).
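In plain docker terms this corresponds roughly to the following sketch:

# rough plain-docker equivalent of x11docker --gpu (yourimage is a placeholder);
# the /dev/nvidia* lines apply only if the closed source driver is loaded on the host
docker run --rm --device /dev/dri --device /dev/vga_arbiter \
    --device /dev/nvidia0 --device /dev/nvidiactl \
    yourimage glxinfo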

mviereck commented 6 years ago

For your OpenGL image from your initial post, in case you stay with the proprietary driver: you can create a new docker image with the closed nvidia driver, based on the desired OpenGL image:

FROM yourOpenGLimage
RUN apt-get update
RUN env DEBIAN_FRONTEND=noninteractive \
    apt-get install -y --no-install-recommends nvidia-driver-XXX

Using this new image instead of the original avoids the need to execute docker exec ... apt-get install every time. Though, this is not nice for deployable solutions.


For a deployable solution I can create an option --runasroot command that can execute an apt-get install as root in the container.

With cat /proc/driver/nvidia/version | head -n1 | awk '{ print $8 }' one can check the nvidia driver version on the host and automatically add --runasroot to the x11docker command.

All this can be done automatically with a script on the host and would allow creating a deployable image that also supports closed source nvidia drivers.
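A sketch of that workflow, assuming the proposed --runasroot option existed and a debian/ubuntu based image where the package is named nvidia-driver-XXX:

#! /bin/bash
# hypothetical host-side wrapper; --runasroot is the proposed, not yet
# implemented option, and the package name is an assumption
Nvidiaversion="$(cat /proc/driver/nvidia/version | head -n1 | awk '{ print $8 }')"
x11docker --gpu \
    --runasroot "apt-get update && apt-get install -y nvidia-driver-${Nvidiaversion%%.*}" \
    yourOpenGLimage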

01e9 commented 6 years ago

I will take this into consideration while playing with the LearnOpenGL examples until I finish the tutorials. If I come up with a stable solution I will send them a pull request.


One solution that comes to my mind now is to make the ide.sh script ask the user which driver version should be installed inside the docker image on the first script run, when the image doesn't exist yet, and pass that version (or package name) as an ARG to docker build.

In order for the script to list all available driver versions it can run

docker run --rm SAME_IMAGE_AS_IN_DOCKERFILE \
    bash -c 'apt update && (apt list | grep VENDOR)'

VENDOR is nvidia or amd, detected on host by the script.
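A sketch of the build step for that idea (DRIVER_PACKAGE and the image tag are placeholders I made up):

FROM SAME_IMAGE_AS_IN_DOCKERFILE
ARG DRIVER_PACKAGE
RUN apt-get update && apt-get install -y --no-install-recommends $DRIVER_PACKAGE

and on the first run the script builds with the chosen package:

docker build --build-arg DRIVER_PACKAGE=nvidia-driver-390 -t learnopengl-ide .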

mviereck commented 6 years ago

VENDOR is nvidia or amd, detected on host by the script.

You don't need to do anything special if an AMD or Intel GPU is in the host, or if nouveau is installed for NVIDIA. All needed libraries are declared as dependencies of mesa-utils (at least for debian images; slightly different for some other distros).

One solution that comes to my mind now is to make the ide.sh script ask the user which driver version should be installed inside the docker image on the first script run, when the image doesn't exist yet, and pass that version (or package name) as an ARG to docker build.

That is a good idea and avoids the overhead of installing the driver on each start of x11docker!

I found a download site for lots of nvidia drivers: http://www.nvidia.de/Download/index.aspx. Unfortunately it does not provide a browseable repository that could be scanned by a script. But it may be useful if the user is asked to choose and download a matching driver. Also, there is a chance to guess the right URL: I found your driver version 390.48 at us.download.nvidia.com/XFree86/Linux-x86_64/390.48/NVIDIA-Linux-x86_64-390.48.run without using the web interface.

I tried to install an arbitrary driver in a docker container. A successful example:

wget us.download.nvidia.com/XFree86/Linux-x86_64/100.14.19/NVIDIA-Linux-x86_64-100.14.19-pkg2.run
chmod +x NVIDIA-Linux-x86_64-100.14.19-pkg2.run
./NVIDIA-Linux-x86_64-100.14.19-pkg2.run  --accept-license --no-runlevel-check \
    --no-questions --ui=none  --no-kernel-module 

I hope that all nvidia installers understand the same options. I am unsure about --no-kernel-module. Without this option my installation failed because the gcc version in the image was not the one the kernel was compiled with. I don't know whether container applications need the kernel module itself.

mviereck commented 6 years ago

I reopen this issue as it is still in progress.

Could you please do a test? Start up your OpenGL image as you did in your initial post (without the nvidia driver in it), run docker exec and execute:

apt-get install -y kmod    # installer needs modprobe
wget http://us.download.nvidia.com/XFree86/Linux-x86_64/390.48/NVIDIA-Linux-x86_64-390.48.run
chmod +x NVIDIA-Linux-x86_64-390.48.run
./NVIDIA-Linux-x86_64-390.48.run --accept-license --no-runlevel-check  --no-questions --ui=none --no-kernel-module

Does OpenGL work in your IDE with this setup?

01e9 commented 6 years ago

It works.


I will test on my home laptop later (with the 387 driver).

Hm, it seems they are deleting old driver versions from their download server. I tried changing the version in the link but I get an error page.

mviereck commented 6 years ago

A script to detect a minor version for major version 387:

#! /bin/bash
Major=387
for ((Minor=250 ; Minor>=0 ; Minor-- )) ; do
  Url=http://us.download.nvidia.com/XFree86/Linux-x86_64/$Major.$Minor/NVIDIA-Linux-x86_64-$Major.$Minor.run
  curl -Is $Url | grep "200 OK" && echo "Found version $Major.$Minor at $Url" && break || echo $Major.$Minor not found
done

It is quite slow, but it succeeds with:

Found version 387.34 at http://us.download.nvidia.com/XFree86/Linux-x86_64/387.34/NVIDIA-Linux-x86_64-387.34.run

Maybe there is a faster command than curl -Is? Note: it is an uppercase i, not a lowercase L.

Without && break it also finds 387.22 and 387.12.
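One possible speed-up (an untested sketch): run the HEAD requests in parallel with xargs and print every hit instead of stopping at the first:

# probe all minor versions of major version 387 in parallel
seq 0 250 | xargs -P 16 -I {} sh -c \
  'curl -fsI http://us.download.nvidia.com/XFree86/Linux-x86_64/387.{}/NVIDIA-Linux-x86_64-387.{}.run >/dev/null && echo 387.{}'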

01e9 commented 6 years ago

The previous two versions were ones I randomly found by googling "nvidia 387". Now I checked the issue reported at the Solus dev tracker, and in one of the comments I had posted a screenshot where the installed version is 387.34. So the nvidia site works fine (they don't delete old drivers).

I will test on my home laptop and let you know.

mviereck commented 6 years ago

I've uploaded an update to the master branch that supports automatic installation of NVIDIA drivers in the container. The terminal output should be self-explanatory. Please try it out.

01e9 commented 6 years ago

On my home laptop:

  1. I installed 387.34 following these steps and the opengl examples are finally working!
  2. I followed the instructions about downloading the driver installation file and placing it in a special home directory, then I started the container and got this warning in the console (after all the x11docker information messages):

    [WARN  tini (43)] Tini is not running as PID 1 and isn't registered as a child subreaper.
    Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
    To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.

    After that I don't see any messages regarding automatic driver installation, and the opengl examples don't work.

    I placed the file in the right directory:

    o@a ~/.cache/x11docker $ ls -l
    total 80904
    -rwxrwxr-x 1 o o 82784271 Apr 27 21:54 NVIDIA-Linux-x86_64-387.34.run
    -rw-rw-r-- 1 o o    31978 Apr 27 21:56 x11docker.log
    drwxrwxr-x 3 o o     4096 Apr 25 21:04 X250-x11docker-lxde
    drwxrwxr-x 3 o o     4096 Apr 25 21:22 X254-x11docker-lxde
    drwxrwxr-x 3 o o     4096 Apr 27 21:56 X50-learnopengl
    -rw-rw-r-- 1 o o      166 Apr 27 21:56 Xenv.latest

    Here is the output of top:

    top - 22:03:53 up 46 min,  0 users,  load average: 1.10, 0.99, 0.90
    Tasks:   7 total,   1 running,   6 sleeping,   0 stopped,   0 zombie
    %Cpu(s): 12.0 us,  1.9 sy,  0.0 ni, 86.1 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
    KiB Mem : 16306040 total, 11624236 free,  2275948 used,  2405856 buff/cache
    KiB Swap:        0 total,        0 free,        0 used. 13659332 avail Mem 
    
      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                                            
      117 o         20   0 8058356 617976  41852 S  55.8  3.8   1:06.34 java                                                                                                                                                               
      202 o         20   0   36484   3016   2624 R   0.3  0.0   0:00.08 top                                                                                                                                                                
        1 root      20   0   49272   3028   2636 S   0.0  0.0   0:00.01 su                                                                                                                                                                 
       37 o         20   0    4520    756    692 S   0.0  0.0   0:00.00 tini                                                                                                                                                               
       53 o         20   0    4628   1696   1592 S   0.0  0.0   0:00.00 clion.sh                                                                                                                                                           
      153 o         20   0    8956   1740   1568 S   0.0  0.0   0:00.01 fsnotifier64                                                                                                                                                       
      194 o         20   0   18508   3424   2988 S   0.0  0.0   0:00.31 bash 

    Here is the --verbose output: https://pastebin.com/raw/CafnS0WZ

01e9 commented 6 years ago

Also, the IDE now closes unexpectedly without any console error message (maybe tini is killing it).

mviereck commented 6 years ago

These are two bugs. I fixed one that prevented installing the nvidia driver.

The other one, with tini, I have to investigate a bit more deeply. In principle I see the issue: due to more privileges, x11docker decides to make a user switch, but that prevents your custom tini in the image from being PID 1.

Let me think a bit; I'll find a solution. If you like, you can test with e.g. x11docker/lxde meanwhile.

01e9 commented 6 years ago

Automatic driver installation works now, thanks.

01e9 commented 6 years ago

The IDE still works. I will use it a while and let you know if I get any unexpected close.

Seems like this issue can be finally closed?

mviereck commented 6 years ago

But you still have a tini issue? The IDE runs nonetheless?

01e9 commented 6 years ago

IDE works fine.

I still get the tini warning

[WARN  tini (382)] Tini is not running as PID 1 and isn't registered as a child subreaper.
Zombie processes will not be re-parented to Tini, so zombie reaping won't work.
To fix the problem, use the -s option or set the environment variable TINI_SUBREAPER to register Tini as a child subreaper, or run Tini as PID 1.

https://pastebin.com/raw/Wac0WrMX

mviereck commented 6 years ago

Great!

As the tini failure is not fatal, I have a bit of time to find a proper solution. Otherwise, I would have done a hot-fix workaround.


I am a bit confused about this nvidia installer output:

Will install GLVND GLX client libraries.
Will install GLVND EGL client libraries.
Skipping GLX non-GLVND file: "libGL.so.387.34"
Skipping GLX non-GLVND file: "libGL.so.1"
Skipping GLX non-GLVND file: "libGL.so"
Skipping EGL non-GLVND file: "libEGL.so.387.34"
Skipping EGL non-GLVND file: "libEGL.so"
Skipping EGL non-GLVND file: "libEGL.so.1"
Skipping GLVND file: "libOpenGL.so.0"
Skipping GLVND file: "libOpenGL.so"
Skipping GLVND file: "libGLESv1_CM.so.1"
Skipping GLVND file: "libGLESv1_CM.so"
Skipping GLVND file: "libGLESv2.so.2"
Skipping GLVND file: "libGLESv2.so"
Skipping GLVND file: "libGLdispatch.so.0"
Skipping GLVND file: "libGLX.so.0"
Skipping GLVND file: "libGLX.so"
Skipping GLVND file: "libGL.so.1.0.0"
Skipping GLVND file: "libGL.so.1"
Skipping GLVND file: "libGL.so"
Skipping GLVND file: "libEGL.so.1"
Skipping GLVND file: "libEGL.so"
Skipping GLVND file: "./32/libGL.so.1.0.0"
Skipping GLVND file: "libGL.so.1"
Skipping GLVND file: "libGL.so"
Skipping GLVND file: "./32/libGLX.so.0"
Skipping GLVND file: "libGLX.so"
Skipping GLVND file: "./32/libEGL.so.1"
Skipping GLVND file: "libEGL.so"

But as long as nothing is obviously missing and everything works, I will not care about this.


You posted a logfile for a failing openbox start. In that case weston crashed with a malloc() error, unrelated to openbox. Not sure what happened there; maybe a weston bug, maybe some system or RAM hiccup. Please report if you can reproduce it.


Seems like this issue can be finally closed?

I did not believe it this morning, but it seems to be a yes :-)

mviereck commented 6 years ago

The tini issue is solved now, too. x11docker only needs to set TINI_SUBREAPER if tini does not run as PID 1.

This does not help in edge cases like systemd in ENTRYPOINT, but I won't worry about that this evening.
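For illustration: tini only needs the variable to be present, e.g. with plain docker run (yourimage is a placeholder):

docker run -e TINI_SUBREAPER=1 yourimage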

mviereck commented 6 years ago

Note that I changed the location for the installer NVIDIA_[...].run. x11docker now looks in ~/.local/share/x11docker (current user only) and /usr/local/share/x11docker (system wide). (Looking in ~/.cache/x11docker will be dropped in the next release.)
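For example, to move an already downloaded installer to the new per-user location:

mkdir -p ~/.local/share/x11docker
mv ~/.cache/x11docker/NVIDIA-Linux-x86_64-387.34.run ~/.local/share/x11docker/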

I checked some different base images and had a lot of hassle with images based on musl libc, like Alpine. Compare https://github.com/sgerrand/alpine-pkg-glibc/issues/78. The nvidia driver is based on glibc and cannot be recompiled against musl libc because of the closed source policy of NVIDIA corporation.

This is worth another rant at NVIDIA corporation. They restrict their customers' access to the hardware they paid for and tie them down to systems based on glibc.

mviereck commented 6 years ago

I found a browseable repository for nvidia drivers: https://http.download.nvidia.com/

Not listed on the main page: repository for openSUSE: https://http.download.nvidia.com/opensuse/

Not useful here, but yet another finding: a repository for Windows: https://http.download.nvidia.com/Windows/

There may be further hidden repositories not listed on the main page.

Though, I did not find any installers for musl libc based systems so far.