AndriiChumak145 opened 1 year ago
I cannot reproduce your issue. There might be something in your environment or your volumes selecting the Intel driver?

Here's my attempt; I skipped the volumes as I don't have those set up.

There's the obvious question: do you have an NVIDIA graphics card? Do you have an appropriate NVIDIA driver installed and enabled?

I also verified it works without the `--user` option, i.e. `rocker --nvidia --x11 -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda`.

Please try to make a minimum working example of your issue (preferably with a smaller image) to isolate the issue you're encountering.
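As a sketch of the host-side sanity checks being asked for here (a hypothetical helper script, not part of rocker), one could first confirm the host actually has an NVIDIA driver and DRI devices before debugging the container side:

```shell
#!/bin/sh
# Hypothetical pre-flight check before debugging rocker's --nvidia extension:
# confirm the host has an NVIDIA driver and render devices at all.
if command -v nvidia-smi >/dev/null 2>&1; then
    echo "host driver: present"
else
    echo "host driver: missing (install the proprietary NVIDIA driver first)"
fi
if [ -d /dev/dri ]; then
    echo "DRI devices: present"
else
    echo "DRI devices: missing"
fi
```

If either check fails on the host, no amount of container configuration will make hardware-accelerated GL work inside it.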
1. Here is the output of `nvidia-smi` (inside the container):

```
nvidia-smi
Fri Nov 11 11:46:30 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
| N/A   48C    P0    59W /  N/A |    787MiB /  6144MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+
```
2. I have just run the same command without volumes, and the Docker build output is almost the same (different user names and caches), but the rviz error is the same (also without the `--user` option). Here is my output:
```
rocker --nvidia --x11 --user -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
Extension volume doesn't support default arguments. Please extend it.
Active extensions ['nvidia', 'x11', 'user']
Step 1/12 : FROM python:3-slim-stretch as detector
 ---> 7691d3cb6cbc
Step 2/12 : RUN mkdir -p /tmp/distrovenv
 ---> Using cache
 ---> 24e842389995
Step 3/12 : RUN python3 -m venv /tmp/distrovenv
 ---> Using cache
 ---> b21aa3d8e3eb
Step 4/12 : RUN apt-get update && apt-get install -qy patchelf binutils
 ---> Using cache
 ---> df0a59acf6f2
Step 5/12 : RUN . /tmp/distrovenv/bin/activate && pip install distro pyinstaller==4.0 staticx==0.12.3
 ---> Using cache
 ---> 3d111c42cc3c
Step 6/12 : RUN echo 'import distro; import sys; output = (distro.name(), distro.version(), distro.codename()); print(output) if distro.name() else sys.exit(1)' > /tmp/distrovenv/detect_os.py
 ---> Using cache
 ---> 3dbbfc370808
Step 7/12 : RUN . /tmp/distrovenv/bin/activate && pyinstaller --onefile /tmp/distrovenv/detect_os.py
 ---> Using cache
 ---> a23feb565b15
Step 8/12 : RUN . /tmp/distrovenv/bin/activate && staticx /dist/detect_os /dist/detect_os_static && chmod go+xr /dist/detect_os_static
 ---> Using cache
 ---> d1af46c69120
Step 9/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
 ---> 6e0960405f3b
Step 10/12 : COPY --from=detector /dist/detect_os_static /tmp/detect_os
 ---> 8b99eb5c569e
Step 11/12 : ENTRYPOINT [ "/tmp/detect_os" ]
 ---> Running in 8559fe1ff4e4
Removing intermediate container 8559fe1ff4e4
 ---> 260a6e2a31a0
Step 12/12 : CMD [ "" ]
 ---> Running in bb9b005a9960
Removing intermediate container bb9b005a9960
 ---> ce6ba6f0cc80
Successfully built ce6ba6f0cc80
Successfully tagged rocker:os_detect_ghcr.io_autowarefoundation_autoware-universe_latest-cuda
running, docker run -it --rm ce6ba6f0cc80
output: ('Ubuntu', '20.04', 'focal')
```
Writing dockerfile to /tmp/tmp284jbw5x/Dockerfile
vvvvvv

```dockerfile
FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd

FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
USER root

RUN apt-get update && apt-get install -y --no-install-recommends \
    libglvnd0 \
    libgl1 \
    libglx0 \
    libegl1 \
    libgles2 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json

ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all}

RUN if ! command -v sudo >/dev/null; then \
      apt-get update \
      && apt-get install -y sudo \
      && apt-get clean; \
    fi

RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` && \
    if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi && \
    existing_user_by_name=`getent passwd "andrii" | cut -f1 -d: || true` && \
    existing_user_uid=`getent passwd "andrii" | cut -f3 -d: || true` && \
    if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi && \
    if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi && \
    existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` && \
    if [ -z "${existing_group_by_gid}" ]; then \
      groupadd -g "1000" "andrii"; \
    fi && \
    useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Andrii Chumak,,," -g "1000" -d "/home/andrii" "andrii" && \
    echo "andrii ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker

RUN mkdir -p "$(dirname "/home/andrii")" && mkhomedir_helper andrii
USER andrii
WORKDIR /home/andrii
```
^^^^^^
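The generated user-swapping `RUN` step leans on the `getent ... | cut` idiom. As a stand-alone illustration of that lookup pattern (runnable on any Linux host, outside Docker; the `root` account is used only because it exists everywhere):

```shell
#!/bin/sh
# getent prints a colon-separated passwd entry; cut extracts single fields.
# Field 1 is the user name, field 3 the numeric UID.
entry=$(getent passwd "root")
name=$(printf '%s\n' "$entry" | cut -f1 -d:)
uid=$(printf '%s\n' "$entry" | cut -f3 -d:)
echo "name=$name uid=$uid"
# prints: name=root uid=0
```

In the Dockerfile this is used to detect (and, if needed, delete and recreate) any existing user or group that collides with the host user's UID/GID, so that files created in the container end up owned correctly on the host.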
Building docker file with arguments: {'path': '/tmp/tmp284jbw5x', 'rm': True, 'nocache': False, 'pull': False}
building > Step 1/12 : FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd
building > ---> 9d806b36b807
building > Step 2/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
building > ---> 6e0960405f3b
building > Step 3/12 : USER root
building > ---> Running in 59c4c572c623
building > Removing intermediate container 59c4c572c623
building > ---> 24a56b7f855a
building > Step 4/12 : RUN apt-get update && apt-get install -y --no-install-recommends libglvnd0 libgl1 libglx0 libegl1 libgles2 && rm -rf /var/lib/apt/lists/*
building > ---> Running in 004fa747eaef
building > Get:1 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB]
building > Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB]
building > Get:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB]
Get:4 http://packages.ros.org/ros2/ubuntu focal InRelease [4685 B]
building > Get:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB]
building > Get:6 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB]
building > Get:7 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB]
building > Get:8 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB]
building > Ign:9 https://s3.amazonaws.com/autonomoustuff-repo focal InRelease
building > Get:10 http://packages.ros.org/ros2/ubuntu focal/main amd64 Packages [1152 kB]
building > Get:11 https://s3.amazonaws.com/autonomoustuff-repo focal Release [2922 B]
building > Ign:12 https://s3.amazonaws.com/autonomoustuff-repo focal Release.gpg
building > Get:13 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal InRelease [17.5 kB]
building > Get:14 https://s3.amazonaws.com/autonomoustuff-repo focal/main amd64 Packages [4286 B]
building > Get:15 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [931 kB]
building > Get:16 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal/main amd64 Packages [4544 B]
building > Get:17 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB]
building > Get:18 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1778 kB]
building > Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2738 kB]
building > Get:20 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.2 kB]
building > Get:21 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1229 kB]
building > Get:22 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB]
building > Get:23 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.5 kB]
building > Get:24 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2269 kB]
building > Get:25 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB]
building > Get:26 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1661 kB]
building > Fetched 25.4 MB in 5s (4725 kB/s)
Reading package lists...
building > Reading package lists...
building > Building dependency tree...
building >
Reading state information...
building > libegl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libegl1 set to manually installed.
libgl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libgl1 set to manually installed.
libgles2 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libgles2 set to manually installed.
libglvnd0 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libglvnd0 set to manually installed.
libglx0 is already the newest version (1.3.2-1~ubuntu0.20.04.2).
libglx0 set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
building > Removing intermediate container 004fa747eaef
building > ---> 36c03d12e09a
building > Step 5/12 : COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
building > ---> ea3b901a36cc
building > Step 6/12 : ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
building > ---> Running in 8eed33abe04b
building > Removing intermediate container 8eed33abe04b
building > ---> 5c9e9114f31d
building > Step 7/12 : ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all}
building > ---> Running in 2f7f36ac9cd5
building > Removing intermediate container 2f7f36ac9cd5
building > ---> 025671b61b12
building > Step 8/12 : RUN if ! command -v sudo >/dev/null; then apt-get update && apt-get install -y sudo && apt-get clean; fi
building > ---> Running in ebc2c22aab8c
building > Removing intermediate container ebc2c22aab8c
building > ---> c792d1ff20d6
building > Step 9/12 : RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` && if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi && existing_user_by_name=`getent passwd "andrii" | cut -f1 -d: || true` && existing_user_uid=`getent passwd "andrii" | cut -f3 -d: || true` && if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi && if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi && existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` && if [ -z "${existing_group_by_gid}" ]; then groupadd -g "1000" "andrii"; fi && useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Andrii Chumak,,," -g "1000" -d "/home/andrii" "andrii" && echo "andrii ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker
building > ---> Running in ba737cc89f33
building > Removing intermediate container ba737cc89f33
building > ---> de0aa5dfbf80
building > Step 10/12 : RUN mkdir -p "$(dirname "/home/andrii")" && mkhomedir_helper andrii
building > ---> Running in 8ff7d0f7b3c0
building > Removing intermediate container 8ff7d0f7b3c0
building > ---> ec13643e48a3
building > Step 11/12 : USER andrii
building > ---> Running in 85d13314852a
building > Removing intermediate container 85d13314852a
building > ---> 7e14b99dde7c
building > Step 12/12 : WORKDIR /home/andrii
building > ---> Running in 05f492cad5af
building > Removing intermediate container 05f492cad5af
building > ---> 3320e5cf582f
building > Successfully built 3320e5cf582f
Executing command:
docker run --rm -it --gpus all -e DISPLAY -e TERM -e QT_X11_NO_MITSHM=1 -e XAUTHORITY=/tmp/.docker9d7ongx9.xauth -v /tmp/.docker9d7ongx9.xauth:/tmp/.docker9d7ongx9.xauth -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro 3320e5cf582f
andrii@de22c16eba70:~$ rviz2
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-andrii'
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: MESA-LOADER: failed to retrieve device information
[ERROR] [1668162674.121348354] [rviz2]: RenderingAPIException: OpenGL 1.5 is not supported in GLRenderSystem::initialiseContext at /tmp/binarydeb/ros-galactic-rviz-ogre-vendor-8.5.1/.obj-x86_64-linux-gnu/ogre-v1.12.1-prefix/src/ogre-v1.12.1/RenderSystems/GL/src/OgreGLRenderSystem.cpp (line 1201)
[ERROR] [1668162674.129364641] [rviz2]: Unable to create the rendering window after 100 tries
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unable to create the rendering window after 100 tries
Aborted (core dumped)
So it looks like the volumes are not the issue.
3. I have also tried a simpler image with `rocker --nvidia --x11 osrf/ros:galactic-desktop` and got the same rviz error. Interestingly, when I tried crystal-desktop instead of galactic-desktop (galactic is what the autoware image uses), I got a different Intel driver error, described in my first comment.
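One way to narrow down whether the container is even exposing the NVIDIA GL stack to applications is to look for the GLVND vendor ICD files inside the container; the generated Dockerfile above COPYs `10_nvidia.json` into `/usr/share/glvnd/egl_vendor.d`. A sketch (the two paths checked are the conventional GLVND locations; this is a diagnostic idea, not a documented rocker step):

```shell
#!/bin/sh
# List registered GLVND EGL vendor ICDs. On a correctly set up NVIDIA
# container, 10_nvidia.json should appear; if only Mesa entries exist,
# GL applications will fall back to the Mesa/Intel software path.
for d in /usr/share/glvnd/egl_vendor.d /etc/glvnd/egl_vendor.d; do
    if [ -d "$d" ]; then
        echo "$d:"
        ls "$d"
    else
        echo "$d: not present"
    fi
done
```

Running this inside both a working and a failing container and diffing the output would show whether the NVIDIA ICD registration differs between the two.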
You're running a much newer NVIDIA driver than I've ever tested with, nvidia-520.

Do you know of others using this same graphics driver with the nvidia/opengl images: https://hub.docker.com/r/nvidia/opengl ? In the past not all NVIDIA drivers have been cross-compatible; I don't know if that's the case here.

The crystal base image is going to be a much older version of Ubuntu, which likely doesn't have the same graphics drivers either. That one might be old enough that it detects the NVIDIA incompatibility and falls back to the Intel driver instead.
I cannot repro the issue with ghcr.io/autowarefoundation/autoware-universe:latest-cuda -- I can spawn a GUI (*1).

```
root@2ce6613e000e:/# gazebo
libGL error: MESA-LOADER: failed to retrieve device information
```

A few weeks ago I was able to spawn Gazebo from the older version of the Docker image with the same setting, on the same host (apt packages have been updated since then).

*1... Actually, with the autoware Docker image, I can even spawn a GUI, e.g. `rviz2`, from a bash commandline attached to the container, which is very nice. Has that been a normal usecase of `rocker`?? If so I've been missing a great feature. The same error occurs when I pass the GUI's executable command via the `rocker` command's argument.
Nevermind the `libGL` error I reported in https://github.com/osrf/rocker/issues/206#issuecomment-1521867989; that may be an FAQ, as I found https://github.com/osrf/rocker/issues/181.
Environment:
I have run an autoware container as described in their docs:

```
rocker --nvidia --x11 --user --volume $HOME/autoware --volume $HOME/autoware_map -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda
```

After I tried to use rviz2 I received the following output:

After that I tried to run your example:

```
rocker --nvidia --x11 osrf/ros:crystal-desktop rviz2
```

and the rviz window was displayed. However, I got the following libGL errors, and as the output suggests it was trying to load intel drivers instead of nvidia:

I also managed to run rviz2 without errors without nvidia, but the performance was unsatisfactory.
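For the Intel/Mesa fallback specifically, Mesa's loader can be asked to explain its driver selection via the `LIBGL_DEBUG=verbose` environment variable. A sketch of how one might capture that inside the container (assumes `glxinfo` from the `mesa-utils` package; guarded so it degrades gracefully when the tool is absent):

```shell
#!/bin/sh
# Ask Mesa's loader to explain which DRI driver it tries (and fails)
# to load; this usually pinpoints why it picked Intel over NVIDIA.
if command -v glxinfo >/dev/null 2>&1; then
    LIBGL_DEBUG=verbose glxinfo 2>&1 | grep -i -E 'vendor|mesa|driver' | head -n 10
else
    echo "glxinfo not installed (apt-get install mesa-utils)"
fi
```

The verbose loader output, pasted into the issue, would make it much easier to tell a missing NVIDIA ICD apart from a driver/library version mismatch.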