osrf / rocker

A tool to run docker containers with overlays and convenient options for things like GUIs etc.
Apache License 2.0

rviz2 RenderSystem errors inside the autoware-container on ubuntu 20.04 on nvidia GPU #206

Open AndriiChumak145 opened 1 year ago

AndriiChumak145 commented 1 year ago

Environment:

I ran an autoware container as described in their docs:

`rocker --nvidia --x11 --user --volume $HOME/autoware --volume $HOME/autoware_map -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda`

When I then tried to use rviz2, I received the following output:

[ERROR] [1667991076.248236751] [rviz2]: Unable to create the rendering window after 100 tries
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unable to create the rendering window after 100 tries
Aborted (core dumped)

After that I tried to run your example, `rocker --nvidia --x11 osrf/ros:crystal-desktop rviz2`, and the rviz window was displayed. However, I got the following libGL errors, and as the output suggests it was trying to load the Intel drivers instead of NVIDIA:

QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-root'
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: Version 4 or later of flush extension not found
libGL error: failed to load driver: i915
libGL error: failed to open drm device: No such file or directory
libGL error: failed to load driver: iris
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: Version 4 or later of flush extension not found
libGL error: failed to load driver: i915
libGL error: failed to open drm device: No such file or directory
libGL error: failed to load driver: iris
[INFO] [rviz2]: Stereo is NOT SUPPORTED
[INFO] [rviz2]: OpenGl version: 3.1 (GLSL 1.4)
[INFO] [rviz2]: Stereo is NOT SUPPORTED

I also managed to run rviz2 without errors when I dropped the `--nvidia` option, but the performance was unsatisfactory.

tfoote commented 1 year ago

I cannot reproduce your issue. Might there be something in your environment or your volumes that is selecting the Intel driver?

Here's my attempt; I skipped the volumes since I don't have those set up.

``` 🟢 SUCCESS] ❯ rocker --nvidia --x11 --user -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda Extension volume doesn't support default arguments. Please extend it. Active extensions ['nvidia', 'x11', 'user'] Step 1/12 : FROM python:3-slim-stretch as detector ---> 7691d3cb6cbc Step 2/12 : RUN mkdir -p /tmp/distrovenv ---> Using cache ---> 3d5c8b8d9105 Step 3/12 : RUN python3 -m venv /tmp/distrovenv ---> Using cache ---> cd138b5d6c5c Step 4/12 : RUN apt-get update && apt-get install -qy patchelf binutils ---> Using cache ---> 4cbc7f2267e0 Step 5/12 : RUN . /tmp/distrovenv/bin/activate && pip install distro pyinstaller==4.0 staticx==0.12.3 ---> Using cache ---> 6a00c185aa67 Step 6/12 : RUN echo 'import distro; import sys; output = (distro.name(), distro.version(), distro.codename()); print(output) if distro.name() else sys.exit(1)' > /tmp/distrovenv/detect_os.py ---> Using cache ---> c6bca879a236 Step 7/12 : RUN . /tmp/distrovenv/bin/activate && pyinstaller --onefile /tmp/distrovenv/detect_os.py ---> Using cache ---> 69c32080cef1 Step 8/12 : RUN . 
/tmp/distrovenv/bin/activate && staticx /dist/detect_os /dist/detect_os_static && chmod go+xr /dist/detect_os_static ---> Using cache ---> 632c0e2ea327 Step 9/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda ---> 6e0960405f3b Step 10/12 : COPY --from=detector /dist/detect_os_static /tmp/detect_os ---> 11d239ef6fe2 Step 11/12 : ENTRYPOINT [ "/tmp/detect_os" ] ---> Running in 11a27539889d Removing intermediate container 11a27539889d ---> 63af9129a7c9 Step 12/12 : CMD [ "" ] ---> Running in d9d07f4a8393 Removing intermediate container d9d07f4a8393 ---> 0b8084b7ba90 Successfully built 0b8084b7ba90 Successfully tagged rocker:os_detect_ghcr.io_autowarefoundation_autoware-universe_latest-cuda running, docker run -it --rm 0b8084b7ba90 output: ('Ubuntu', '20.04', 'focal') Writing dockerfile to /tmp/tmpq2ypdyll/Dockerfile vvvvvv # Preamble from extension [nvidia] # Ubuntu 16.04 with nvidia-docker2 beta opengl support FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd # Preamble from extension [x11] # Preamble from extension [user] FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda USER root # Snippet from extension [nvidia] RUN apt-get update && apt-get install -y --no-install-recommends \ libglvnd0 \ libgl1 \ libglx0 \ libegl1 \ libgles2 \ && rm -rf /var/lib/apt/lists/* COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all} ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all} # Snippet from extension [x11] # Snippet from extension [user] # make sure sudo is installed to be able to give user sudo access in docker RUN if ! 
command -v sudo >/dev/null; then \ apt-get update \ && apt-get install -y sudo \ && apt-get clean; \ fi RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` && \ if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi && \ existing_user_by_name=`getent passwd "tfoote" | cut -f1 -d: || true` && \ existing_user_uid=`getent passwd "tfoote" | cut -f3 -d: || true` && \ if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi && \ if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi && \ existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` && \ if [ -z "${existing_group_by_gid}" ]; then \ groupadd -g "1000" "tfoote"; \ fi && \ useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Tully Foote,,," -g "1000" -d "/home/tfoote" "tfoote" && \ echo "tfoote ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker # Making sure a home directory exists if we haven't mounted the user's home directory explicitly RUN mkdir -p "$(dirname "/home/tfoote")" && mkhomedir_helper tfoote # Commands below run as the developer user USER tfoote WORKDIR /home/tfoote ^^^^^^ Building docker file with arguments: {'path': '/tmp/tmpq2ypdyll', 'rm': True, 'nocache': False, 'pull': False} building > Step 1/12 : FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd building > ---> 333290bd2e04 building > Step 2/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda building > ---> 6e0960405f3b building > Step 3/12 : USER root building > ---> Running in e54bdce7a314 building > Removing intermediate container e54bdce7a314 building > ---> 1bc9d42f8b04 building > Step 4/12 : RUN apt-get update && apt-get install -y --no-install-recommends libglvnd0 libgl1 libglx0 libegl1 libgles2 && rm -rf /var/lib/apt/lists/* building > ---> Running in f5cb09046247 building > Get:1 
http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal InRelease [17.5 kB] building > Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB] building > Get:3 http://packages.ros.org/ros2/ubuntu focal InRelease [4685 B] building > Get:4 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB] building > Ign:5 https://s3.amazonaws.com/autonomoustuff-repo focal InRelease building > Get:6 https://s3.amazonaws.com/autonomoustuff-repo focal Release [2922 B] building > Ign:7 https://s3.amazonaws.com/autonomoustuff-repo focal Release.gpg building > Get:8 http://packages.ros.org/ros2/ubuntu focal/main amd64 Packages [1152 kB] building > Get:9 https://s3.amazonaws.com/autonomoustuff-repo focal/main amd64 Packages [4286 B] building > Get:10 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal/main amd64 Packages [4544 B] building > Get:11 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2269 kB] building > Get:12 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB] building > Get:13 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB] building > Get:14 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB] building > Get:15 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB] building > Get:16 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB] building > Get:17 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [931 kB] building > Get:18 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB] building > Get:19 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1661 kB] building > Get:20 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB] building > Get:21 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2738 kB] building > Get:22 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.2 kB] 
building > Get:23 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1229 kB] building > Get:24 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1778 kB] building > Get:25 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB] building > Get:26 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.5 kB] building > Fetched 25.4 MB in 19s (1368 kB/s) Reading package lists... building > Reading package lists... building > Building dependency tree... building > Reading state information... building > libegl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libegl1 set to manually installed. libgl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libgl1 set to manually installed. libgles2 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libgles2 set to manually installed. libglvnd0 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libglvnd0 set to manually installed. libglx0 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libglx0 set to manually installed. 0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded. building > Removing intermediate container f5cb09046247 building > ---> 581a73c3f7c8 building > Step 5/12 : COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json building > ---> faeb9301a9b5 building > Step 6/12 : ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all} building > ---> Running in 57348d841e1c building > Removing intermediate container 57348d841e1c building > ---> 6fbe6e8a269d building > Step 7/12 : ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all} building > ---> Running in 15047be356a2 building > Removing intermediate container 15047be356a2 building > ---> d5f06d23835c building > Step 8/12 : RUN if ! 
command -v sudo >/dev/null; then apt-get update && apt-get install -y sudo && apt-get clean; fi building > ---> Running in c31e870b4aba building > Removing intermediate container c31e870b4aba building > ---> a9cff202a37a building > Step 9/12 : RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` && if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi && existing_user_by_name=`getent passwd "tfoote" | cut -f1 -d: || true` && existing_user_uid=`getent passwd "tfoote" | cut -f3 -d: || true` && if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi && if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi && existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` && if [ -z "${existing_group_by_gid}" ]; then groupadd -g "1000" "tfoote"; fi && useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Tully Foote,,," -g "1000" -d "/home/tfoote" "tfoote" && echo "tfoote ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker building > ---> Running in 2fc242551567 building > Removing intermediate container 2fc242551567 building > ---> 586ff5675e26 building > Step 10/12 : RUN mkdir -p "$(dirname "/home/tfoote")" && mkhomedir_helper tfoote building > ---> Running in 7dd5dcb04439 building > Removing intermediate container 7dd5dcb04439 building > ---> eac87d1566c7 building > Step 11/12 : USER tfoote building > ---> Running in ae3fbfd78854 building > Removing intermediate container ae3fbfd78854 building > ---> c00cd9cefed5 building > Step 12/12 : WORKDIR /home/tfoote building > ---> Running in 6805c866cf8a building > Removing intermediate container 6805c866cf8a building > ---> ae8f24ddd4c4 building > Successfully built ae8f24ddd4c4 Executing command: docker run --rm -it --gpus all -e DISPLAY -e TERM -e QT_X11_NO_MITSHM=1 -e XAUTHORITY=/tmp/.docker4vfc7zea.xauth -v 
/tmp/.docker4vfc7zea.xauth:/tmp/.docker4vfc7zea.xauth -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro ae8f24ddd4c4 tfoote@c44a1cb476a3:~$ rviz2 QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-tfoote' [INFO] [1668152346.771416264] [rviz2]: Stereo is NOT SUPPORTED [INFO] [1668152346.771498605] [rviz2]: OpenGl version: 3.1 (GLSL 1.4) [INFO] [1668152346.794573875] [rviz2]: Stereo is NOT SUPPORTED ```

Screenshot from 2022-11-10 23-40-54

There's the obvious question: do you have an NVIDIA graphics card? And do you have an appropriate NVIDIA driver installed and enabled?
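One way to check this from inside the container is to look at the "OpenGL vendor string" line that `glxinfo` (from the `mesa-utils` package) reports. The helper below is only a sketch; the function name `check_gl_vendor` is hypothetical and not part of rocker:

```shell
# Classify the "OpenGL vendor string" line printed by glxinfo.
# Inside the container you would pipe the real output in, e.g.:
#   glxinfo | grep "OpenGL vendor" | check_gl_vendor
check_gl_vendor() {
  read -r line
  case "$line" in
    *NVIDIA*)          echo "nvidia" ;;         # NVIDIA proprietary stack is active
    *Intel*)           echo "intel" ;;          # Mesa picked the Intel driver
    *Mesa*|*llvmpipe*) echo "software/mesa" ;;  # software rendering fallback
    *)                 echo "unknown" ;;
  esac
}
```

If this prints anything other than `nvidia`, the container is not actually using the NVIDIA driver even though `--nvidia` was passed.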

tfoote commented 1 year ago

I also verified it works without the `--user` option, i.e. `rocker --nvidia --x11 -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda`.

Please try to make a minimal working example (preferably with a smaller image) to isolate the issue you're encountering.
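For isolating it, something along these lines is usually enough (the image tag is just an example of a smaller desktop image, and the helper function is hypothetical):

```shell
# Assemble the minimal rocker invocation for a given image, so the same
# failing command can be re-tested against progressively smaller images.
build_repro_cmd() {
  image="$1"
  printf 'rocker --nvidia --x11 %s rviz2\n' "$image"
}

# Example (not executed here): run the generated command for a small image.
# eval "$(build_repro_cmd osrf/ros:galactic-desktop)"
```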

AndriiChumak145 commented 1 year ago
  1. Yes, I have an NVIDIA GPU with drivers installed and working (running `nvidia-smi` inside the container):
    
    nvidia-smi
    Fri Nov 11 11:46:30 2022       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  NVIDIA GeForce ...  On   | 00000000:01:00.0  On |                  N/A |
    | N/A   48C    P0    59W /  N/A |    787MiB /  6144MiB |    100%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    +-----------------------------------------------------------------------------+

2. I have just run the same command without the volumes and the Docker output is almost the same (different user names and cache hashes), but the rviz error is identical (also without the `--user` option). Here is my output:

```
rocker --nvidia --x11 --user -- ghcr.io/autowarefoundation/autoware-universe:latest-cuda Extension volume doesn't support default arguments. Please extend it. Active extensions ['nvidia', 'x11', 'user'] Step 1/12 : FROM python:3-slim-stretch as detector ---> 7691d3cb6cbc Step 2/12 : RUN mkdir -p /tmp/distrovenv ---> Using cache ---> 24e842389995 Step 3/12 : RUN python3 -m venv /tmp/distrovenv ---> Using cache ---> b21aa3d8e3eb Step 4/12 : RUN apt-get update && apt-get install -qy patchelf binutils ---> Using cache ---> df0a59acf6f2 Step 5/12 : RUN . /tmp/distrovenv/bin/activate && pip install distro pyinstaller==4.0 staticx==0.12.3 ---> Using cache ---> 3d111c42cc3c Step 6/12 : RUN echo 'import distro; import sys; output = (distro.name(), distro.version(), distro.codename()); print(output) if distro.name() else sys.exit(1)' > /tmp/distrovenv/detect_os.py ---> Using cache ---> 3dbbfc370808 Step 7/12 : RUN . /tmp/distrovenv/bin/activate && pyinstaller --onefile /tmp/distrovenv/detect_os.py ---> Using cache ---> a23feb565b15 Step 8/12 : RUN . /tmp/distrovenv/bin/activate && staticx /dist/detect_os /dist/detect_os_static && chmod go+xr /dist/detect_os_static ---> Using cache ---> d1af46c69120 Step 9/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda ---> 6e0960405f3b Step 10/12 : COPY --from=detector /dist/detect_os_static /tmp/detect_os ---> 8b99eb5c569e Step 11/12 : ENTRYPOINT [ "/tmp/detect_os" ] ---> Running in 8559fe1ff4e4 Removing intermediate container 8559fe1ff4e4 ---> 260a6e2a31a0 Step 12/12 : CMD [ "" ] ---> Running in bb9b005a9960 Removing intermediate container bb9b005a9960 ---> ce6ba6f0cc80 Successfully built ce6ba6f0cc80 Successfully tagged rocker:os_detect_ghcr.io_autowarefoundation_autoware-universe_latest-cuda running, docker run -it --rm ce6ba6f0cc80 output: ('Ubuntu', '20.04', 'focal')
```

```
Writing dockerfile to /tmp/tmp284jbw5x/Dockerfile
vvvvvv
# Preamble from extension [nvidia]
# Ubuntu 16.04 with nvidia-docker2 beta opengl support
FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd
# Preamble from extension [x11]
# Preamble from extension [user]
FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda
USER root
# Snippet from extension [nvidia]
RUN apt-get update && apt-get install -y --no-install-recommends \
    libglvnd0 \
    libgl1 \
    libglx0 \
    libegl1 \
    libgles2 \
    && rm -rf /var/lib/apt/lists/*
COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json
ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all}
ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all}
# Snippet from extension [x11]
# Snippet from extension [user]
# make sure sudo is installed to be able to give user sudo access in docker
RUN if ! command -v sudo >/dev/null; then \
    apt-get update \
    && apt-get install -y sudo \
    && apt-get clean; \
    fi
RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` && \
    if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi && \
    existing_user_by_name=`getent passwd "andrii" | cut -f1 -d: || true` && \
    existing_user_uid=`getent passwd "andrii" | cut -f3 -d: || true` && \
    if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi && \
    if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi && \
    existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` && \
    if [ -z "${existing_group_by_gid}" ]; then \
    groupadd -g "1000" "andrii"; \
    fi && \
    useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Andrii Chumak,,," -g "1000" -d "/home/andrii" "andrii" && \
    echo "andrii ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker
# Making sure a home directory exists if we haven't mounted the user's home directory explicitly
RUN mkdir -p "$(dirname "/home/andrii")" && mkhomedir_helper andrii
# Commands below run as the developer user
USER andrii
WORKDIR /home/andrii
```

```
^^^^^^
Building docker file with arguments: {'path': '/tmp/tmp284jbw5x', 'rm': True, 'nocache': False, 'pull': False} building > Step 1/12 : FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd building > ---> 9d806b36b807 building > Step 2/12 : FROM ghcr.io/autowarefoundation/autoware-universe:latest-cuda building > ---> 6e0960405f3b building > Step 3/12 : USER root building > ---> Running in 59c4c572c623 building > Removing intermediate container 59c4c572c623 building > ---> 24a56b7f855a building > Step 4/12 : RUN apt-get update && apt-get install -y --no-install-recommends libglvnd0 libgl1 libglx0 libegl1 libgles2 && rm -rf /var/lib/apt/lists/* building > ---> Running in 004fa747eaef building > Get:1 http://archive.ubuntu.com/ubuntu focal InRelease [265 kB] building > Get:2 http://security.ubuntu.com/ubuntu focal-security InRelease [114 kB] building > Get:3 http://archive.ubuntu.com/ubuntu focal-updates InRelease [114 kB] Get:4 http://packages.ros.org/ros2/ubuntu focal InRelease [4685 B] building > Get:5 http://archive.ubuntu.com/ubuntu focal-backports InRelease [108 kB] building > Get:6 http://archive.ubuntu.com/ubuntu focal/restricted amd64 Packages [33.4 kB] building > Get:7 http://archive.ubuntu.com/ubuntu focal/multiverse amd64 Packages [177 kB] building > Get:8 http://archive.ubuntu.com/ubuntu focal/universe amd64 Packages [11.3 MB] building > Ign:9 https://s3.amazonaws.com/autonomoustuff-repo focal InRelease building > Get:10 http://packages.ros.org/ros2/ubuntu focal/main amd64 Packages [1152 kB] building > Get:11 https://s3.amazonaws.com/autonomoustuff-repo focal Release [2922 B] building > Ign:12 https://s3.amazonaws.com/autonomoustuff-repo focal Release.gpg building > Get:13 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal InRelease [17.5 kB] building > Get:14 https://s3.amazonaws.com/autonomoustuff-repo focal/main amd64 Packages [4286 B] building > Get:15 http://security.ubuntu.com/ubuntu focal-security/universe amd64 Packages [931 kB] building > Get:16 http://ppa.launchpad.net/longsleep/golang-backports/ubuntu focal/main amd64 Packages [4544 B] building > Get:17 http://archive.ubuntu.com/ubuntu focal/main amd64 Packages [1275 kB] building > Get:18 http://archive.ubuntu.com/ubuntu focal-updates/restricted amd64 Packages [1778 kB] building > Get:19 http://archive.ubuntu.com/ubuntu focal-updates/main amd64 Packages [2738 kB] building > Get:20 http://archive.ubuntu.com/ubuntu focal-updates/multiverse amd64 Packages [30.2 kB] building > Get:21 http://archive.ubuntu.com/ubuntu focal-updates/universe amd64 Packages [1229 kB] building > Get:22 http://archive.ubuntu.com/ubuntu focal-backports/main amd64 Packages [55.2 kB] building > Get:23 http://archive.ubuntu.com/ubuntu focal-backports/universe amd64 Packages [27.5 kB] building > Get:24 http://security.ubuntu.com/ubuntu focal-security/main amd64 Packages [2269 kB] building > Get:25 http://security.ubuntu.com/ubuntu focal-security/multiverse amd64 Packages [27.5 kB] building > Get:26 http://security.ubuntu.com/ubuntu focal-security/restricted amd64 Packages [1661 kB] building > Fetched 25.4 MB in 5s (4725 kB/s) Reading package lists... building > Reading package lists... building > Building dependency tree... building > Reading state information... building > libegl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libegl1 set to manually installed. libgl1 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libgl1 set to manually installed. libgles2 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libgles2 set to manually installed. libglvnd0 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libglvnd0 set to manually installed. libglx0 is already the newest version (1.3.2-1~ubuntu0.20.04.2). libglx0 set to manually installed. 0 upgraded, 0 newly installed, 0 to remove and 13 not upgraded.
building > Removing intermediate container 004fa747eaef building > ---> 36c03d12e09a building > Step 5/12 : COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json building > ---> ea3b901a36cc building > Step 6/12 : ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all} building > ---> Running in 8eed33abe04b building > Removing intermediate container 8eed33abe04b building > ---> 5c9e9114f31d building > Step 7/12 : ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all} building > ---> Running in 2f7f36ac9cd5 building > Removing intermediate container 2f7f36ac9cd5 building > ---> 025671b61b12 building > Step 8/12 : RUN if ! command -v sudo >/dev/null; then apt-get update && apt-get install -y sudo && apt-get clean; fi building > ---> Running in ebc2c22aab8c building > Removing intermediate container ebc2c22aab8c building > ---> c792d1ff20d6 building > Step 9/12 : RUN existing_user_by_uid=`getent passwd "1000" | cut -f1 -d: || true` && if [ -n "${existing_user_by_uid}" ]; then userdel -r "${existing_user_by_uid}"; fi && existing_user_by_name=`getent passwd "andrii" | cut -f1 -d: || true` && existing_user_uid=`getent passwd "andrii" | cut -f3 -d: || true` && if [ -n "${existing_user_by_name}" ]; then find / -uid ${existing_user_uid} -exec chown -h 1000 {} + || true ; find / -gid ${existing_user_uid} -exec chgrp -h 1000 {} + || true ; fi && if [ -n "${existing_user_by_name}" ]; then userdel -r "${existing_user_by_name}"; fi && existing_group_by_gid=`getent group "1000" | cut -f1 -d: || true` && if [ -z "${existing_group_by_gid}" ]; then groupadd -g "1000" "andrii"; fi && useradd --no-log-init --no-create-home --uid "1000" -s /bin/bash -c "Andrii Chumak,,," -g "1000" -d "/home/andrii" "andrii" && echo "andrii ALL=NOPASSWD: ALL" >> /etc/sudoers.d/rocker building > ---> Running in ba737cc89f33 building > Removing intermediate container ba737cc89f33 building > ---> de0aa5dfbf80 building > Step 10/12 : RUN mkdir -p "$(dirname "/home/andrii")" && mkhomedir_helper andrii building > ---> Running in 8ff7d0f7b3c0 building > Removing intermediate container 8ff7d0f7b3c0 building > ---> ec13643e48a3 building > Step 11/12 : USER andrii building > ---> Running in 85d13314852a building > Removing intermediate container 85d13314852a building > ---> 7e14b99dde7c building > Step 12/12 : WORKDIR /home/andrii building > ---> Running in 05f492cad5af building > Removing intermediate container 05f492cad5af building > ---> 3320e5cf582f building > Successfully built 3320e5cf582f Executing command: docker run --rm -it --gpus all -e DISPLAY -e TERM -e QT_X11_NO_MITSHM=1 -e XAUTHORITY=/tmp/.docker9d7ongx9.xauth -v /tmp/.docker9d7ongx9.xauth:/tmp/.docker9d7ongx9.xauth -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro 3320e5cf582f
andrii@de22c16eba70:~$ rviz2
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-andrii'
libGL error: MESA-LOADER: failed to retrieve device information
libGL error: MESA-LOADER: failed to retrieve device information
[ERROR] [1668162674.121348354] [rviz2]: RenderingAPIException: OpenGL 1.5 is not supported in GLRenderSystem::initialiseContext at /tmp/binarydeb/ros-galactic-rviz-ogre-vendor-8.5.1/.obj-x86_64-linux-gnu/ogre-v1.12.1-prefix/src/ogre-v1.12.1/RenderSystems/GL/src/OgreGLRenderSystem.cpp (line 1201)
[ERROR] [1668162674.129364641] [rviz2]: Unable to create the rendering window after 100 tries
terminate called after throwing an instance of 'std::runtime_error'
  what():  Unable to create the rendering window after 100 tries
Aborted (core dumped)
```


So, it looks like the volumes are not the issue.

3. I have also tried a simpler image with `rocker --nvidia --x11 osrf/ros:galactic-desktop` and got the same rviz error. Interestingly, when I tried crystal-desktop instead of galactic-desktop (galactic is what the autoware image uses), I got the different Intel driver error described in my first comment.
tfoote commented 1 year ago

You're running a much newer NVIDIA driver (nvidia-520) than I've ever tested with.

Do you know of others using this same graphics driver with the nvidia/opengl images (https://hub.docker.com/r/nvidia/opengl)? In the past not all NVIDIA drivers have been cross-compatible. I don't know if that's the case here.

The crystal base image is a much older version of Ubuntu, which likely doesn't ship the same graphics drivers. That one might be old enough that it detects the NVIDIA incompatibility and falls back to the Intel driver instead.
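If glvnd is picking the wrong vendor, one thing worth trying (an assumption on my part, not a confirmed fix) is to force the NVIDIA GLX vendor library and turn on Mesa's loader debugging inside the container before retrying:

```shell
# Diagnostic sketch, run inside the container before starting rviz2.
export LIBGL_DEBUG=verbose               # Mesa: log which DRI driver it tries to load
export __GLX_VENDOR_LIBRARY_NAME=nvidia  # glvnd: prefer the NVIDIA GLX vendor library
# glxinfo | grep -E "vendor|renderer"    # needs mesa-utils installed in the image
# rviz2                                  # retry with the forced vendor
```

Both variables are standard (Mesa's `LIBGL_DEBUG` and libglvnd's `__GLX_VENDOR_LIBRARY_NAME`); whether forcing the vendor fixes this particular setup is untested.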

130s commented 1 year ago

I cannot repro the issue with ghcr.io/autowarefoundation/autoware-universe:latest-cuda -- I can spawn a GUI (*1).

I do see a similar error with a privately built ROS Galactic image though, so reporting here. ``` Building docker file with arguments: {'path': '/tmp/tmp2s44km5w', 'rm': True, 'nocache': False, 'pull': False} building > Step 1/7 : FROM nvidia/opengl:1.0-glvnd-devel-ubuntu18.04 as glvnd building > ---> 9d806b36b807 building > Step 2/7 : FROM d130s:galactic-focal-fooo building > ---> ced0bb716153 building > Step 3/7 : USER root building > ---> Using cache building > ---> c59020888b1c building > Step 4/7 : RUN apt-get update && apt-get install -y --no-install-recommends libglvnd0 libgl1 libglx0 libegl1 libgles2 && rm -rf /var/lib/apt/lists/* building > ---> Using cache building > ---> 07d8e201cccd building > Step 5/7 : COPY --from=glvnd /usr/share/glvnd/egl_vendor.d/10_nvidia.json /usr/share/glvnd/egl_vendor.d/10_nvidia.json building > ---> Using cache building > ---> b5685ade76c7 building > Step 6/7 : ENV NVIDIA_VISIBLE_DEVICES ${NVIDIA_VISIBLE_DEVICES:-all} building > ---> Using cache building > ---> 06a721d4a341 building > Step 7/7 : ENV NVIDIA_DRIVER_CAPABILITIES ${NVIDIA_DRIVER_CAPABILITIES:-all} building > ---> Using cache building > ---> c3629fc8352a building > Successfully built c3629fc8352a Executing command: docker run --rm -it -v /home/noodler:/home/noodler --gpus all -e DISPLAY -e TERM -e QT_X11_NO_MITSHM=1 -e XAUTHORITY=/tmp/.dockerlw08_ofr.xauth -v /tmp/.dockerlw08_ofr.xauth:/tmp/.dockerlw08_ofr.xauth -v /tmp/.X11-unix:/tmp/.X11-unix -v /etc/localtime:/etc/localtime:ro c3629fc8352a bash root@2ce6613e000e:/# gazebo libGL error: MESA-LOADER: failed to retrieve device information ``` ``` $ apt-cache policy python3-rocker python3-rocker: Installed: 0.2.10-100 Candidate: 0.2.10-100 Version table: *** 0.2.10-100 500 500 http://packages.ros.org/ros2/ubuntu jammy/main amd64 Packages 100 /var/lib/dpkg/status $ nvidia-smi Tue Apr 25 10:11:44 2023 +-----------------------------------------------------------------------------+ | NVIDIA-SMI 510.108.03 Driver Version: 
510.108.03 CUDA Version: 11.6 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVIDIA T550 Lap... Off | 00000000:03:00.0 Off | N/A | | N/A 55C P0 9W / N/A | 4MiB / 4096MiB | 0% Default | | | | N/A | +-------------------------------+----------------------+----------------------+ +-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| | 0 N/A N/A 3037 G /usr/lib/xorg/Xorg 4MiB | +-----------------------------------------------------------------------------+ ```
```
root@2ce6613e000e:/# gazebo
libGL error: MESA-LOADER: failed to retrieve device information
```

A few weeks ago I was able to spawn Gazebo from the older version of the Docker image with the same setting, on the same host (apt packages have been updated since then).

*1: Actually, with the autoware Docker image I can even spawn a GUI (e.g. rviz2) from a bash command line attached to the container, which is very nice. Has that been a normal use case of rocker? If so, I've been missing a great feature. The same error occurs when I pass the GUI's executable command as an argument to the rocker command.

130s commented 1 year ago

Never mind the libGL error I reported in https://github.com/osrf/rocker/issues/206#issuecomment-1521867989; it may be an FAQ, as I found https://github.com/osrf/rocker/issues/181.

woensug-choi commented 9 months ago

https://github.com/kinu-garage/hut_10sqft/issues/819#issuecomment-1496115410 worked for me.