Closed woensug-choi closed 4 months ago
That's an improperly broad solution for getting NVIDIA to work. In particular, you say this is on a fresh installation. Do you have your NVIDIA drivers and nvidia-docker set up as well? The fact that it's trying to use Mesa suggests that it's not detecting the NVIDIA GPU. The fix for a non-NVIDIA GPU is usually to use --device /dev/dri/card0
I would suggest that you try that instead of mounting the full /dev.
If that doesn't fix it you should find out the specific device that you need.
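One way to narrow down the device is to check which kernel driver backs each DRM card on the host. The following is only a sketch, assuming a Linux host that exposes /sys/class/drm; the function name list_drm_drivers and its optional directory argument are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each DRM card and the kernel driver behind it.
# The optional argument (default /sys/class/drm) is an assumption of this
# sketch so the function can be pointed at another directory.
list_drm_drivers() {
    local root="${1:-/sys/class/drm}"
    local card driver
    for card in "$root"/card[0-9]*; do
        # Skip connector entries such as card0-eDP-1.
        case "$(basename "$card")" in
            *-*) continue ;;
        esac
        # Also skips an unmatched glob, which stays literal.
        [ -e "$card/device/driver" ] || continue
        driver="$(basename "$(readlink -f "$card/device/driver")")"
        echo "$(basename "$card") $driver"
    done
}

list_drm_drivers
```

A card reported with a driver such as i915 would typically be an integrated Intel GPU served by Mesa, while nvidia would point at the proprietary NVIDIA driver.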
I will try --device /dev/dri/card0. Meanwhile, I did install the NVIDIA driver and toolkit. nvidia-smi looks fine, glxinfo shows no particular error message, and glxgears runs fast.
Yes, --device /dev/dri/card0 solves the problem! But without it I get the same error. Does this mean that we still need a PR to add --device /dev/dri/card0? I thought --gpus all would deal with this.
Hmm. More notes: both with and without --device /dev/dri/card0, running nvidia-smi inside the container does report the correct NVIDIA GPU. But opening gazebo doesn't work without --device /dev/dri/card0.
Tested with,
rocker --nvidia --x11 osrf/ros:noetic-desktop-full /bin/bash
nvidia-smi
prints
Wed Nov 29 20:51:02 2023
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro RTX 3000 with Max...    Off | 00000000:01:00.0 Off |                  N/A |
| N/A   53C    P0              22W /  65W |      5MiB /  6144MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+
when running gazebo,
gazebo
# libGL error: MESA-LOADER: failed to retrieve device information
It's a fresh install of Ubuntu 22.04 with the recommended NVIDIA driver, which was selected automatically during the installation process.
I thought --gpus all would deal with this.
My understanding is that --gpus all only catches discrete GPUs. /dev/dri/card0 is the integrated Intel graphics. If you're using that, I don't believe that you're using the NVIDIA card.
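A quick way to check which GPU actually does the rendering is to look at the OpenGL renderer string inside the container. This is a sketch that assumes glxinfo (from mesa-utils) is installed there; classify_renderer is a hypothetical helper, and its string matching is only a rough heuristic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the GL stack from an "OpenGL renderer string" line.
# The patterns are assumptions of this sketch, not an exhaustive list.
classify_renderer() {
    case "$1" in
        *NVIDIA*|*Quadro*|*GeForce*) echo "nvidia" ;;
        *Intel*|*llvmpipe*|*Mesa*)   echo "mesa" ;;
        *)                           echo "unknown" ;;
    esac
}

# Only query glxinfo when it is installed and a display is available.
if command -v glxinfo >/dev/null 2>&1 && [ -n "${DISPLAY:-}" ]; then
    renderer="$(glxinfo -B 2>/dev/null | grep 'OpenGL renderer string' || true)"
    echo "$renderer -> $(classify_renderer "$renderer")"
fi
```

If this reports mesa while nvidia-smi still lists the card, the container can see the NVIDIA GPU but is probably not rendering with it.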
with recommended NVIDIA driver installation
This may be a generic Ubuntu recommendation, not necessarily what you want.
I see that you're using the 535 driver. I have only ever tested up to the 470 NVIDIA driver. (See the README) There's a level of required compatibility between the internal and external drivers and going to the
Are you using the 535-open drivers? https://forums.linuxmint.com/viewtopic.php?t=401149
I see issues with 535 reported here too: https://github.com/NVIDIA/nvidia-docker/issues/1767
The 535 NVIDIA driver I used wasn't the open version; it is listed separately from the open versions of the drivers in Additional Drivers in Ubuntu 22.04. I've also tested with 470.223.02, the proprietary version of the NVIDIA driver (I did reboot, by the way), but that also didn't work.
I'm not sure what I can do to help you. I can't reproduce your issue. Does Gazebo run on the host machine with NVIDIA support?
https://github.com/containers/podman/issues/7801#issuecomment-722574489
I found another pattern of potential devices that might be a solution instead of /dev/dri/card0: /dev/dri/renderD128
Also, I see that the p16s can also have NVIDIA chips, so the device might show up in a different location. Because this is too generic, I don't want to merge it as proposed, so I'm going to close this. But if there's a more specific device that we can detect and mount if needed, that would be a good reason to reopen this.
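Detecting and mounting only the nodes that exist could look roughly like the sketch below. It is not rocker's implementation; dri_device_flags is a hypothetical function that prints a docker --device flag for every card or render node found under a directory (default /dev/dri).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one "--device <path>" per DRM node that exists.
# The optional directory argument is an assumption of this sketch.
dri_device_flags() {
    local dir="${1:-/dev/dri}"
    local node
    for node in "$dir"/card[0-9]* "$dir"/renderD[0-9]*; do
        # An unmatched glob stays literal, so check that the path exists.
        [ -e "$node" ] || continue
        printf -- '--device %s\n' "$node"
    done
}

dri_device_flags
```

The printed flags could then be appended to a docker run command line, for example alongside --gpus all, instead of mounting all of /dev.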
On a freshly installed Ubuntu 22.04 Jammy LTS, without doing anything else, I installed rocker with,
ran the example in the README,
and got an error saying
I was able to fix the problem by adding --volume /dev:/dev to the rocker arguments, which adds -v /dev:/dev to the docker arguments. I believe the right place to add -v /dev:/dev is the --x11 argument, since it wouldn't break even if /dev doesn't exist.
Related articles: https://github.com/osrf/rocker/issues/257 https://github.com/osrf/rocker/issues/206 https://github.com/kinu-garage/hut_10sqft/issues/819