Unable to detect NVidia driver after building the program

AlexCrassco commented 1 year ago

Hello, I am fairly new to environment building, but the way the world is changing has gotten me truly interested in what you're doing here. I have managed to build GET3D on Ubuntu 20.04 and have installed the NVIDIA toolkit, NVIDIA driver, and NVIDIA container alongside all of the required packages.

I run the container with this command:

sudo nvidia-docker run --gpus all -it --rm -v /home/crassco/get3d/GET3D:/get3d -it get3d:v1 bash

It builds, and these messages are printed at the end:

WARNING: The NVIDIA Driver was not detected.  GPU functionality will not be available.
   Use 'nvidia-docker run' to start this container; see
   https://github.com/NVIDIA/nvidia-docker/wiki/nvidia-docker .

NOTE: MOFED driver for multi-node communication was not detected.
      Multi-node communication performance may be reduced.

NOTE: The SHMEM allocation limit is set to the default of 64MB.  This may be
   insufficient for PyTorch.  NVIDIA recommends the use of the following flags:
   nvidia-docker run --ipc=host ...

When I run nvidia-smi, I get this output:

| NVIDIA-SMI 525.60.12    Driver Version: 527.41       CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:07:00.0  On |                  N/A |
| 34%   31C    P8    18W / 175W |    713MiB /  8192MiB |      8%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A        24      G   /Xwayland                       N/A      |
+-----------------------------------------------------------------------------+

When I check the nvcc version, I get this:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Wed_Jul_14_19:41:19_PDT_2021
Cuda compilation tools, release 11.4, V11.4.100
Build cuda_11.4.r11.4/compiler.30188945_0

As far as I am aware, I have everything installed correctly, and I am unable to debug any further. How can I get the application to detect the GPU so that I can continue?

Thank you for your time.

SteveJunGao commented 1 year ago

Hi, I haven't seen this error before.

This might be because Docker is not configured properly. Can you try running without Docker (e.g., using a conda environment)?
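
One way to tell whether the problem lies with Docker or with the host is to run the same checks in both places. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and PyTorch is installed in the active environment; the helper name `check_gpu` is made up for illustration.

```shell
# check_gpu is a hypothetical helper; run it on the host and again inside the container.
check_gpu() {
  # nvidia-smi -L lists the GPUs that the driver can see.
  if command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L >/dev/null 2>&1; then
    echo "driver: visible"
  else
    echo "driver: not visible"
  fi

  # torch.cuda.is_available() reports whether PyTorch can use CUDA
  # (assumption: PyTorch is installed in the current environment).
  if command -v python3 >/dev/null 2>&1 && python3 -c "import torch" >/dev/null 2>&1; then
    python3 -c "import torch; print('torch cuda:', torch.cuda.is_available())"
  else
    echo "torch cuda: PyTorch not importable"
  fi
}

check_gpu
```

If the host reports the driver as visible but the container does not, the container runtime configuration is the more likely culprit.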

Tom0072 commented 1 year ago

@AlexCrassco Don't use nvidia-docker; plain "docker run" works. You may have to reinstall the official Docker, though.
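
A sketch of that suggestion, reusing the host path and image tag from the original command. `--ipc=host` is added because the container's startup note recommends it for PyTorch. This assumes Docker 19.03 or later with the NVIDIA Container Toolkit installed, which is what makes `--gpus` available; the variable names are illustrative only.

```shell
# Host path and image tag are taken from the command in the original post.
REPO_DIR="/home/crassco/get3d/GET3D"
IMAGE="get3d:v1"

# Plain `docker run` instead of the nvidia-docker wrapper.
# --gpus all   exposes every GPU to the container.
# --ipc=host   lifts the 64MB shared-memory default mentioned in the startup note.
DOCKER_CMD="sudo docker run --gpus all --ipc=host -it --rm -v ${REPO_DIR}:/get3d ${IMAGE} bash"

# Print the command for inspection instead of running it right away.
echo "$DOCKER_CMD"
```

The script only prints the command so that it can be checked first; paste the printed line into a terminal to start the container.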

SteveJunGao commented 1 year ago

Closing this issue, as no update has been received.

nv-tlabs / GET3D

Unable to detect NVidia driver after building the program #72