rnd-team-dev / plotoptix

Data visualisation and ray tracing in Python based on OptiX 7.7 framework.
https://rnd.team/plotoptix

Docker: Compatible device(s) not found / OptiX not initialized. #46

Closed tonyf closed 1 year ago

tonyf commented 1 year ago

I'm trying to run plotoptix within a docker container via

docker run -it -e NVIDIA_VISIBLE_DEVICES=all --gpus all render-slim bash --

However, when calling

from plotoptix import NpOptiX
NpOptiX(start_now=True, devices=[0])

I get

[Py-C# interop]
OptiX initialization failed.
Unknown OptixResult code: Compatible device(s) not found / OptiX not initialized.
PathTracer destructor failed.
[ERROR] (MainThread) Initial setup failed, see errors above.
<NpOptiX(Thread-1, initial)>

nvidia-smi output

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 530.41.03              Driver Version: 530.41.03    CUDA Version: 12.1     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                  Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf            Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090 Ti      On | 00000000:01:00.0 Off |                  Off |
|  0%   45C    P8               26W / 450W|      3MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

Here's my Dockerfile:

FROM python:3.10.11-slim-bullseye

ARG DEBIAN_FRONTEND=noninteractive
ARG CUDA_VER=11-7

# Install deps
RUN apt-get update && apt-get install -y curl \
  git \
  gcc \
  ffmpeg \
  libsm6 \
  libxext6 \
  libpq-dev \
  clang \
  libglib2.0-dev

# Install python
RUN apt-get update
RUN apt-get install -y software-properties-common
RUN apt-get install -y python3.10 python-is-python3

# install CUDA and GPU driver
RUN apt-get install -y gnupg2
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/3bf863cc.pub \
    && add-apt-repository "deb https://developer.download.nvidia.com/compute/cuda/repos/debian11/x86_64/ /" \
    && add-apt-repository contrib \
    && apt-get update
RUN apt-get install -y cuda-nvcc-${CUDA_VER} cuda-libraries-${CUDA_VER} cuda-cudart-${CUDA_VER}

ENV CUDA_PATH /usr/local/cuda-${CUDA_VER}
ENV PATH $CUDA_PATH/bin:$PATH
ENV LD_LIBRARY_PATH $CUDA_PATH/lib64:$LD_LIBRARY_PATH

# install Mono
RUN apt -y install gnupg ca-certificates
RUN apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys 3FA7E0328081BFF6A14DA29AA6A19B38D3D831EF
RUN echo "deb https://download.mono-project.com/repo/ubuntu stable-bionic main" | tee /etc/apt/sources.list.d/mono-official-stable.list
RUN apt install -y libgdiplus
RUN apt update && apt -y install mono-complete

# Install plotoptix
RUN pip install --upgrade pip
RUN pip install pycparser
RUN pip install pythonnet
RUN pip install plotoptix

# Install general dependencies
RUN pip install numpy scipy matplotlib pandas scikit-learn scikit-image seaborn torch Pillow ipython opencv-python
RUN apt-get install -y python3-tk

# Tried with and w/o the following lines
COPY NVIDIA-OptiX-SDK-7.7.0-linux64-x86_64.sh .
RUN sh NVIDIA-OptiX-SDK-7.7.0-linux64-x86_64.sh --skip-license --prefix=/opt/OptiX --include-subdir

ENV PATH /optix/OptiX/SDK:$PATH
ENV PATH /optix/OptiX:$PATH
ENV PATH /optix/OptiX/include:$PATH
ENV LD_LIBRARY_PATH /optix/OptiX:$LD_LIBRARY_PATH
ENV LD_LIBRARY_PATH /optix/OptiX/include:$LD_LIBRARY_PATH

I've also tried this with a similar image based on nvidia/cuda:11.7.0-cudnn8-devel-ubuntu20.04 with the same result.

I've tried running this on the following GPUs:

Any pointers on how to resolve?

robertsulej commented 1 year ago

I was experimenting with docker some time ago, and the OptiX parts of the NVIDIA driver were not injected into the container correctly. The trick was to mount them manually:

docker run \
    -v /usr/lib/x86_64-linux-gnu/libnvoptix.so.1:/usr/lib64/libnvoptix.so.1 \
    -v /usr/lib/x86_64-linux-gnu/libnvoptix.so.418.56:/usr/lib64/libnvoptix.so.418.56 \
    -v /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.418.56:/usr/lib64/libnvidia-rtcore.so.418.56 \
    --rm --runtime=nvidia -it optix-docker-test sh

The above worked with an old driver. Please give it a try with v530; if it does not work, I'll investigate what else is missing. Your error message "Compatible device(s) not found" is a sign that it is still the same problem with the driver.
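
Since the file names in the command above embed the driver version, the mount flags have to be rebuilt whenever the driver changes. A minimal sketch of generating them, assuming the same three libraries and the same host and container directories as in the command above (the function name `optix_mount_args` is made up for illustration, and other driver versions may need additional files):

```python
from pathlib import PurePosixPath

# Library stems taken from the docker command above. The ".so.1" name is
# version-independent; the other two carry the full driver version.
VERSIONED_LIBS = ("libnvoptix.so", "libnvidia-rtcore.so")


def optix_mount_args(driver_version,
                     host_dir="/usr/lib/x86_64-linux-gnu",
                     container_dir="/usr/lib64"):
    """Return the docker `-v` arguments for the OptiX driver libraries."""
    names = ["libnvoptix.so.1"]
    names += [f"{lib}.{driver_version}" for lib in VERSIONED_LIBS]
    args = []
    for name in names:
        src = PurePosixPath(host_dir) / name
        dst = PurePosixPath(container_dir) / name
        args += ["-v", f"{src}:{dst}"]
    return args


if __name__ == "__main__":
    # Prints the flags to paste into `docker run`.
    print(" ".join(optix_mount_args("530.41.03")))
```

The driver version string is the one reported by nvidia-smi on the host, for example 530.41.03.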

tonyf commented 1 year ago

Can these be baked into the image itself? I'm assuming not, because they're specific to the driver version.

tonyf commented 1 year ago

Just tried this and got the same error:

docker run -it \
    -v /usr/lib/x86_64-linux-gnu/libnvoptix.so.1:/usr/lib64/libnvoptix.so.1 \
    -v /usr/lib/x86_64-linux-gnu/libnvoptix.so.530.41.03:/usr/lib64/libnvoptix.so.530.41.03 \
    -v /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.530.41.03:/usr/lib64/libnvidia-rtcore.so.530.41.03 \
    -e NVIDIA_VISIBLE_DEVICES=all \
    --gpus all render-slim bash --

OptiX initialization failed.
Unknown OptixResult code: Compatible device(s) not found / OptiX not initialized.
PathTracer destructor failed.
[ERROR] (MainThread) Initial setup failed, see errors above.
<NpOptiX(Thread-1, initial)>

robertsulej commented 1 year ago

OK, thanks for checking. I'll try to get it running and let you know.

robertsulej commented 1 year ago

It seems to work, though I had to add the path where the driver .so files were mounted to LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib64
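
Note that the dynamic loader reads LD_LIBRARY_PATH when a process starts, so the variable has to be set before Python is launched; changing os.environ inside an already running interpreter does not affect it. If the export cannot go into the shell or the image, one option is to launch the rendering script as a child process with an adjusted environment. A minimal sketch (the helper name `env_with_library_dir` and the script name `render.py` are hypothetical):

```python
import os


def env_with_library_dir(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir appended to LD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries so an unset variable does not produce a leading ':'.
    parts = [p for p in current.split(":") if p]
    if lib_dir not in parts:
        parts.append(lib_dir)
    env["LD_LIBRARY_PATH"] = ":".join(parts)
    return env


# Usage (render.py is a placeholder for the script that imports plotoptix):
#   import subprocess, sys
#   subprocess.run([sys.executable, "render.py"],
#                  env=env_with_library_dir("/usr/lib64"), check=True)
```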

You can also try mounting into one of the folders that are searched for libnvoptix.so.1. strace showed me these: /lib/x86_64-linux-gnu, /usr/lib/x86_64-linux-gnu, /lib, /usr/lib.

Btw, I tested on the "base" NVIDIA image, 12.0.0-base-ubuntu20.04, where the CUDA toolkit is not installed. The driver libraries are enough to run OptiX (unfortunately they are not mounted correctly by nvidia docker...).
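
To check from inside the container whether the library ended up somewhere it can be found, a quick sketch like the one below may help. It only looks at the four folders listed above plus whatever is in LD_LIBRARY_PATH; the real loader search order can include more locations, so treat a miss as a hint rather than proof. The function name `find_library` is made up for this example.

```python
import os

# Folders reported by strace in the comment above.
DEFAULT_DIRS = (
    "/lib/x86_64-linux-gnu",
    "/usr/lib/x86_64-linux-gnu",
    "/lib",
    "/usr/lib",
)


def find_library(name, dirs):
    """Return the full paths under dirs where a regular file called name exists."""
    hits = []
    for d in dirs:
        candidate = os.path.join(d, name)
        # isfile() follows symlinks, so a dangling libnvoptix.so.1 link is
        # reported as missing, which is what we want here.
        if os.path.isfile(candidate):
            hits.append(candidate)
    return hits


if __name__ == "__main__":
    env_dirs = [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    dirs = env_dirs + list(DEFAULT_DIRS)
    hits = find_library("libnvoptix.so.1", dirs)
    print(hits or "libnvoptix.so.1 not found in: " + ", ".join(dirs))
```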

tonyf commented 1 year ago

That worked! Thank you!

aksh-at commented 1 year ago

Hey @robertsulej, we're running into the same issue when trying to run plotoptix. strace output shows that libnvoptix.so is being read:

openat(AT_FDCWD, "/usr/lib/x86_64-linux-gnu/libnvoptix.so.1", O_RDONLY|O_CLOEXEC) = 11
read(11, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0w\6\0\0\0\0\0"..., 832) = 832
fstat(11, {st_mode=S_IFREG|0644, st_size=189105240, ...}) = 0
mmap(NULL, 191393312, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 11, 0) = 0x7fbcec979000
mprotect(0x7fbcf7a94000, 2093056, PROT_NONE) = 0
mmap(0x7fbcf7c93000, 3399680, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 11, 0xb11a000) = 0x7fbcf7c93000
mmap(0x7fbcf7fd1000, 192032, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fbcf7fd1000
close(11)    

We are on driver version 515, and I see that the project page says you need >=530. Could that be why?

robertsulej commented 1 year ago

It might be. If you cannot upgrade the driver, you can try the earlier release 0.14.4 (available on PyPI) instead; it should work with the r515 driver.
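
For setups that have to support both cases, the choice can be scripted. The sketch below parses the driver version from nvidia-smi output and picks a pip requirement. It assumes only what was said in this thread: the current release needs driver >= 530, and 0.14.4 should work with r515. Drivers older than 515 are not covered here, and the function names are made up for illustration.

```python
import re

MIN_DRIVER_MAJOR = 530        # requirement quoted from the project page above
FALLBACK_RELEASE = "0.14.4"   # release suggested above for the r515 driver


def driver_major(nvidia_smi_text):
    """Extract the major driver version from nvidia-smi output, or None."""
    match = re.search(r"Driver Version:\s*(\d+)", nvidia_smi_text)
    return int(match.group(1)) if match else None


def pip_requirement(major):
    """Return the pip requirement string to install for this driver."""
    if major is None:
        raise ValueError("driver version not found in nvidia-smi output")
    if major >= MIN_DRIVER_MAJOR:
        return "plotoptix"
    # Only confirmed for r515 in this thread; older drivers are untested.
    return f"plotoptix=={FALLBACK_RELEASE}"


if __name__ == "__main__":
    header = "| NVIDIA-SMI 530.41.03    Driver Version: 530.41.03    CUDA Version: 12.1 |"
    print(pip_requirement(driver_major(header)))
```

In practice the text would come from running nvidia-smi on the host, and the result would be passed to pip install.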