nvidia-holoscan / holohub

Central repository for applications and operators for Holoscan
Apache License 2.0

Holohub container does not support GeForce RTX 4060 Laptop GPU #89

Closed divyansh2681 closed 1 year ago

divyansh2681 commented 1 year ago

Hello, I am trying to run Holohub applications on my system (Ubuntu 22.04 on an x86_64 machine with an RTX 4060 GPU). When I launch the dev_container, I get the following error: [image]. Does the container not work with RTX GPUs?

agirault commented 1 year ago

Hi @divyansh2681.

Could you please provide the result of these commands:

  1. nvidia-ctk --version
  2. docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:22.03-py3
  3. docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:23.08-py3

Is the result the same after you update the NVIDIA Container Toolkit to the latest version, v1.14.1?

divyansh2681 commented 1 year ago

Thank you for responding @agirault

Before Updating: Screenshot from 2023-09-19 14-13-58

Screenshot from 2023-09-19 14-13-24

Sorry, I missed taking a screenshot of docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:22.03-py3 before updating.

divyansh2681 commented 1 year ago

After updating:

image

image

image

I don't get this issue with the third command, the one using version 23.08 (as seen in the last image). How can I use that as the default version when starting the container?

agirault commented 1 year ago

My initial assumption is that the 22.03-py3 container has packages relying on CUDA (CUDA runtime, cuDNN, and/or TensorRT) whose SASS builds target architectures that don't include the 4060. The 4060 is Ada, so sm_89, which should be supported starting with CUDA 11.8.
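
The reasoning above (a GPU needs a CUDA toolkit recent enough to know its architecture) can be sketched as a small lookup. The table is an illustrative subset, and the names `MIN_CUDA_FOR_SM`, `parse_version` and `toolkit_supports` are hypothetical helpers, not part of any NVIDIA tool.

```python
# Minimum CUDA toolkit release that added support for each compute capability.
# Illustrative subset only; the CUDA release notes are the authoritative list.
MIN_CUDA_FOR_SM = {
    70: (9, 0),   # Volta
    75: (10, 0),  # Turing
    80: (11, 0),  # Ampere (A100)
    86: (11, 1),  # Ampere (RTX 30 series)
    89: (11, 8),  # Ada (RTX 40 series)
    90: (11, 8),  # Hopper
}


def parse_version(text):
    """Turn a 'major.minor' string such as '11.8' into a comparable tuple."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def toolkit_supports(cuda_version, sm):
    """Return True if the given CUDA toolkit version knows about sm_<sm>."""
    return parse_version(cuda_version) >= MIN_CUDA_FOR_SM[sm]


if __name__ == "__main__":
    # Check a few toolkit versions against Ada (sm_89).
    for version in ("11.4", "11.8", "12.1"):
        print(version, toolkit_supports(version, 89))
```

A new enough toolkit is necessary but not sufficient: the libraries shipped in the image must also have been built with SASS or PTX usable on that architecture.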

Looks like nvcr.io/nvidia/tensorrt:23.04-py3 has CUDA 11.8 and is still on Ubuntu 20.04 base, so it would be the ideal base to switch to. Can you confirm that the following does not show the error?

docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:23.04-py3

If that looks good, you could create your own base container with holoscan like this (let's name this holoscan-0.6-trt-23.04.dockerfile):

FROM nvcr.io/nvidia/tensorrt:23.04-py3

RUN wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
RUN dpkg -i cuda-keyring_1.1-1_all.deb
RUN apt-get update
RUN apt-get install holoscan

# Install more things like libtorch if you need that

Then follow the instructions from the top level README to use that as the base image for holohub:

./dev_container build --docker_file /path/to/holoscan-0.6-trt-23.04.dockerfile

divyansh2681 commented 1 year ago

docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:23.04-py3 does not show any error. When building the container, RUN apt-get install holoscan results in the following error:

 => ERROR [5/5] RUN apt-get install holoscan    1.6s
------
 > [5/5] RUN apt-get install holoscan:
0.254 Reading package lists...
1.360 Building dependency tree...
1.568 Reading state information...
1.596 E: Unable to locate package holoscan
------
holoscan-0.6-trt-23.04.dockerfile:6
--------------------
   4 |     RUN dpkg -i cuda-keyring_1.1-1_all.deb
   5 |     RUN apt-get update
   6 | >>> RUN apt-get install holoscan
   7 |     
   8 |     # Install more things like libtorch if you need that
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get install holoscan" did not complete successfully: exit code: 100

Update: I installed holoscan using pip by adding RUN pip3 install holoscan in the dockerfile.

divyansh2681 commented 1 year ago

@agirault When using pip to install holoscan, CMake can't find the holoscan files:

root@divyansh-moon:/workspace/holohub# ./run build 
Building Holohub
Building sample applications.
[command] cmake -S . -B build -DPython3_EXECUTABLE=/usr/bin/python3 -DPython3_ROOT_DIR=/usr/lib/python3 -DHOLOHUB_DATA_DIR=/workspace/holohub/data -DCMAKE_BUILD_TYPE=release -DBUILD_SAMPLE_APPS=1 
CMake Error at applications/colonoscopy_segmentation/CMakeLists.txt:39 (find_package):
  Could not find a package configuration file provided by "holoscan"
  (requested version 0.6) with any of the following names:

    holoscanConfig.cmake
    holoscan-config.cmake

  Add the installation prefix of "holoscan" to CMAKE_PREFIX_PATH or set
  "holoscan_DIR" to a directory containing one of the above files.  If
  "holoscan" provides a separate development package or SDK, be sure it has
  been installed.

Is there any other way to install holoscan in the docker container?

agirault commented 1 year ago

> Could not find a package configuration file provided by "holoscan"

To build Holohub's CMake project, we need Holoscan's CMake config, which is not in the Python wheel. Are you only interested in Python applications? Any Python bindings, or pure Python?
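
To check whether an installation actually ships the CMake config, one option is to search candidate prefixes for the two file names CMake listed in the error. This is a minimal sketch; `find_holoscan_config_dirs` is a hypothetical helper, and the `/opt/nvidia/holoscan` prefix is an assumption about where a Debian package install would land.

```python
import os

# File names CMake looks for, taken from the find_package error message.
CONFIG_NAMES = ("holoscanConfig.cmake", "holoscan-config.cmake")


def find_holoscan_config_dirs(prefixes):
    """Return every directory under the given prefixes that holds a config file."""
    found = []
    for prefix in prefixes:
        for dirpath, _dirnames, filenames in os.walk(prefix):
            if any(name in filenames for name in CONFIG_NAMES):
                found.append(dirpath)
    return sorted(found)


if __name__ == "__main__":
    # Assumed install prefix; adjust to wherever holoscan was installed.
    for directory in find_holoscan_config_dirs(["/opt/nvidia/holoscan"]):
        print(directory)
```

Any directory it prints can be passed to CMake as holoscan_DIR, as the error message suggests. An empty result means that install has no CMake package config.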

> E: Unable to locate package holoscan

That's what we should address. I shared this:

wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb

It should be this instead, with the distro and arch filled in for your case:

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb

divyansh2681 commented 1 year ago

I am interested in both Python and C++.

I have already added RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb to my Dockerfile. I still get the same error.

divyansh2681 commented 1 year ago

My Dockerfile looks like this. I have Ubuntu 22.04:

FROM nvcr.io/nvidia/tensorrt:23.04-py3

RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
RUN dpkg -i cuda-keyring_1.1-1_all.deb
RUN apt-get update
RUN pip3 install holoscan
# RUN apt-get install -y holoscan

agirault commented 1 year ago

What matters is the Ubuntu version inside your container, not on your host. tensorrt:23.04-py3 is based on Ubuntu 20.04:

$ docker run --rm --entrypoint=bash nvcr.io/nvidia/tensorrt:23.04-py3 -c 'cat /etc/os-release | grep VERSION'
VERSION="20.04.6 LTS (Focal Fossa)"
VERSION_ID="20.04"
VERSION_CODENAME=focal
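
To avoid guessing the distro part of the keyring URL, it can be derived from the container's /etc/os-release. This is a rough sketch: `parse_os_release` and `keyring_url` are hypothetical helpers, and the `ID=ubuntu` line is assumed to be in the file (the grep above only shows the VERSION lines).

```python
import re

# URL pattern taken from the wget command earlier in this thread.
KEYRING_URL = (
    "https://developer.download.nvidia.com/compute/cuda/repos/"
    "{distro}/{arch}/cuda-keyring_1.1-1_all.deb"
)


def parse_os_release(text):
    """Parse KEY=value lines from /etc/os-release into a dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^([A-Z0-9_]+)=(.*)$", line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip('"')
    return fields


def keyring_url(os_release_text, arch="x86_64"):
    """Build the cuda-keyring URL for the distro described by os-release."""
    fields = parse_os_release(os_release_text)
    # For example, ID=ubuntu and VERSION_ID=20.04 give "ubuntu2004".
    distro = fields["ID"] + fields["VERSION_ID"].replace(".", "")
    return KEYRING_URL.format(distro=distro, arch=arch)


if __name__ == "__main__":
    # Run inside the container so the container's release file is read.
    with open("/etc/os-release") as handle:
        print(keyring_url(handle.read()))
```

Run inside the container, so it reads the container's release file rather than the host's.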

divyansh2681 commented 1 year ago

Okay, I get it. The container got built with this. Thank you so much for your help!

divyansh2681 commented 1 year ago

I ran into another problem:

nvcr.io/nvidia/tensorrt:23.04-py3 has CUDA 12.1 according to nvcc -V.

$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0

The /usr/local/ directory has 3 CUDA folders: 11.4, 11.8 and 12.1.

When I build the holohub applications using ./run build, I get the following errors:

/usr/bin/ld: warning: libnppidei.so.11, needed by /opt/nvidia/holoscan/lib/libholoscan_op_format_converter.so.0.6.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnppig.so.11, needed by /opt/nvidia/holoscan/lib/libholoscan_op_format_converter.so.0.6.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnppicc.so.11, needed by /opt/nvidia/holoscan/lib/libholoscan_op_format_converter.so.0.6.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnppc.so.11, needed by /opt/nvidia/holoscan/lib/libholoscan_op_format_converter.so.0.6.0, not found (try using -rpath or -rpath-link)

Upon checking for the missing libraries, I found that the CUDA 11.4 folder has them and the CUDA 11.8 folder does not. Which CUDA version is actually used by the holohub applications in the container?
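
Linker warnings like the ones above can be summarized mechanically: collect each missing soname and the major version in its suffix. This is only a sketch. `missing_libraries` and `soname_majors` are hypothetical helpers, and reading the soname major as a hint about the CUDA major a binary was linked against is a heuristic, not a guarantee.

```python
import re

# Matches GNU ld warnings of the form "warning: <lib>, needed by <file>, not found".
MISSING_LIB = re.compile(
    r"warning: (?P<lib>\S+\.so\.(?P<major>\d+)(?:\.\d+)*), "
    r"needed by (?P<needed_by>\S+), not found"
)


def missing_libraries(log_text):
    """Return {soname: set of binaries that need it} for every missing library."""
    missing = {}
    for match in MISSING_LIB.finditer(log_text):
        missing.setdefault(match.group("lib"), set()).add(match.group("needed_by"))
    return missing


def soname_majors(log_text):
    """Return the sorted set of soname major versions seen in the warnings."""
    return sorted({int(m.group("major")) for m in MISSING_LIB.finditer(log_text)})


if __name__ == "__main__":
    import sys

    # Usage sketch: ./run build 2>&1 | python3 this_script.py
    log = sys.stdin.read()
    for lib, users in sorted(missing_libraries(log).items()):
        print(lib, "<-", ", ".join(sorted(users)))
    print("soname majors:", soname_majors(log))
```

If every missing library carries the same major (11 in the log above), the binaries were probably linked against that CUDA major, and the matching lib directory is missing from the linker search path.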

agirault commented 1 year ago

I'm sorry @divyansh2681, I led you astray. It looks like TRT 23.04 only has CUDA 11.8 for HPCX, but the main CUDA version is 12. Could you try the TRT 22.12 base image instead? We did not build Holoscan SDK 0.6 binaries for CUDA 12 (the upcoming SDK release will target CUDA 12 and Ubuntu 22.04, though).

divyansh2681 commented 1 year ago

Oh okay, no worries. I created another container with TRT 22.12 base image. The container got built and does not show any error related to GPU support. However, when building holohub apps using ./run build, I get the following error:

/usr/bin/ld: warning: libcudart.so.12, needed by ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaUnregisterFatBinary@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaRegisterFatBinaryEnd@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaRegisterFatBinary@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaRegisterFunction@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaPopCallConfiguration@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaMemcpyAsync@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaPushCallConfiguration@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaGetErrorString@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaLaunchKernel@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaStreamSynchronize@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaEventDestroy@libcudart.so.12'

This container has CUDA 11.8 according to $ nvcc -V, but holohub apps are looking for libcudart.so.12. Do I need to change the CUDA version to be used somewhere in the holohub application files?

agirault commented 1 year ago

I'm not very familiar with Holohub's container and build infrastructure yet (I'm usually focused on the SDK itself), but did you try clearing your build directory before rerunning, with ./run clear_cache? Previously executed builds could leave cached CMake config pointing to the CUDA 12 we had in the previous containers.
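
Before wiping the build directory, a stale cache can be confirmed by scanning CMakeCache.txt for values that point at a different CUDA version. This is a hedged sketch: `parse_cmake_cache` and `entries_for_other_cuda` are hypothetical helpers, and the `/usr/local/cuda-<major>.<minor>` path layout and the `build/CMakeCache.txt` location are assumptions.

```python
import re


def parse_cmake_cache(text):
    """Parse KEY:TYPE=VALUE lines from a CMakeCache.txt into {key: value}."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks and the two comment styles used in CMakeCache.txt.
        if not line or line.startswith(("#", "//")):
            continue
        match = re.match(r"^([^:=]+)(?::[^=]*)?=(.*)$", line)
        if match:
            entries[match.group(1)] = match.group(2)
    return entries


def entries_for_other_cuda(text, expected_major):
    """Return cache entries whose value references a CUDA major other than expected."""
    stale = {}
    for key, value in parse_cmake_cache(text).items():
        for major in re.findall(r"cuda-(\d+)\.\d+", value):
            if int(major) != expected_major:
                stale[key] = value
    return stale


if __name__ == "__main__":
    # Assumed cache location for a build tree created with "cmake -S . -B build".
    with open("build/CMakeCache.txt") as handle:
        for key, value in sorted(entries_for_other_cuda(handle.read(), 11).items()):
            print(key, "=", value)
```

Any hit means the cache was generated against another toolkit, and clearing it as suggested above is the fix.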

divyansh2681 commented 1 year ago

This worked, I am able to build the applications. Thank you so much!

divyansh2681 commented 1 year ago

Hi @agirault, I installed holoscan-sdk on my x86_64 machine running Ubuntu 22.04 using the Debian package. I get the error ModuleNotFoundError: No module named 'holoscan.graphs._graphs' when I run hello_world.py or import graphs in Python. Am I missing something when installing the SDK from the Debian package?

Error:

>>> from holoscan import graphs
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/nvidia/holoscan/python/lib/holoscan/__init__.py", line 17, in <module>
    from . import cli, core, gxf
  File "/opt/nvidia/holoscan/python/lib/holoscan/core/__init__.py", line 65, in <module>
    from ..graphs._graphs import FragmentGraph, OperatorGraph
  File "/opt/nvidia/holoscan/python/lib/holoscan/graphs/__init__.py", line 24, in <module>
    from ._graphs import FragmentFlowGraph, OperatorFlowGraph
ModuleNotFoundError: No module named 'holoscan.graphs._graphs'
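
One way to narrow down an import error like this is to check whether a compiled `_graphs` extension exists in the package directory, and whether its file name is one the running interpreter would load. The thread does not confirm the cause, so treat this as a diagnostic sketch only; `extension_candidates` and `is_importable_name` are hypothetical helpers.

```python
import importlib.machinery
import os


def extension_candidates(package_dir, stem="_graphs"):
    """List shared objects in package_dir whose file name starts with '<stem>.'."""
    if not os.path.isdir(package_dir):
        return []
    return sorted(
        name
        for name in os.listdir(package_dir)
        if name.startswith(stem + ".") and name.endswith(".so")
    )


def is_importable_name(filename, stem="_graphs"):
    """True if filename is exactly <stem> plus a suffix this interpreter accepts."""
    accepted = {stem + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES}
    return filename in accepted


if __name__ == "__main__":
    # Package directory taken from the traceback above.
    package_dir = "/opt/nvidia/holoscan/python/lib/holoscan/graphs"
    for name in extension_candidates(package_dir):
        print(name, "importable:", is_importable_name(name))
```

If a `_graphs` shared object is present but not importable, it was likely built for a different Python version than the one running. If none is listed, the extension is missing from the install.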