Closed: divyansh2681 closed this issue 1 year ago.
Hi @divyansh2681.
Could you please provide the result of these commands:
nvidia-ctk --version
docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:22.03-py3
docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:23.08-py3
Is the result the same after you update your NVIDIA Container Toolkit to the latest version, v1.14.1?
Thank you for responding @agirault
Before Updating:
Sorry, I missed taking a screenshot when running docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:22.03-py3 before updating.
After updating:
I don't get this issue with the third command (as seen in the last image), the one with version 23.08. How can I use that as the default version when starting the container?
My initial assumption is that this means the 22.03-py3 container has packages relying on CUDA (the CUDA runtime, cuDNN, and/or TensorRT) with SASS builds for architectures that don't include the 4080. The 4080 is Ada, so sm_89, which is supported starting with CUDA 11.8.
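As a quick sanity check of that mapping, a small shell sketch: it turns a compute capability like "8.9" into the sm_XY architecture name used for SASS builds. (On a recent driver you could obtain the value with nvidia-smi --query-gpu=compute_cap --format=csv,noheader; that query field being available is an assumption about your driver version.)

```shell
# Sketch: map a compute capability string to the sm_XY architecture name.
cc_to_sm() {
  # "8.9" -> "89" -> "sm_89"
  printf 'sm_%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

cc_to_sm 8.9   # Ada (RTX 4080) -> sm_89
```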
Looks like nvcr.io/nvidia/tensorrt:23.04-py3 has CUDA 11.8 and is still on an Ubuntu 20.04 base, so it would be the ideal base to switch to. Can you confirm that the following does not show the error?
docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:23.04-py3
If that looks good, you could create your own base container with holoscan like this (let's name it holoscan-0.6-trt-23.04.dockerfile):
FROM nvcr.io/nvidia/tensorrt:23.04-py3
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
RUN dpkg -i cuda-keyring_1.1-1_all.deb
RUN apt-get update
RUN apt-get install holoscan
# Install more things like libtorch if you need that
Then follow the instructions from the top level README to use that as the base image for holohub:
./dev_container build --docker_file /path/to/holoscan-0.6-trt-23.04.dockerfile
docker run --rm --runtime=nvidia --gpus=all nvcr.io/nvidia/tensorrt:23.04-py3
does not show any error.
When building the container, RUN apt-get install holoscan results in the following error:
=> ERROR [5/5] RUN apt-get install holoscan 1.6s
------
> [5/5] RUN apt-get install holoscan:
0.254 Reading package lists...
1.360 Building dependency tree...
1.568 Reading state information...
1.596 E: Unable to locate package holoscan
------
holoscan-0.6-trt-23.04.dockerfile:6
--------------------
4 | RUN dpkg -i cuda-keyring_1.1-1_all.deb
5 | RUN apt-get update
6 | >>> RUN apt-get install holoscan
7 |
8 | # Install more things like libtorch if you need that
--------------------
ERROR: failed to solve: process "/bin/sh -c apt-get install holoscan" did not complete successfully: exit code: 100
Update:
I installed holoscan using pip by adding RUN pip3 install holoscan to the dockerfile.
@agirault When using pip to install holoscan, CMake can't find the Holoscan files:
root@divyansh-moon:/workspace/holohub# ./run build
Building Holohub
Building sample applications.
[command] cmake -S . -B build -DPython3_EXECUTABLE=/usr/bin/python3 -DPython3_ROOT_DIR=/usr/lib/python3 -DHOLOHUB_DATA_DIR=/workspace/holohub/data -DCMAKE_BUILD_TYPE=release -DBUILD_SAMPLE_APPS=1
CMake Error at applications/colonoscopy_segmentation/CMakeLists.txt:39 (find_package):
Could not find a package configuration file provided by "holoscan"
(requested version 0.6) with any of the following names:
holoscanConfig.cmake
holoscan-config.cmake
Add the installation prefix of "holoscan" to CMAKE_PREFIX_PATH or set
"holoscan_DIR" to a directory containing one of the above files. If
"holoscan" provides a separate development package or SDK, be sure it has
been installed.
Is there any other way to install holoscan in the docker container?
Could not find a package configuration file provided by "holoscan"
To build holohub's CMake project, we'll need Holoscan's CMake config, which is not in the Python wheel. Are you only interested in Python applications? Any Python bindings, or pure Python?
E: Unable to locate package holoscan
That's what we should address. I shared this:
wget https://developer.download.nvidia.com/compute/cuda/repos/$distro/$arch/cuda-keyring_1.1-1_all.deb
Should be this to actually choose the distro/arch in your case:
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb
I am interested in both Python and C++.
I have already added RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb to my dockerfile, and I still get the same error. My dockerfile looks like this (I have Ubuntu 22.04):
FROM nvcr.io/nvidia/tensorrt:23.04-py3
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
RUN dpkg -i cuda-keyring_1.1-1_all.deb
RUN apt-get update
RUN pip3 install holoscan
# RUN apt-get install -y holoscan
What matters is the Ubuntu version inside your container, not on your host. tensorrt:23.04-py3 is based on Ubuntu 20.04:
$ docker run --rm --entrypoint=bash nvcr.io/nvidian/tensorrt:23.04-py3 -c 'cat /etc/os-release | grep VERSION'
VERSION="20.04.6 LTS (Focal Fossa)"
VERSION_ID="20.04"
VERSION_CODENAME=focal
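Putting that correction together, the earlier Dockerfile would become something like the sketch below: the repo slug follows from the container's Ubuntu 20.04 (focal) base rather than the host, and -y is added so apt-get can run non-interactively during the build.

```dockerfile
FROM nvcr.io/nvidia/tensorrt:23.04-py3

# Use the distro of the container (Ubuntu 20.04 -> ubuntu2004), not the host
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.1-1_all.deb \
    && dpkg -i cuda-keyring_1.1-1_all.deb

RUN apt-get update && apt-get install -y holoscan

# Install more things like libtorch if you need that
```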
Okay, I get it. The container builds with this now. Thank you so much for your help!
I ran into another problem: nvcr.io/nvidia/tensorrt:23.04-py3 has CUDA 12.1 according to nvcc -V.
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Feb__7_19:32:13_PST_2023
Cuda compilation tools, release 12.1, V12.1.66
Build cuda_12.1.r12.1/compiler.32415258_0
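For scripting, a one-liner sketch that pulls just the release number out of that banner (the sed pattern is an assumption based on the nvcc output format shown above):

```shell
# Extract the CUDA release (e.g. "12.1") from the `nvcc -V` banner.
nvcc_release() {
  nvcc -V | sed -n 's/.*release \([0-9.]*\),.*/\1/p'
}
```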
The /usr/local/ directory has three CUDA folders: 11.4, 11.8, and 12.1.
When I build the holohub applications using ./run build, I get the following errors:
/usr/bin/ld: warning: libnppidei.so.11, needed by /opt/nvidia/holoscan/lib/libholoscan_op_format_converter.so.0.6.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnppig.so.11, needed by /opt/nvidia/holoscan/lib/libholoscan_op_format_converter.so.0.6.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnppicc.so.11, needed by /opt/nvidia/holoscan/lib/libholoscan_op_format_converter.so.0.6.0, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libnppc.so.11, needed by /opt/nvidia/holoscan/lib/libholoscan_op_format_converter.so.0.6.0, not found (try using -rpath or -rpath-link)
Upon checking for the missing libraries, I found that CUDA 11.4 has them and CUDA 11.8 does not. Which CUDA version is actually used by the holohub applications in the container?
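One way to answer that inside the container is to search the installed toolkits for a given library; a sketch (the /usr/local/cuda-* layout is taken from the message above, and the helper name is hypothetical):

```shell
# Sketch: list which CUDA toolkit directories ship a given library,
# e.g. the libnpp*.so.11 files the linker warnings above ask for.
find_cuda_lib() {   # usage: find_cuda_lib libnppc.so.11 [root]
  find "${2:-/usr/local}" -maxdepth 3 -name "$1*" 2>/dev/null
}
# e.g.: find_cuda_lib libnppc.so.11
# Per the check above, this would be expected to show only the cuda-11.4 copy.
```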
I'm sorry @divyansh2681, I led you astray: it looks like TRT 23.04 only has CUDA 11.8 for HPC-X, and the main CUDA version is 12. Could you try the TRT 22.12 base image instead? We did not build Holoscan SDK 0.6 binaries for CUDA 12 (the upcoming SDK release will target CUDA 12 and Ubuntu 22.04, though).
Oh okay, no worries. I created another container with the TRT 22.12 base image. The container got built and does not show any error related to GPU support. However, when building holohub apps using ./run build, I get the following error:
/usr/bin/ld: warning: libcudart.so.12, needed by ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaUnregisterFatBinary@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaRegisterFatBinaryEnd@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaRegisterFatBinary@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaRegisterFunction@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaPopCallConfiguration@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaMemcpyAsync@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `__cudaPushCallConfiguration@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaGetErrorString@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaLaunchKernel@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaStreamSynchronize@libcudart.so.12'
/usr/bin/ld: ../../../operators/tool_tracking_postprocessor/libtool_tracking_postprocessor.so: undefined reference to `cudaEventDestroy@libcudart.so.12'
This container has CUDA 11.8 according to nvcc -V, but holohub apps are looking for libcudart.so.12. Do I need to change the CUDA version to be used somewhere in the holohub application files?
So I'm not super familiar with Holohub's container and build infrastructure yet (I'm usually focused on the SDK itself), but did you already try clearing your build directory before rerunning, with ./run clear_cache? Previously executed builds could leave cached CMake config pointing to the CUDA 12 we had in the previous containers.
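For reference, the cache-clearing step amounts to something like this sketch (build/ being holohub's build directory is an assumption based on the cmake -B build invocation earlier in the thread):

```shell
# Wipe the stale CMake cache that may still point at the CUDA 12 toolkit
# from the previous container, then reconfigure from scratch.
rm -rf build          # or: ./run clear_cache
# ./run build         # re-runs cmake with a fresh cache
```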
This worked, I am able to build the applications. Thank you so much!
Hi @agirault, I installed holoscan-sdk on my x86_64 machine running Ubuntu 22.04 using the debian package. I get the error ModuleNotFoundError: No module named 'holoscan.graphs._graphs' when I run hello_world.py or when importing graphs in Python. Am I missing something when installing the SDK using the debian package?
Error:
>>> from holoscan import graphs
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/nvidia/holoscan/python/lib/holoscan/__init__.py", line 17, in <module>
from . import cli, core, gxf
File "/opt/nvidia/holoscan/python/lib/holoscan/core/__init__.py", line 65, in <module>
from ..graphs._graphs import FragmentGraph, OperatorGraph
File "/opt/nvidia/holoscan/python/lib/holoscan/graphs/__init__.py", line 24, in <module>
from ._graphs import FragmentFlowGraph, OperatorFlowGraph
ModuleNotFoundError: No module named 'holoscan.graphs._graphs'
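A quick way to check whether the debian package actually installed the compiled extension is to list the shared objects under the path from the traceback; a sketch (the helper name is hypothetical, and the exact .so filename suffix depends on the Python version):

```shell
# Sketch: list the compiled (*.so) Python modules under a package directory.
list_ext_modules() {   # usage: list_ext_modules <package dir>
  find "$1" -name '*.so' 2>/dev/null
}
# e.g.: list_ext_modules /opt/nvidia/holoscan/python/lib/holoscan
# If no graphs/_graphs*.so shows up, the installed package is missing the
# compiled module the import above is failing on.
```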
Hello, I am trying to run holohub applications on my system (Ubuntu 22.04 on an x86_64 machine with an RTX 4060 GPU). When I launch the dev_container, I get the following error. Does the container not work with RTX GPUs?