microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Build] Issues with CUDA 11.4 and ONNX Runtime 1.11.0 #19631

Open HShamimGEHC opened 8 months ago

HShamimGEHC commented 8 months ago

Describe the issue

onnxruntime:Default, provider_bridge_ort.cc:1022 Get] Failed to load library libonnxruntime_providers_cuda.so with error: libcublas.so.10: cannot open shared object file: No such file or directory

[W:onnxruntime:Default, onnxruntime_pybind_state.cc:552 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

Urgency

Urgent

Target platform

Docker on NVIDIA Jetson AGX Xavier

Build script

RUN wget https://nvidia.box.com/shared/static/2sv2fv1wseihaw8ym0d4srz41dzljwxh.whl -O onnxruntime_gpu-1.11.0-cp38-cp38-linux_aarch64.whl && \
    pip3 install onnxruntime_gpu-1.11.0-cp38-cp38-linux_aarch64.whl

Install CUDA toolkit

RUN apt-get update && apt-get install -y cuda-toolkit-11-4 && rm -rf /var/lib/apt/lists/*

I was provided a model.onnx that I am trying to load so that I can run inferencing.

Error / output

onnxruntime:Default, provider_bridge_ort.cc:1022 Get] Failed to load library libonnxruntime_providers_cuda.so with error: libcublas.so.10: cannot open shared object file: No such file or directory

[W:onnxruntime:Default, onnxruntime_pybind_state.cc:552 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.
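This error means the CUDA provider shared library could not be dlopen'd (here, because the wheel links against CUDA 10's libcublas). One way to see what onnxruntime reports is the snippet below, a minimal sketch using the public `get_available_providers()` API, guarded so it degrades if the package is absent. Note that a GPU wheel lists `CUDAExecutionProvider` as compiled-in even when its shared-library dependencies are missing; the failure above happens later, at session creation.

```python
def cuda_ep_available():
    """True/False if onnxruntime reports the CUDA EP; None if onnxruntime is not installed."""
    try:
        import onnxruntime as ort  # on Jetson this comes from the onnxruntime-gpu wheel
    except ImportError:
        return None
    return "CUDAExecutionProvider" in ort.get_available_providers()

print(cuda_ep_available())
```

If this prints `True` but session creation still logs the failure above, the wheel's CUDA/cuDNN runtime libraries do not match what is installed in the image.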

Visual Studio Version

No response

GCC / Compiler Version

No response

tianleiwu commented 8 months ago

From the error message, the wheel was built with CUDA 10.

Please follow this guide to install the proper version of JetPack (which installs a matching version of CUDA): https://elinux.org/Jetson_Zoo#ONNX_Runtime For example, ONNX Runtime 1.11 matches JetPack 4.4 / 4.4.1 / 4.5 / 4.5.1 / 4.6 / 4.6.1.

To build from source, please take a look at the documentation: https://onnxruntime.ai/docs/build/eps.html#nvidia-jetson-tx1tx2nanoxavier

HShamimGEHC commented 8 months ago

Because this is in Docker, is the following acceptable: without changing the JetPack version, download CUDA 10.0 with a Dockerfile?

My current JetPack version is 5.1.2 and my L4T version is 35.4.1, but I am trying to do all of this in a Docker container.

tianleiwu commented 8 months ago

The doc mentioned that CUDA version 11.8 with JetPack 5.1.2 has been tested on Jetson when building ONNX Runtime 1.16.

I guess the docker container r35.4.1 has CUDA 11.4. In that case, you can try onnxruntime-gpu 1.16 or 1.17.

HShamimGEHC commented 8 months ago

I see. I decided to use 1.16 and was wondering: if my Dockerfile starts with

FROM nvcr.io/nvidia/l4t-base:35.4.1

which should contain CUDA 11.4, why did I still have to run

RUN apt-get update && apt-get install -y cuda-toolkit-11-4 && rm -rf /var/lib/apt/lists/*

to get past this first set of errors regarding libcublas...?
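Some l4t base images expect CUDA to be mounted from the host at container run time, so the toolkit libraries can genuinely be absent from the image until installed explicitly. A stdlib-only sketch to check which CUDA libraries are actually present (the candidate directories are assumptions about common Jetson/CUDA install locations, not an exhaustive list):

```python
import glob
import os

# Candidate locations for CUDA libraries on Jetson images (assumed, not exhaustive).
CANDIDATE_DIRS = glob.glob("/usr/local/cuda*/lib64") + ["/usr/lib/aarch64-linux-gnu"]

def find_cuda_libs(pattern="libcublas*"):
    """Return sorted paths matching `pattern` in the candidate directories."""
    hits = []
    for root in CANDIDATE_DIRS:
        hits.extend(glob.glob(os.path.join(root, pattern)))
    return sorted(hits)

print(find_cuda_libs())  # an empty list means the toolkit libraries are not in the image
```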

HShamimGEHC commented 8 months ago

I should also have cuDNN 8.6.0, but my next error is this:

2024-02-23 22:55:09.263697843 [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /home/ort/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory

jywu-msft commented 8 months ago

@yf711 can you advise?

davidlee8086 commented 8 months ago

I used to follow this post to deploy a docker container for Jetson. Please let me know if you can deploy an env that fits your CUDA/cuDNN requirements. Thanks!

yf711 commented 8 months ago

Hi @HShamimGEHC, there's a wide variety of containers designed for Jetson at https://github.com/dusty-nv/jetson-containers; feel free to pick one that works for your case.

HShamimGEHC commented 8 months ago

> Hi @HShamimGEHC, https://github.com/dusty-nv/jetson-containers there's a wide variety of containers designed for jetson, feel free to pick one which works on your case.

Hi @yf711, sure, I can give that a try. Since I need onnxruntime, CUDA, and cuDNN, how can I, after downloading them, use them in the Dockerfile that I am trying to create? If you could provide some insight into that, I would greatly appreciate it.

HShamimGEHC commented 8 months ago

> I used to follow this post to deploy docker container for jetson. Please let me know if you could deploy env that can fit your cuda/cudnn requirement. Thanks!

Hi @davidlee8086, I was trying to follow this, but if I install each of the containers I need separately, I find myself running out of storage on my Jetson AGX Xavier.

jywu-msft commented 8 months ago

> I should also have cuDNN 8.6.0 but my next set of error is this:
>
> 2024-02-23 22:55:09.263697843 [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /home/ort/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory

is libcudnn.so in your LD_LIBRARY_PATH?
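One way to answer this from inside the container is to scan the directories on `LD_LIBRARY_PATH` for libcudnn. A stdlib-only sketch (the dynamic loader also consults ldconfig's cache, so `ldconfig -p | grep libcudnn` is a complementary check):

```python
import os

def lib_on_ld_path(name="libcudnn.so", env=None):
    """Return the first file starting with `name` in an LD_LIBRARY_PATH directory, else None."""
    env = os.environ if env is None else env
    for d in env.get("LD_LIBRARY_PATH", "").split(os.pathsep):
        if d and os.path.isdir(d):
            for f in sorted(os.listdir(d)):
                if f.startswith(name):
                    return os.path.join(d, f)
    return None

print(lib_on_ld_path())  # None means libcudnn is not on LD_LIBRARY_PATH
```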

HShamimGEHC commented 8 months ago

> I should also have cuDNN 8.6.0 but my next set of error is this: 2024-02-23 22:55:09.263697843 [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /home/ort/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory
>
> is libcudnn.so in your LD_LIBRARY_PATH?

I checked and it wasn't there. I decided to scroll through the NVIDIA NGC page and stumbled on this Dockerfile: https://gitlab.com/nvidia/container-images/l4t-jetpack/-/blob/master/Dockerfile.jetpack?ref_type=heads

(It includes commands for downloading CUDA and cuDNN.) This solved my issues of not finding any CUDA- or cuDNN-related libraries.

My last follow-up is regarding onnxruntime. How should I know which onnxruntime to download from the Jetson Zoo link (https://elinux.org/Jetson_Zoo#ONNX_Runtime)? Should I just use the one that corresponds to the JetPack SDK version my Jetson Xavier is on?

jywu-msft commented 8 months ago

> I should also have cuDNN 8.6.0 but my next set of error is this: 2024-02-23 22:55:09.263697843 [E:onnxruntime:Default, provider_bridge_ort.cc:1480 TryGetProviderInfo_CUDA] /home/ort/onnxruntime/onnxruntime/core/session/provider_bridge_ort.cc:1193 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcudnn.so.8: cannot open shared object file: No such file or directory
>
> is libcudnn.so in your LD_LIBRARY_PATH?
>
> I checked and it wasn't there. I decided to scroll through the NVIDIA NGC Page and stumbled on this dockerfile: https://gitlab.com/nvidia/container-images/l4t-jetpack/-/blob/master/Dockerfile.jetpack?ref_type=heads
>
> (It includes commands for downloading CUDA and cudnn). This solved my issues of not finding any CUDA or CUDNN related libraries.
>
> The last follow up I have is in regard to onnxruntime. How should I know which onnxruntime to download from the Jetson Zoo link: https://elinux.org/Jetson_Zoo#ONNX_Runtime? Should I just use the one that corresponds to the version of Jetpack SDK that my Jetson Xavier is on?

Yes, using the package corresponding to the JetPack version is the best option. Otherwise, you will need to bring in the appropriate dependencies yourself.
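For reference, the pairings mentioned in this thread can be captured in a small lookup table. This is a sketch covering only the versions named above; the Jetson Zoo page remains the authoritative source.

```python
# ONNX Runtime version -> compatible JetPack versions, per this thread only (not exhaustive).
ORT_TO_JETPACK = {
    "1.11": ["4.4", "4.4.1", "4.5", "4.5.1", "4.6", "4.6.1"],
    "1.16": ["5.1.2"],
}

def jetpacks_for(ort_version):
    """Return the JetPack versions known (from this thread) to match an ORT release."""
    return ORT_TO_JETPACK.get(ort_version, [])

print(jetpacks_for("1.16"))  # ['5.1.2']
```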