microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

ONNXRuntime in Docker #15652

Open Birken666 opened 1 year ago

Birken666 commented 1 year ago

Description

I'm trying to use ONNX Runtime inside a Docker container. The base image is l4t-r32 (from Docker Hub /r/stereolabs/zed/, CUDA 10.2), so I installed onnxruntime 1.6.0 using the pre-built binaries from Jetson Zoo. However, when trying to import onnxruntime, I get the following error:

ImportError: cannot import name 'get_all_providers'

I also tried onnxruntime 1.10.0; it installs and imports fine, but I'm unable to use the GPU:

import onnxruntime as rt
rt.get_device()   # 'GPU'

model = rt.InferenceSession("model.onnx", None, providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
# Note: results in the warning "Failed to create CUDAExecutionProvider. Please
# reference https://onnxruntime.ai/docs/execution-providers/TensorRT-ExecutionProvider.html#requirements
# to ensure all dependencies are met."

model.get_providers()  # ['CPUExecutionProvider']

I'm assuming the latter breaks because onnxruntime 1.10 is not the correct version for CUDA 10.2. However, I don't know how to fix the ImportError when using 1.6. My best guess is that the issue has something to do with cuDNN, but from my understanding the l4t-r32.4 base image should already come with cuDNN installed?
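
(A minimal sketch of how one could sanity-check this from inside the container; the library file names below are assumptions for an l4t-r32 / CUDA 10.2 image and may differ on other setups:)

import ctypes
import ctypes.util

# Try to dlopen the CUDA runtime and cuDNN libraries directly (assumed names)
for name in ("libcudart.so.10.2", "libcudnn.so.8", "libcudnn.so.7"):
    try:
        ctypes.CDLL(name)
        print(name, ": loadable")
    except OSError as exc:
        print(name, ": NOT loadable (", exc, ")")

# find_library searches the default linker paths and returns None if nothing matches
print("cudart found as:", ctypes.util.find_library("cudart"))
print("cudnn found as:", ctypes.util.find_library("cudnn"))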

Environment

TensorRT Version: None (don't know how to install it in an arm64 Docker environment)
CUDA Version: 10.2
cuDNN Version:
Operating System + Version: Ubuntu 18.04
Python Version (if applicable): 3.6
Baremetal or Container (if container, which image + tag): 3.8-py-devel-l4t-r32.4

xadupre commented 1 year ago

Is it possible to use a more recent onnxruntime (1.8.1)? The release notes may help in choosing the right version for your environment: https://github.com/microsoft/onnxruntime/releases/tag/v1.8.1. You can also check the list of providers supported by the version you installed with rt.get_available_providers().
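
(For example, a minimal sketch of that check; the model path is a placeholder. get_available_providers() only reports which providers the installed wheel was built with, while session.get_providers() shows which ones were actually created for the session:)

import onnxruntime as rt

# Providers compiled into the installed wheel (does not guarantee they can load)
print("built with:", rt.get_available_providers())

# Providers actually instantiated for a session ("model.onnx" is a placeholder path)
sess = rt.InferenceSession("model.onnx",
                           providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
print("session uses:", sess.get_providers())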

Birken666 commented 1 year ago

Is it possible to use a more recent onnxruntime (1.8.1)?

Good call! I based my initial selection of 1.6 on these requirements, but it seems as if 1.8.1 should be compatible. When using the pre-built 1.8 wheels from Jetson Zoo, I got the following error from rt.InferenceSession(model, None, providers=['CUDAExecutionProvider', 'CPUExecutionProvider']):

[E:onnxruntime:Default, provider_bridge_ort.cc:953 Get] Failed to load library libonnxruntime_providers_cuda.so with error: libcudart.so.10.2: cannot open shared object file: No such file or directory
[W:onnxruntime:Default, provider_bridge_ort.cc:1056 GetProviderInfo_CUDA] GetProviderInfo_CUDA called, returning nullptr
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
    File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
        self._create_inference_session(providers, provider_options, disabled_optimizers)
    File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 321, in _create_inference_session
        sess.initialize_session(providers, provider_options, disabled_optimizers)
RuntimeError: /home/onnxruntime/onnxruntime-py36/onnxruntime/core/framework/provider_bridge_ort.cc:1057 ProviderInfo_CUDA* onnxruntime::GetProviderInfo_CUDA() CUDA Provider not available, can't get interface for it

Which is weird, since rt.get_available_providers() returns ['CUDAExecutionProvider', 'CPUExecutionProvider']. I ran the Docker image with --runtime nvidia --gpus all, so it should be able to access CUDA. Maybe it's something with the paths? In the meantime I'll try building onnxruntime from source in the Docker environment instead of using the pre-built wheels from Jetson Zoo and see if that changes anything.
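
(A minimal sketch for checking the path theory; the search directories are assumptions for a typical L4T / CUDA 10.2 layout. If libcudart.so.10.2 exists on disk but its directory is not on LD_LIBRARY_PATH, exporting that directory before importing onnxruntime may be enough:)

import glob
import os

# Where the dynamic loader will look for libcudart.so.10.2
print("LD_LIBRARY_PATH =", os.environ.get("LD_LIBRARY_PATH", "<unset>"))

# Search a few common locations on an L4T / CUDA 10.2 image (assumed paths)
candidates = []
for pattern in ("/usr/local/cuda*/lib64/libcudart.so*",
                "/usr/local/cuda*/targets/aarch64-linux/lib/libcudart.so*",
                "/usr/lib/aarch64-linux-gnu/libcudart.so*"):
    candidates.extend(glob.glob(pattern))

print("libcudart candidates found on disk:", candidates or "none")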

Update: Building from source didn't work either; ./build.sh --build_wheel --config Release --use_cuda --cuda_home /usr/local/cuda-10.2 --cudnn_home /usr/lib/aarch64-linux-gnu resulted in fatal error: cublas_v2.h: No such file or directory during the build. Again, it could be because of cuDNN, but I have no idea how to manage it in a Jetson Docker container since NVIDIA only seems to provide aarch64/arm64 distributions through their SDK Manager.
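
(A minimal sketch to check, before re-running build.sh, whether the headers the build needs are actually present under the locations passed on the command line; the extra directories checked are assumptions, since on some JetPack images cublas_v2.h reportedly ends up under /usr/include rather than under the CUDA toolkit directory:)

import os

# Directories passed to build.sh above, plus a couple of other candidate
# locations (assumptions that may differ per JetPack version)
headers = {
    "cublas_v2.h": ["/usr/local/cuda-10.2/include", "/usr/include"],
    "cudnn.h": ["/usr/lib/aarch64-linux-gnu", "/usr/include", "/usr/include/aarch64-linux-gnu"],
}

for header, dirs in headers.items():
    hits = [os.path.join(d, header) for d in dirs if os.path.isfile(os.path.join(d, header))]
    print(header, "->", hits or "not found in checked locations")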

Cats-eat-peppercorns commented 1 year ago

Good call!

Hi, I'm facing the same problem. I want to install onnxruntime 1.6.0 for CUDA 10.2 in my NX 4.6 environment, but I don't know how to do it; I get an error when compiling from source that I can't solve. Have you solved this problem? Thank you very much!

Birken666 commented 1 year ago

Unfortunately not, still looking for a solution

Cats-eat-peppercorns commented 1 year ago

Unfortunately not, still looking for a solution

I found the whl file from Jetson Zoo, and it builds successfully with it. But then we would have to use Python; I don't know whether a file obtained this way will work from C++.

seddonm1 commented 1 year ago

Hi, try this https://gist.github.com/seddonm1/5927db05cb7ad38d98a22674fa82a4c6

storm12t48 commented 1 year ago

Hi, try this https://gist.github.com/seddonm1/5927db05cb7ad38d98a22674fa82a4c6

Thanks for the script, I'll try that.