microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Build] OnnxRuntime `--skip_tests` flag doesn't affect `onnxruntime_test_all` target #17266

Closed mc-nv closed 10 months ago

mc-nv commented 1 year ago

Describe the issue

I'm trying to build OnnxRuntime on a Jetson device, but the build keeps failing at the link step because of dynamic linking against the CUDA runtime.

I tried to disable that target with --skip_tests, but it didn't work.

Urgency

ASAP

Target platform

aarch64

Build script

./build.sh --config Release --skip_submodule_sync --parallel --build_shared_lib --build_dir /root/triton/onnxruntime/build --update --build --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib/aarch64-linux-gnu --allow_running_as_root --use_tensorrt --tensorrt_home /usr/src/tensorrt --compile_no_warning_as_error --skip_tests --cmake_extra_defines 'CMAKE_CUDA_ARCHITECTURES=53;62;72;87'

Error / output

[100%] Linking CXX executable onnxruntime_test_all
/usr/bin/ld: warning: libnvcudla.so, needed by /usr/local/cuda/lib64/libcudla.so.1, not found (try using -rpath or -rpath-link)
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvSetTaskTimeoutInMs'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvCreateDevice'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvDeviceGetCount'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvDeviceGetAttribute'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvImportExternalMemory'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvGetNvSciSyncAttributes'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvGetExternalExportTable'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvModuleGetAttributes'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvModuleUnload'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvGetExportTable'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvSubmitTask'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvDestroyDevice'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvModuleLoadFromMemory'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvMemRegister'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvMemUnregister'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvGetLastError'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvImportExternalSemaphore'
/usr/bin/ld: /usr/local/cuda/lib64/libcudla.so.1: undefined reference to `cudlaDrvGetVersion'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/onnxruntime_test_all.dir/build.make:4803: onnxruntime_test_all] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2311: CMakeFiles/onnxruntime_test_all.dir/all] Error 2
gmake: *** [Makefile:166: all] Error 2

Visual Studio Version

No response

GCC / Compiler Version

CMake 3.27.1

mc-nv commented 1 year ago

cc: @pranavsharma , @snnn

jywu-msft commented 1 year ago

+@yf711 who successfully built ort on jetson recently.

yf711 commented 1 year ago

Hi @mc-nv, quick question: are you using the latest JetPack?

mc-nv commented 1 year ago

> Hi @mc-nv, quick question: are you using the latest JetPack?

Yes, I'm using the latest JetPack version.

jywu-msft commented 1 year ago

> > Hi @mc-nv, quick question: are you using the latest JetPack?

> Yes, I'm using the latest JetPack version.

Which CUDA version? And I assume you're building the latest ORT main branch? (Which commit?)

mc-nv commented 1 year ago

I'm building against CUDA 12 and ORT 1.15 so far

jywu-msft commented 1 year ago

> I'm building against CUDA 12 and ORT 1.15 so far

The first message is interesting: "/usr/bin/ld: warning: libnvcudla.so, needed by /usr/local/cuda/lib64/libcudla.so.1, not found (try using -rpath or -rpath-link)". Does libnvcudla.so exist somewhere? Maybe it's finding a different CUDA version? We haven't validated CUDA 12 on Jetson (or any version other than the default that comes with JetPack, which is 11.4.x), so that's an area we will need to try out.

mc-nv commented 1 year ago

> > I'm building against CUDA 12 and ORT 1.15 so far

> The first message is interesting: "/usr/bin/ld: warning: libnvcudla.so, needed by /usr/local/cuda/lib64/libcudla.so.1, not found (try using -rpath or -rpath-link)". Does libnvcudla.so exist somewhere? Maybe it's finding a different CUDA version? We haven't validated CUDA 12 on Jetson (or any version other than the default that comes with JetPack, which is 11.4.x), so that's an area we will need to try out.

The missing runtime library isn't what bothers me. I'm wondering why the build still compiles onnxruntime_test_all when the --skip_tests flag is used, and how to avoid that.

fs-eire commented 1 year ago

In build.py, the --skip_tests flag means "skip running the tests"; the test targets (including onnxruntime_test_all) are still built.

If you want to build libonnxruntime.so only, you can try --target onnxruntime

skottmckay commented 1 year ago

If you want to avoid building the tests, --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF
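
A minimal sketch of how the suggestions so far combine with the build script from the issue description: --skip_tests only skips running the tests, onnxruntime_BUILD_UNIT_TESTS=OFF asks CMake not to build the unit tests, and --target narrows the build to one target. The helper name build_command is hypothetical, and passing both defines after a single --cmake_extra_defines is an assumption about how build.py parses that flag, so check it against your ORT version.

```python
import shlex
from typing import List, Optional

# Flags copied from the build script in the issue description.
BASE_ARGS = [
    "--config", "Release",
    "--skip_submodule_sync",
    "--parallel",
    "--build_shared_lib",
    "--build_dir", "/root/triton/onnxruntime/build",
    "--update", "--build",
    "--use_cuda", "--cuda_home", "/usr/local/cuda",
    "--cudnn_home", "/usr/lib/aarch64-linux-gnu",
    "--allow_running_as_root",
    "--use_tensorrt", "--tensorrt_home", "/usr/src/tensorrt",
    "--compile_no_warning_as_error",
    "--skip_tests",  # skips *running* the tests; they are still built
]


def build_command(build_unit_tests: bool, target: Optional[str] = None) -> List[str]:
    """Assemble the ./build.sh argument list (hypothetical helper)."""
    defines = ["CMAKE_CUDA_ARCHITECTURES=53;62;72;87"]
    if not build_unit_tests:
        # Suggested in this thread to keep the test targets out of the build.
        defines.append("onnxruntime_BUILD_UNIT_TESTS=OFF")
    cmd = ["./build.sh", *BASE_ARGS]
    if target is not None:
        # For example "onnxruntime" to build only libonnxruntime.so.
        cmd += ["--target", target]
    # Assumption: several defines may follow one --cmake_extra_defines flag.
    cmd += ["--cmake_extra_defines", *defines]
    return cmd


if __name__ == "__main__":
    # Print a shell-quoted command line instead of running the build.
    print(shlex.join(build_command(build_unit_tests=False)))
```

Running the script only prints a shell-quoted command line; nothing is built until that line is executed on the device.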

fs-eire commented 1 year ago

> If you want to avoid building the tests, --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF

Is it a good idea to include --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF implicitly when --skip_tests is specified?

mc-nv commented 1 year ago

> If you want to avoid building the tests, --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF

I tried this approach, but the build keeps failing with the same error; it doesn't disable the target.

mszhanyi commented 1 year ago

Could you try removing these lines directly? https://github.com/microsoft/onnxruntime/blob/bb1871332f5e37ebaa6a508fed460ab836fb23c5/cmake/CMakeLists.txt#L1634-L1636

In fact, I think --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF should work. Did you clean the build or remove CMakeCache.txt before rebuilding?
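
One way to check what the configure step actually recorded is to read the value back from CMakeCache.txt. The sketch below is only an illustration: the helper name cached_value is hypothetical, and the cache path assumes build.py places the CMake tree under <build_dir>/<config>, which may differ in your setup.

```python
from pathlib import Path
from typing import Optional


def cached_value(cache_text: str, name: str) -> Optional[str]:
    """Return the value stored for `name` in CMakeCache.txt text, or None."""
    for line in cache_text.splitlines():
        line = line.strip()
        # Skip blank lines and the two comment styles CMake writes.
        if not line or line.startswith(("#", "//")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue
        # Entries look like NAME:TYPE=VALUE; drop the :TYPE part.
        if key.split(":", 1)[0] == name:
            return value
    return None


if __name__ == "__main__":
    # Assumed location, derived from --build_dir and --config in the issue.
    cache = Path("/root/triton/onnxruntime/build/Release/CMakeCache.txt")
    if cache.exists():
        print(cached_value(cache.read_text(), "onnxruntime_BUILD_UNIT_TESTS"))
    else:
        print(f"{cache} not found; nothing has been configured there yet")
```

If the printed value is not OFF after a reconfigure, the define most likely never reached CMake, and starting from an empty build directory is a reasonable next step.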

mszhanyi commented 1 year ago

> > If you want to avoid building the tests, --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF

> Is it a good idea to include --cmake_extra_defines onnxruntime_BUILD_UNIT_TESTS=OFF implicitly when --skip_tests is specified?

No. For CI, it's sometimes better to separate building from testing: for example, build on a CPU machine and run the tests on a CUDA machine.

yf711 commented 1 year ago

Hi @mc-nv I reproduced your issue on my local jetson device, and here's what I did to fix the missing lib:

  1. If there's no libnvcudla.so under /usr/local/cuda-12.2/compat: sudo apt-get install -y cuda-compat-12-2
  2. Add this lib path to env: export LD_LIBRARY_PATH="/usr/local/cuda-12.2/lib64:/usr/local/cuda-12.2/compat:$LD_LIBRARY_PATH"

More details on cuDLA: https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html#cudla
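
Before installing anything, step 1 can be checked with a small search for the library. This is a rough sketch: the helper name find_library is hypothetical, and the two directories are the ones from the steps above, which may differ for other CUDA versions.

```python
from pathlib import Path
from typing import List


def find_library(name: str, search_dirs: List[str]) -> List[Path]:
    """Return files in search_dirs whose names start with `name`."""
    hits: List[Path] = []
    for directory in search_dirs:
        root = Path(directory)
        if root.is_dir():
            # Matches libnvcudla.so as well as versioned names like libnvcudla.so.1
            hits.extend(sorted(root.glob(name + "*")))
    return hits


if __name__ == "__main__":
    # Directories taken from the LD_LIBRARY_PATH line above.
    dirs = ["/usr/local/cuda-12.2/lib64", "/usr/local/cuda-12.2/compat"]
    matches = find_library("libnvcudla.so", dirs)
    if matches:
        for path in matches:
            print(path)
    else:
        print("libnvcudla.so not found in", ", ".join(dirs))
```

An empty result suggests the cuda-compat package from step 1 is still missing.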

mc-nv commented 11 months ago

I'm having trouble compiling OnnxRuntime from a Dockerfile. During the docker build command you can't pass the driver through, so you can't compile against CUDA libraries that are only available at runtime.

yf711 commented 11 months ago

@mc-nv can you share more details about your Dockerfile and the error logs from your Docker environment? If the errors there are the same as those posted above, installing the necessary CUDA dependencies and adding the library path in your Dockerfile before compiling ONNX Runtime might help in your case.

jywu-msft commented 11 months ago

> I'm having trouble compiling OnnxRuntime from a Dockerfile. During the docker build command you can't pass the driver through, so you can't compile against CUDA libraries that are only available at runtime.

Are we still talking about Jetson here, or is this a new issue?

mc-nv commented 11 months ago

Yes, we are still talking about a Jetson device. I'm trying to compile OnnxRuntime on a Jetson device using a Docker image.
But I'm struggling because the build needs libraries that are required only at runtime. From what I can see, it should be possible if the unit tests can be disabled, but for some reason that doesn't work with the instructions given for the wrapper.

yf711 commented 11 months ago

@mc-nv could you share a dockerfile and full command that you used to repro the issue?

I have tested building ORT with --skip_tests --cmake_extra_defines 'onnxruntime_BUILD_UNIT_TESTS=OFF' in a native Jetson JetPack environment without Docker, and onnxruntime_test_all wasn't generated.

P.S. Apart from the CMake args above, I followed these steps to set up my JetPack environment and build ORT: https://onnxruntime.ai/docs/build/eps.html#nvidia-jetson-tx1tx2nanoxavier. The build ran without root privileges.

mc-nv commented 10 months ago

The ticket can be closed. I was able to build ORT without tests on an ARM machine. My mistake was that I stuck to an ARM Mac, which broke my build.