You can compile it from source with CUDA 12 support.
Hi @Craigacp and thanks for the advice.
I was able to compile the library from source using the attached Dockerfile; however, there is an important caveat: it seems that ONNX Runtime only supports cuDNN v8, while all the latest NVIDIA CUDA images ship with cuDNN v9.
If I try to compile FROM nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04, I get multiple errors like:
error: ‘cudnnSetRNNDescriptor_v6’ was not declared in this scope; did you mean ‘cudnnSetRNNDescriptor_v8’?
error: ‘cudnnSetRNNMatrixMathType’ was not declared in this scope; did you mean ‘cudnnSetConvolutionMathType’?
[...]
This is the Dockerfile I used:
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends python3-dev ca-certificates g++ python3-numpy gcc make git python3-setuptools python3-wheel python3-packaging python3-pip aria2 unzip wget openjdk-17-jdk && \
aria2c -q -d /tmp -o cmake-3.27.3-linux-x86_64.tar.gz https://github.com/Kitware/CMake/releases/download/v3.27.3/cmake-3.27.3-linux-x86_64.tar.gz && \
tar -zxf /tmp/cmake-3.27.3-linux-x86_64.tar.gz --strip=1 -C /usr && rm /tmp/cmake-3.27.3-linux-x86_64.tar.gz && \
wget -c https://services.gradle.org/distributions/gradle-8.6-bin.zip -P /tmp && unzip /tmp/gradle-8.6-bin.zip -d /opt/ && rm /tmp/gradle-8.6-bin.zip
ENV GRADLE_HOME=/opt/gradle-8.6
ENV PATH=${GRADLE_HOME}/bin:${PATH}
COPY onnxruntime /onnxruntime
RUN git config --global --add safe.directory /onnxruntime && cd /onnxruntime && git checkout -- . && git clean -fd . && \
git checkout v1.17.1 && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && \
./build.sh --allow_running_as_root --skip_submodule_sync --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ \
--use_cuda --config Release --build_shared_lib --build_java --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) 'CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;86'
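As a side note: once the image builds, the Java artifacts can be copied out of it. A rough sketch (the build-tree path is an assumption based on ONNX Runtime's default layout, so adjust it if yours differs):

```bash
# Build the image, then copy the Gradle outputs (JARs) out of a throwaway container.
# The path under /onnxruntime is an assumed default, not verified in this thread.
docker build -t ort-java-cu12 .
docker create --name ort-tmp ort-java-cu12
docker cp ort-tmp:/onnxruntime/build/Linux/Release/java/build/libs ./ort-java-libs
docker rm ort-tmp
```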
So my follow-up questions are:
cuDNN 9 came out after ORT 1.17 (https://github.com/microsoft/onnxruntime/pull/19419), so it probably won't be supported until at least the next feature release.
We're discussing what to do about CUDA 12 binaries for Java, whether to drop CUDA 11 completely or make two releases. It's not been decided yet.
Got it, thanks! I think cuDNN 9 would not be a huge problem for now, as I can manually install cuDNN 8 in the Dockerfile.
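For reference, a minimal sketch of what that manual install could look like on a CUDA 12 base image (assuming NVIDIA's apt repository is preconfigured in the image, as it is in the official nvidia/cuda images; the libcudnn8 package names are the standard ones but were not verified in this thread):

```dockerfile
# Sketch: pin cuDNN 8 on a CUDA 12 image instead of using a *-cudnn9-* tag.
FROM nvidia/cuda:12.1.0-devel-ubuntu22.04
RUN apt-get update && \
    apt-get install -y --no-install-recommends libcudnn8 libcudnn8-dev && \
    rm -rf /var/lib/apt/lists/*
```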
My two cents: a solution could be to create two different artifacts, like 1.17.1-cu11 and 1.17.1-cu12; you can always drop the former as soon as you no longer want to support it.
One last problem I'm facing right now: I have just realized that the build I made on Ubuntu 22.04 won't work on Ubuntu 20.04 because of a different libc.so.6 version:
Caused by: java.lang.UnsatisfiedLinkError: /tmp/onnxruntime-java1823669597081387394/libonnxruntime.so: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /tmp/onnxruntime-java1823669597081387394/libonnxruntime.so)
since on my 20.04 machine I have /lib/x86_64-linux-gnu/libc-2.31.so. Just wondering, how did you solve this problem for the Java release? The same Maven JAR appears to work well on both versions of Ubuntu.
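As an aside, a quick way to check which glibc symbol versions a build actually requires, and hence whether it can run on 20.04's glibc 2.31, is something like:

```bash
# List the GLIBC symbol versions libonnxruntime.so depends on;
# any version above 2.31 will fail to load on Ubuntu 20.04.
objdump -T libonnxruntime.so | grep -o 'GLIBC_[0-9.]*' | sort -Vu
```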
Is there a specific flag I can use during compilation to avoid dynamic linking to a specific version of libc?
Not that I'm aware of. I think the release is compiled on 20.04.
Thanks, I'll give it a try!
> Is there a specific flag I can use during compilation to avoid dynamic linking to a specific version of libc?
No. If you still need to support Ubuntu 20.04, consider using RHEL/CentOS (or UBI8) with the "Red Hat Developer Toolset" to compile the code.
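For example, a rough sketch on a UBI8 image (the toolset version is illustrative, and package availability depends on the repos configured in your image):

```bash
# gcc-toolset is the RHEL 8 successor to the Developer Toolset packages;
# scl runs the build inside an environment with the newer compiler.
yum install -y gcc-toolset-12
scl enable gcc-toolset-12 -- ./build.sh --config Release --build_shared_lib --build_java
```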
Hi @snnn and thanks for the hint!
I did try to build onnxruntime starting from the nvidia/cuda:12.1.1-cudnn8-devel-ubi8 image; however, I didn't expect it to be sooo painful 😅.
After a couple of hours of trial and error, I identified several changes needed to overcome the various compilation problems:
- Compiling protobuf from source and statically linking it with ONNX_USE_PROTOBUF_SHARED_LIBS=OFF.
- Forcing the C++17 standard with CMAKE_CXX_STANDARD=17 and CMAKE_CXX_STANDARD_REQUIRED=ON.
- Adding ln -s /usr/lib64 /usr/lib/x86_64-linux-gnu, as some dependencies have /usr/lib/x86_64-linux-gnu hardcoded in their CMake files.
- Setting onnxruntime_BUILD_UNIT_TESTS=OFF, as many of the unit tests were failing to compile.

Despite all these precautions, I'm still not able to compile onnxruntime because of this error:
...
[ 61%] Linking CXX shared library libonnxruntime.so
[ 97%] Built target onnxruntime_providers_cuda
> Task :clean
> Task :spotlessInternalRegisterDependencies
libonnxruntime_providers.a(matmul_fpq4.cc.o): In function `onnxruntime::contrib::MatMulFpQ4::Compute(onnxruntime::OpKernelContext*) const':
matmul_fpq4.cc:(.text._ZNK11onnxruntime7contrib10MatMulFpQ47ComputeEPNS_15OpKernelContextE+0x4e2): undefined reference to `MlasQ4GemmPackBSize(MLAS_BLK_QUANT_TYPE, unsigned long, unsigned long)'
matmul_fpq4.cc:(.text._ZNK11onnxruntime7contrib10MatMulFpQ47ComputeEPNS_15OpKernelContextE+0x773): undefined reference to `MlasQ4GemmBatch(MLAS_BLK_QUANT_TYPE, unsigned long, unsigned long, unsigned long, unsigned long, MLAS_Q4_GEMM_DATA_PARAMS const*, onnxruntime::concurrency::ThreadPool*)'
libonnxruntime_providers.a(matmul_nbits.cc.o): In function `onnxruntime::contrib::MatMulNBits::Compute(onnxruntime::OpKernelContext*) const':
matmul_nbits.cc:(.text._ZNK11onnxruntime7contrib11MatMulNBits7ComputeEPNS_15OpKernelContextE+0x1264): undefined reference to `void MlasDequantizeBlockwise<float, 4>(float*, unsigned char const*, float const*, unsigned char const*, int, bool, int, int, onnxruntime::concurrency::ThreadPool*)'
libonnxruntime_graph.a(contrib_defs.cc.o): In function `onnxruntime::contrib::matmulQ4ShapeInference(onnx::InferenceContext&, int, int, int, MLAS_BLK_QUANT_TYPE) [clone .constprop.883]':
contrib_defs.cc:(.text._ZN11onnxruntime7contribL22matmulQ4ShapeInferenceERN4onnx16InferenceContextEiii19MLAS_BLK_QUANT_TYPE.constprop.883+0x2e8): undefined reference to `MlasQ4GemmPackBSize(MLAS_BLK_QUANT_TYPE, unsigned long, unsigned long)'
libonnxruntime_mlas.a(platform.cpp.o): In function `MLAS_PLATFORM::MLAS_PLATFORM()':
platform.cpp:(.text._ZN13MLAS_PLATFORMC2Ev+0x574): undefined reference to `MlasFpQ4GemmDispatchAvx512'
platform.cpp:(.text._ZN13MLAS_PLATFORMC2Ev+0x5b1): undefined reference to `MlasQ8Q4GemmDispatchAvx512vnni'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/onnxruntime.dir/build.make:172: libonnxruntime.so.1.17.1] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2113: CMakeFiles/onnxruntime.dir/all] Error 2
...
...and at this point I'm out of ideas on why it's failing...
Here's the Dockerfile I created so far:
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubi8
ENV DEBIAN_FRONTEND=noninteractive
COPY onnxruntime /onnxruntime
RUN yum install -y zlib-devel python39-devel python39-numpy python39-setuptools python39-wheel python39-pip git unzip wget java-1.8.0-devel && \
wget https://github.com/Kitware/CMake/releases/download/v3.27.3/cmake-3.27.3-linux-x86_64.tar.gz && \
tar -zxf cmake-3.27.3-linux-x86_64.tar.gz --strip=1 -C /usr && rm -f cmake-3.27.3-linux-x86_64.tar.gz && \
wget https://services.gradle.org/distributions/gradle-8.6-bin.zip && unzip gradle-8.6-bin.zip -d /opt/ && rm -f gradle-8.6-bin.zip
RUN git clone https://github.com/protocolbuffers/protobuf.git && cd protobuf && git checkout v21.12 && git submodule update --init --recursive && mkdir build_source && cd build_source && \
cmake ../cmake -DCMAKE_INSTALL_LIBDIR=lib64 -Dprotobuf_BUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_POSITION_INDEPENDENT_CODE=ON -Dprotobuf_BUILD_TESTS=OFF -DCMAKE_BUILD_TYPE=Release && \
make -j$(nproc) && make install
ENV GRADLE_HOME=/opt/gradle-8.6
ENV PATH=${GRADLE_HOME}/bin:${PATH}
RUN git config --global --add safe.directory /onnxruntime && cd /onnxruntime && git checkout -- . && git clean -fd . && \
git checkout v1.17.1 && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && \
ln -s /usr/lib64 /usr/lib/x86_64-linux-gnu && ./build.sh --allow_running_as_root --skip_submodule_sync --compile_no_warning_as_error --skip_tests \
--use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib64/ --config Release --build_java --update --build --parallel --cmake_extra_defines \
ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) CMAKE_CUDA_ARCHITECTURES="52;60;61;70;75;86" CMAKE_CXX_STANDARD=17 CMAKE_CXX_STANDARD_REQUIRED=ON \
ONNX_USE_PROTOBUF_SHARED_LIBS=OFF onnxruntime_BUILD_UNIT_TESTS=OFF
Update: I was (finally) able to build onnxruntime on the *-ubi8 image by:

- Removing the onnxruntime_mlas_q4dq target (it failed due to pthread problems) by changing this line to a simple if (FALSE):
https://github.com/microsoft/onnxruntime/blob/4c6a6a37f77dae7b54a826527a0d688c7ca46834/cmake/onnxruntime_mlas.cmake#L658
- The build script was not able to find the JNI headers even though JAVA_HOME was properly set, so I force-linked those files like this:
for f in $(find $JAVA_HOME -name "*.h"); do ln -s $f /usr/include/$(basename $f); done
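A less invasive alternative (untested here) might be to extend the compiler's include path instead of symlinking into /usr/include:

```bash
# Point g++ at the JNI headers via the include path rather than /usr/include.
export CPLUS_INCLUDE_PATH="$JAVA_HOME/include:$JAVA_HOME/include/linux${CPLUS_INCLUDE_PATH:+:$CPLUS_INCLUDE_PATH}"
```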
This is the final Dockerfile used to build onnxruntime_gpu:1.17.1-cu12: Dockerfile.ubi8
Would you accept a PR for this? If yes, do you see a more proper way to skip the onnxruntime_mlas_q4dq build?
> Would you accept a PR for this? If yes, do you see a more proper way to skip the onnxruntime_mlas_q4dq build?
Feel free to contribute a PR. I think you can add a build flag like onnxruntime_BUILD_MLAS_Q4DQ (example), then change the line to if (onnxruntime_BUILD_MLAS_Q4DQ).
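With such a flag in place, it could then be toggled from build.sh like any other CMake define (flag name as suggested above; hypothetical until a PR lands):

```bash
# Skip the MLAS Q4DQ target via the proposed (not yet existing) CMake option.
./build.sh --config Release --build_shared_lib --build_java \
    --cmake_extra_defines onnxruntime_BUILD_MLAS_Q4DQ=OFF
```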
Hi, do we have any updates for CUDA 12 support for ONNXRuntime Java?
Hi @lanking520! Unfortunately my PR (#20011) is blocked waiting for someone to review it. Still, you can build it directly from my fork: the code is tested and I currently have the build running in production in my environment.
@snnn do you have any update on the PR? Is there anything I can do to facilitate its merge? Thank you!
It is enabled with the completion of #20583, and will be released along with ONNX Runtime 1.18.
Describe the issue
When trying to use Java's onnxruntime_gpu:1.17.1 runtime on a CUDA 12 system, the program fails to load the libonnxruntime_providers_cuda.so library because it searches for CUDA 11.x dependencies.

However, this issue already seems to be solved for (nearly) all runtimes except Java, AFAIK: Install ONNX Runtime.
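One way to confirm the CUDA 11.x dependency is to inspect the native provider bundled in the JAR; a rough sketch (the path inside the JAR is an assumption based on how the Java binding extracts its native libraries):

```bash
# Unpack the bundled CUDA provider and list its unresolved shared-library deps;
# on a CUDA 12-only system this shows missing CUDA 11.x libraries (e.g. libcublas.so.11).
unzip -o onnxruntime_gpu-1.17.1.jar 'ai/onnxruntime/native/linux-x64/*' -d /tmp/ort
ldd /tmp/ort/ai/onnxruntime/native/linux-x64/libonnxruntime_providers_cuda.so | grep 'not found'
```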
Can this be ported to Maven Central build too, please?
To reproduce
On a system with CUDA 12.3 installed:
And a Java Maven project using the latest available version of onnxruntime_gpu:

You can reproduce the problem simply by running this Java main:
Resulting in the following error:
Urgency
Development of an internal library is currently blocked, because this issue makes it impossible to run any Java ONNX project on our new deployments with the newest NVIDIA GPUs (e.g. GH200), as they require the latest drivers and CUDA libraries.
Platform
Linux
OS Version
Ubuntu 20.04.6 LTS
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.17.1
ONNX Runtime API
Java
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 12.3