
Add CUDA12 support for Java's onnxruntime_gpu dependency #19960

Closed davidecaroselli closed 5 months ago

davidecaroselli commented 7 months ago

Describe the issue

When trying to use the Java onnxruntime_gpu:1.17.1 runtime on a CUDA 12 system, the program fails to load the libonnxruntime_providers_cuda.so library because it searches for CUDA 11.x dependencies.
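
A quick way to confirm the CUDA 11 linkage is to pull the native provider library out of the JAR and inspect its dependencies (a diagnostic sketch; the resource path inside the JAR is an assumption on my part and may vary between releases):

# Extract the bundled native libraries and list which CUDA runtime
# libraries the provider links against (shows libcublasLt.so.11 etc.).
unzip -o onnxruntime_gpu-1.17.1.jar 'ai/onnxruntime/native/linux-x64/*' -d /tmp/ort
ldd /tmp/ort/ai/onnxruntime/native/linux-x64/libonnxruntime_providers_cuda.so | grep -E 'cublas|cudnn|cudart'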

However, this issue already seems to be solved for (nearly) all runtimes except Java, AFAIK: see Install ONNX Runtime.

Can this be ported to the Maven Central build too, please?

To reproduce

On a system with CUDA 12.3 installed:

$ nvidia-smi

+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.54.03              Driver Version: 535.54.03    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
...

And a Java Maven project using the latest available version of onnxruntime_gpu:

<dependency>
    <groupId>com.microsoft.onnxruntime</groupId>
    <artifactId>onnxruntime_gpu</artifactId>
    <version>1.17.1</version>
</dependency>

You can reproduce the problem simply by running this Java main:

package org.example;

import ai.onnxruntime.OrtException;
import ai.onnxruntime.OrtSession;

public class App {

    public static void main(String[] args) throws OrtException {
        new OrtSession.SessionOptions().addCUDA(0);
    }

}
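
If the project has the exec-maven-plugin configured (an assumption; any launcher that puts onnxruntime_gpu on the classpath reproduces the error just as well), the class can be run with:

# Assumes exec-maven-plugin; any runner with onnxruntime_gpu on the classpath works.
mvn -q compile exec:java -Dexec.mainClass=org.example.App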

Resulting in the following error:

Exception in thread "main" ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: /onnxruntime_src/onnxruntime/core/session/provider_bridge_ort.cc:1209 onnxruntime::Provider& onnxruntime::ProviderLibrary::Get() [ONNXRuntimeError] : 1 : FAIL : Failed to load library libonnxruntime_providers_cuda.so with error: libcublasLt.so.11: cannot open shared object file: No such file or directory

    at ai.onnxruntime.OrtSession$SessionOptions.addCUDA(Native Method)
    at ai.onnxruntime.OrtSession$SessionOptions.addCUDA(OrtSession.java:1009)
    at org.example.App.main(App.java:9)

Urgency

Currently, development of an internal library is blocked because this issue makes it impossible to run any Java ONNX project on our new deployment with the newest NVIDIA GPUs (e.g. GH200), as they require the latest drivers and CUDA libraries.

Platform

Linux

OS Version

Ubuntu 20.04.6 LTS

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.17.1

ONNX Runtime API

Java

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 12.3

Craigacp commented 7 months ago

You can compile it from source with CUDA 12 support.

davidecaroselli commented 7 months ago

Hi @Craigacp and thanks for the advice.

I was able to compile the library from source using the Dockerfile below; however, there is an important caveat: it seems to me that ONNX Runtime only supports cuDNN v8, while all the latest NVIDIA CUDA images come with cuDNN v9.

If I try to compile FROM nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04, I get multiple errors like:

error: ‘cudnnSetRNNDescriptor_v6’ was not declared in this scope; did you mean ‘cudnnSetRNNDescriptor_v8’?
error: ‘cudnnSetRNNMatrixMathType’ was not declared in this scope; did you mean ‘cudnnSetConvolutionMathType’?
[...]
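
A quick way to check which cuDNN major version a given image ships (the header location varies between distros, hence the two search paths):

# Print the cuDNN major version bundled in the image; header paths
# differ between distros, so search the common include directories.
grep -R --include='cudnn_version*.h' '#define CUDNN_MAJOR' /usr/include /usr/include/x86_64-linux-gnu 2>/dev/null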

This is the Dockerfile I used:

FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends python3-dev ca-certificates g++ python3-numpy gcc make git python3-setuptools python3-wheel python3-packaging python3-pip aria2 unzip wget openjdk-17-jdk && \
    aria2c -q -d /tmp -o cmake-3.27.3-linux-x86_64.tar.gz https://github.com/Kitware/CMake/releases/download/v3.27.3/cmake-3.27.3-linux-x86_64.tar.gz && \
    tar -zxf /tmp/cmake-3.27.3-linux-x86_64.tar.gz --strip=1 -C /usr && rm /tmp/cmake-3.27.3-linux-x86_64.tar.gz && \
    wget -c https://services.gradle.org/distributions/gradle-8.6-bin.zip -P /tmp && unzip /tmp/gradle-8.6-bin.zip -d /opt/ && rm /tmp/gradle-8.6-bin.zip

ENV GRADLE_HOME=/opt/gradle-8.6
ENV PATH=${GRADLE_HOME}/bin:${PATH}

COPY onnxruntime /onnxruntime

RUN git config --global --add safe.directory /onnxruntime && cd /onnxruntime && git checkout -- . && git clean -fd . && \
    git checkout v1.17.1 && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && \
    ./build.sh --allow_running_as_root --skip_submodule_sync --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ \
               --use_cuda --config Release --build_shared_lib --build_java --update --build --parallel --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) 'CMAKE_CUDA_ARCHITECTURES=52;60;61;70;75;86'

So my follow-up questions are:

  1. Are there any plans to make this build available in the official Maven Central repository ?
  2. Are there any plans to support cuDNN 9? And/or is there any option to build ONNX runtime without cuDNN dependency?

Craigacp commented 7 months ago

cuDNN 9 came out after ORT 1.17 (https://github.com/microsoft/onnxruntime/pull/19419), so it probably won't be supported until at least the next feature release.

We're discussing what to do about CUDA 12 binaries for Java: whether to drop CUDA 11 completely or make two releases. It hasn't been decided yet.

davidecaroselli commented 7 months ago

Got it, thanks! I think cuDNN 9 would not be a huge problem for now, as I can manually install cuDNN 8 in the Dockerfile.

My two cents: a solution could be to publish two different artifacts, like 1.17.1-cu11 and 1.17.1-cu12; you could always drop the former once you no longer want to support it.
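
Consumers could then pull whichever build matches their driver stack, e.g. (hypothetical coordinates following the naming above; these versions don't exist on Maven Central):

# Hypothetical: fetch the CUDA 12 flavour, if such an artifact existed.
mvn dependency:get -Dartifact=com.microsoft.onnxruntime:onnxruntime_gpu:1.17.1-cu12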

One last problem I'm facing right now: I have just realized that the build I made on Ubuntu 22.04 won't work on Ubuntu 20.04 because of a different libc.so.6 version:

Caused by: java.lang.UnsatisfiedLinkError: /tmp/onnxruntime-java1823669597081387394/libonnxruntime.so: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.32' not found (required by /tmp/onnxruntime-java1823669597081387394/libonnxruntime.so)

On my 20.04 machine I have /lib/x86_64-linux-gnu/libc-2.31.so. Just wondering: how did you solve this problem for the official Java release? The same Maven JAR appears to work well on both versions of Ubuntu.

Is there a specific flag I can use during compilation to avoid dynamic linking to a specific version of libc?
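
For reference, the GLIBC versions a binary actually requires can be listed from its dynamic symbol table, which shows which symbols pulled in GLIBC_2.32:

# List the glibc symbol versions the library requires; any version newer
# than the target system's glibc fails at load time with this error.
objdump -T libonnxruntime.so | grep -oE 'GLIBC_[0-9.]+' | sort -uV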

Craigacp commented 7 months ago

Not that I'm aware of. I think the release is compiled on Ubuntu 20.04.

davidecaroselli commented 7 months ago

Thanks, I'll give it a try!

snnn commented 7 months ago

Is there a specific flag I can use during compilation to avoid dynamic linking to a specific version of libc?

No. If you still need to support Ubuntu 20.04, consider using RHEL/CentOS (or UBI8) with the "Red Hat Developer Toolset" to compile the code.
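
On a UBI8 image that typically means building inside an SCL environment, along these lines (the toolset version is just an example; pick whichever the image provides):

# Sketch: build with a Red Hat GCC toolset on UBI8/RHEL8 so the result
# links against the older base-system glibc while using a modern compiler.
yum install -y gcc-toolset-12
scl enable gcc-toolset-12 -- ./build.sh --allow_running_as_root --config Release --build_java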

davidecaroselli commented 7 months ago

Hi @snnn and thanks for the hint!

I did try to build onnxruntime starting from the nvidia/cuda:12.1.1-cudnn8-devel-ubi8 image; however, I didn't expect it to be sooo painful 😅.

After a couple of hours of trial and error, I found several changes that were needed to overcome the compilation problems:

  1. Build protobuf from source and statically link it with ONNX_USE_PROTOBUF_SHARED_LIBS=OFF.
  2. Enforce C++17 standard with CMAKE_CXX_STANDARD=17 and CMAKE_CXX_STANDARD_REQUIRED=ON.
  3. Create a manual symbolic link ln -s /usr/lib64 /usr/lib/x86_64-linux-gnu, as some dependencies have /usr/lib/x86_64-linux-gnu hardcoded in their CMake files.
  4. Skip unit tests build with onnxruntime_BUILD_UNIT_TESTS=OFF as many of them were failing to compile.

Despite all these precautions, I'm still not able to compile onnxruntime because of this error:

...
[ 61%] Linking CXX shared library libonnxruntime.so
[ 97%] Built target onnxruntime_providers_cuda
> Task :clean
> Task :spotlessInternalRegisterDependencies
libonnxruntime_providers.a(matmul_fpq4.cc.o): In function `onnxruntime::contrib::MatMulFpQ4::Compute(onnxruntime::OpKernelContext*) const':
matmul_fpq4.cc:(.text._ZNK11onnxruntime7contrib10MatMulFpQ47ComputeEPNS_15OpKernelContextE+0x4e2): undefined reference to `MlasQ4GemmPackBSize(MLAS_BLK_QUANT_TYPE, unsigned long, unsigned long)'
matmul_fpq4.cc:(.text._ZNK11onnxruntime7contrib10MatMulFpQ47ComputeEPNS_15OpKernelContextE+0x773): undefined reference to `MlasQ4GemmBatch(MLAS_BLK_QUANT_TYPE, unsigned long, unsigned long, unsigned long, unsigned long, MLAS_Q4_GEMM_DATA_PARAMS const*, onnxruntime::concurrency::ThreadPool*)'
libonnxruntime_providers.a(matmul_nbits.cc.o): In function `onnxruntime::contrib::MatMulNBits::Compute(onnxruntime::OpKernelContext*) const':
matmul_nbits.cc:(.text._ZNK11onnxruntime7contrib11MatMulNBits7ComputeEPNS_15OpKernelContextE+0x1264): undefined reference to `void MlasDequantizeBlockwise<float, 4>(float*, unsigned char const*, float const*, unsigned char const*, int, bool, int, int, onnxruntime::concurrency::ThreadPool*)'
libonnxruntime_graph.a(contrib_defs.cc.o): In function `onnxruntime::contrib::matmulQ4ShapeInference(onnx::InferenceContext&, int, int, int, MLAS_BLK_QUANT_TYPE) [clone .constprop.883]':
contrib_defs.cc:(.text._ZN11onnxruntime7contribL22matmulQ4ShapeInferenceERN4onnx16InferenceContextEiii19MLAS_BLK_QUANT_TYPE.constprop.883+0x2e8): undefined reference to `MlasQ4GemmPackBSize(MLAS_BLK_QUANT_TYPE, unsigned long, unsigned long)'
libonnxruntime_mlas.a(platform.cpp.o): In function `MLAS_PLATFORM::MLAS_PLATFORM()':
platform.cpp:(.text._ZN13MLAS_PLATFORMC2Ev+0x574): undefined reference to `MlasFpQ4GemmDispatchAvx512'
platform.cpp:(.text._ZN13MLAS_PLATFORMC2Ev+0x5b1): undefined reference to `MlasQ8Q4GemmDispatchAvx512vnni'
collect2: error: ld returned 1 exit status
gmake[2]: *** [CMakeFiles/onnxruntime.dir/build.make:172: libonnxruntime.so.1.17.1] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:2113: CMakeFiles/onnxruntime.dir/all] Error 2
...

...and at this point I'm out of ideas on why it's failing...

Here's the Dockerfile I created so far:

FROM nvidia/cuda:12.1.1-cudnn8-devel-ubi8

ENV DEBIAN_FRONTEND=noninteractive

COPY onnxruntime /onnxruntime

RUN yum install -y zlib-devel python39-devel python39-numpy python39-setuptools python39-wheel python39-pip git unzip wget java-1.8.0-devel && \
    wget https://github.com/Kitware/CMake/releases/download/v3.27.3/cmake-3.27.3-linux-x86_64.tar.gz && \
    tar -zxf cmake-3.27.3-linux-x86_64.tar.gz --strip=1 -C /usr && rm -f cmake-3.27.3-linux-x86_64.tar.gz && \
    wget https://services.gradle.org/distributions/gradle-8.6-bin.zip && unzip gradle-8.6-bin.zip -d /opt/ && rm -f gradle-8.6-bin.zip

RUN git clone https://github.com/protocolbuffers/protobuf.git && cd protobuf && git checkout v21.12 && git submodule update --init --recursive && mkdir build_source && cd build_source && \
    cmake ../cmake  -DCMAKE_INSTALL_LIBDIR=lib64 -Dprotobuf_BUILD_SHARED_LIBS=OFF -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_SYSCONFDIR=/etc -DCMAKE_POSITION_INDEPENDENT_CODE=ON -Dprotobuf_BUILD_TESTS=OFF -DCMAKE_BUILD_TYPE=Release && \
    make -j$(nproc) && make install

ENV GRADLE_HOME=/opt/gradle-8.6
ENV PATH=${GRADLE_HOME}/bin:${PATH}

RUN git config --global --add safe.directory /onnxruntime && cd /onnxruntime && git checkout -- . && git clean -fd . && \
    git checkout v1.17.1 && python3 -m pip install -r tools/ci_build/github/linux/docker/inference/x64/python/cpu/scripts/requirements.txt && \
    ln -s /usr/lib64 /usr/lib/x86_64-linux-gnu && ./build.sh --allow_running_as_root --skip_submodule_sync --compile_no_warning_as_error --skip_tests \
    --use_cuda --cuda_home /usr/local/cuda --cudnn_home /usr/lib64/ --config Release --build_java --update --build --parallel --cmake_extra_defines \
    ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER) CMAKE_CUDA_ARCHITECTURES="52;60;61;70;75;86" CMAKE_CXX_STANDARD=17 CMAKE_CXX_STANDARD_REQUIRED=ON \
    ONNX_USE_PROTOBUF_SHARED_LIBS=OFF onnxruntime_BUILD_UNIT_TESTS=OFF

davidecaroselli commented 7 months ago

Update: I was (finally) able to build onnxruntime on the *-ubi8 image by:

  1. Removing the onnxruntime_mlas_q4dq target (it failed due to pthread problems) by replacing this line with a simple if (FALSE): https://github.com/microsoft/onnxruntime/blob/4c6a6a37f77dae7b54a826527a0d688c7ca46834/cmake/onnxruntime_mlas.cmake#L658

  2. The build script was not able to find the JNI headers even though JAVA_HOME was properly set, so I symlinked them into place like this:

    for f in $(find $JAVA_HOME -name "*.h"); do ln -s $f /usr/include/$(basename $f); done

This is the final Dockerfile used to build onnxruntime_gpu:1.17.1-cu12: Dockerfile.ubi8

Would you accept a PR for this? If yes, do you see a more proper way to skip onnxruntime_mlas_q4dq build?

tianleiwu commented 7 months ago

Would you accept a PR for this? If yes, do you see a more proper way to skip onnxruntime_mlas_q4dq build?

Feel free to contribute a PR. I think you can add a build flag like onnxruntime_BUILD_MLAS_Q4DQ (example). Then replace the line with if (onnxruntime_BUILD_MLAS_Q4DQ).
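
With such a flag in place, it could be forwarded from the Dockerfile above like this (the flag name is only the suggestion from this thread and does not exist yet):

# Hypothetical: forward the proposed flag via --cmake_extra_defines,
# which passes KEY=VALUE pairs straight through to CMake.
./build.sh --config Release --build_java --cmake_extra_defines onnxruntime_BUILD_MLAS_Q4DQ=OFF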

lanking520 commented 6 months ago

Hi, are there any updates on CUDA 12 support for ONNX Runtime Java?

davidecaroselli commented 6 months ago

Hi @lanking520! Unfortunately my PR (#20011) is blocked waiting for someone to review it. Still, you can build it directly from my fork: the code is tested and I currently have the build running in production in my environment.

@snnn do you have any update on the PR? Is there anything I can do to facilitate its merge? Thank you!

jchen351 commented 5 months ago

It is enabled with the completion of #20583, and will be released along with ONNX Runtime 1.18.