microsoft / onnxruntime-genai

Generative AI extensions for onnxruntime
MIT License

Consider removing direct link-time dependency on ORT on Linux/macOS #693

Closed skyline75489 closed 1 week ago

skyline75489 commented 1 month ago

Introduction

While working on #687, I noticed some issues that block the GenAI NuGet package from working on Linux. After investigating and discussing with @baijumeswani, I'd like to propose removing the direct link-time dependency on ORT, to make dynamic library loading more robust overall. @skottmckay has already implemented this on Android. This proposal extends it to Linux and macOS.

The problem with link-time dependency

@skottmckay has a detailed writeup at https://github.com/microsoft/onnxruntime-genai/blob/e41fb2c1430693157c3cea70d3bdcc207ae84ef3/src/models/onnxruntime_api.h#L89

I suggest reading it first to get a basic understanding of the issue.

TL;DR: the ort-genai C++ library depends on a specific version of ORT. Simply having libonnxruntime.so isn't enough; it specifically needs libonnxruntime.so.1.18.0, because that versioned name is what the binary records as its dependency at link time. This is the ldd output:

ldd -v libonnxruntime-genai.so
        linux-vdso.so.1 (0x00007ffecc25e000)
        libonnxruntime.so.1.18.0 => not found
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f59e6ab0000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f59e6aab000)
......

Traditionally, this would be handled by symlinks when the libraries are installed system-wide. But ort-genai needs to be distributed in wheels and NuGet packages.

Wheel scenario

To make GenAI work while linking directly against ORT, the wheel currently ships an exact copy of libonnxruntime.so renamed to libonnxruntime.so.1.18.0:

(genai) skyline@workstation ~/miniconda3/envs/genai/lib/python3.10/site-packages/onnxruntime_genai
>ls -al
total 33804
drwxr-xr-x  4 skyline skyline     4096 Jul 12 11:29 .
drwxr-xr-x 43 skyline skyline     4096 Jul 12 12:08 ..
-rw-r--r--  1 skyline skyline      711 Jul 12 11:29 __init__.py
drwxr-xr-x  2 skyline skyline     4096 Jul 12 11:29 __pycache__
-rw-r--r--  1 skyline skyline      617 Jul 12 11:29 _dll_directory.py
-rw-r--r--  1 skyline skyline 15285336 Jul 12 11:29 libonnxruntime.so
-rw-r--r--  1 skyline skyline 15285336 Jul 12 11:29 libonnxruntime.so.1.18.0
drwxr-xr-x  3 skyline skyline     4096 Jul 12 11:29 models
-rwxr-xr-x  1 skyline skyline  4015888 Jul 12 11:29 onnxruntime_genai.cpython-310-x86_64-linux-gnu.so

This is introduced by a pipeline step during packaging:

https://github.com/microsoft/onnxruntime-genai/blob/e41fb2c1430693157c3cea70d3bdcc207ae84ef3/.pipelines/stages/jobs/steps/utils/download-ort.yml#L45

I don't think this should be considered correct, but at least the ort-genai wheel works on Linux for now.

NuGet scenario

The same issue exists with NuGet. But here libonnxruntime.so comes directly from the ORT NuGet package, and it is still named just libonnxruntime.so, without any version suffix. So ort-genai will crash because it can't find the versioned library it was linked against.

Solution

As Scott's work has shown to be possible, removing the link-time dependency and switching to dlopen will not only let us remove the embedded ORT libraries and fix both the wheel and NuGet scenarios above, but also give us much more flexibility, which can benefit #662 and future work. The rpath and search-path tweaks will no longer be needed, and the search for the correct library can be handled entirely in C++.
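To make the idea concrete, here is a minimal sketch of what dlopen-based loading could look like on Linux/macOS. It is not the actual ort-genai implementation: `LoadedOrt`, `OrtLibraryFileName`, `CandidateOrtPaths` and `LoadOrt` are hypothetical names, and the search order (package directory first, then the loader's default search path) is an assumption. The only ORT-specific detail is `OrtGetApiBase`, the entry point exported by the ORT C API.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type for this sketch; not part of ort-genai or ORT.
struct LoadedOrt {
  void* handle = nullptr;        // handle returned by dlopen, or nullptr
  void* get_api_base = nullptr;  // address of OrtGetApiBase, or nullptr
  std::string path;              // the candidate that was actually loaded
};

// Unversioned file name of the ORT library (assumed naming per platform).
inline const char* OrtLibraryFileName() {
#if defined(__APPLE__)
  return "libonnxruntime.dylib";
#else
  return "libonnxruntime.so";
#endif
}

// Assumed search order: the directory ort-genai was installed into first,
// then the bare file name so the dynamic loader uses its normal search path.
inline std::vector<std::string> CandidateOrtPaths(const std::string& package_dir) {
  std::vector<std::string> candidates;
  if (!package_dir.empty()) {
    candidates.push_back(package_dir + "/" + OrtLibraryFileName());
  }
  candidates.push_back(OrtLibraryFileName());
  return candidates;
}

// Tries each candidate in order and keeps the first library that both loads
// and exports OrtGetApiBase. Returns an empty LoadedOrt if none qualifies.
inline LoadedOrt LoadOrt(const std::vector<std::string>& candidates) {
  assert(!candidates.empty() && "need at least one candidate path");
  for (const std::string& path : candidates) {
    void* handle = dlopen(path.c_str(), RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) continue;  // not found or not loadable: try next
    void* symbol = dlsym(handle, "OrtGetApiBase");
    if (symbol != nullptr) {
      LoadedOrt loaded;
      loaded.handle = handle;
      loaded.get_api_base = symbol;
      loaded.path = path;
      return loaded;
    }
    dlclose(handle);  // loaded, but it is not an ORT library
  }
  return LoadedOrt{};
}
```

In real code, `get_api_base` would be cast to the `OrtGetApiBase` function type declared in onnxruntime_c_api.h and called to obtain the API table. Because the versioned file name never appears here, a plain libonnxruntime.so from the ORT wheel or NuGet package would be enough.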

Limitation

The power of dlopen comes at a price. Using dlopen means we should only use the public APIs provided by the ORT headers, to prevent possible ABI breakage. We probably also need to detect whether the runtime ORT version matches the one used to build ort-genai, and warn users when there is a mismatch.
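The version check could be sketched roughly as follows. This is an illustration under assumptions, not a settled policy: `OrtVersion`, `ParseOrtVersion`, `VersionCheck` and `CompareOrtVersions` are hypothetical names, and treating a major/minor difference as a mismatch while tolerating a patch difference is just one possible rule. In practice the runtime string would come from `OrtApiBase::GetVersionString()` in the ORT C API, and the build-time string from whatever version ort-genai was compiled against.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical parsed form of a version string such as "1.18.0".
struct OrtVersion {
  int major = 0;
  int minor = 0;
  int patch = 0;
  bool valid = false;  // true if at least "major.minor" could be parsed
};

inline OrtVersion ParseOrtVersion(const std::string& text) {
  OrtVersion v;
  // sscanf returns the number of fields converted; require major and minor.
  int fields = std::sscanf(text.c_str(), "%d.%d.%d", &v.major, &v.minor, &v.patch);
  v.valid = fields >= 2;
  return v;
}

enum class VersionCheck {
  kMatch,         // identical major.minor.patch
  kPatchDiffers,  // same major.minor, different patch (assumed acceptable)
  kMismatch,      // different major or minor: warn the user
  kUnparseable    // one of the strings is not a version
};

inline VersionCheck CompareOrtVersions(const std::string& built_with,
                                       const std::string& found_at_runtime) {
  const OrtVersion built = ParseOrtVersion(built_with);
  const OrtVersion found = ParseOrtVersion(found_at_runtime);
  if (!built.valid || !found.valid) return VersionCheck::kUnparseable;
  if (built.major != found.major || built.minor != found.minor) {
    return VersionCheck::kMismatch;
  }
  return built.patch == found.patch ? VersionCheck::kMatch
                                    : VersionCheck::kPatchDiffers;
}

// Usage example: outcomes under the assumed policy above.
inline void ExampleVersionChecks() {
  assert(CompareOrtVersions("1.18.0", "1.18.0") == VersionCheck::kMatch);
  assert(CompareOrtVersions("1.18.0", "1.17.3") == VersionCheck::kMismatch);
}
```

ORT also has a built-in guard that is worth relying on: `OrtApiBase::GetApi(ORT_API_VERSION)` returns a null pointer when the loaded library does not support the API version the caller was built with, so a string comparison like this would mainly serve to produce a clearer warning.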

snnn commented 1 month ago

Related: https://github.com/microsoft/onnxruntime/issues/21281

skyline75489 commented 1 month ago

Related: #273