microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Failed to load library ./libonnxruntime_providers_shared.so #7485

Open mkserge opened 3 years ago

mkserge commented 3 years ago

Hello,

I am building onnxruntime with TensorRT execution provider support from scratch, following the Dockerfile below:

FROM nvcr.io/nvidia/tensorrt:21.03-py3 as onnxruntime

ARG ONNXRUNTIME_REPO=https://github.com/Microsoft/onnxruntime
ARG ONNXRUNTIME_BRANCH=master

RUN --mount=type=cache,id=apt-dev,target=/var/cache/apt \
    apt-get update &&\
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
        sudo \
        git \
        bash \
        wget \
        zip \
        ca-certificates \
        build-essential \
        curl \
        libcurl4-openssl-dev \
        libssl-dev

WORKDIR /code

ENV PATH /code/cmake-3.14.3-Linux-x86_64/bin:/opt/miniconda/bin:${PATH}

RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh --no-check-certificate &&\
    /bin/bash ~/miniconda.sh -b -p /opt/miniconda &&\
    rm ~/miniconda.sh &&\
    /opt/miniconda/bin/conda clean -ya

RUN pip install --upgrade pip numpy &&\
    rm -rf /opt/miniconda/pkgs

RUN wget --quiet https://github.com/Kitware/CMake/releases/download/v3.14.3/cmake-3.14.3-Linux-x86_64.tar.gz &&\
    tar zxf cmake-3.14.3-Linux-x86_64.tar.gz &&\
    rm -rf cmake-3.14.3-Linux-x86_64.tar.gz

# Prepare onnxruntime repository & build onnxruntime with TensorRT
RUN git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime &&\
    cd onnxruntime &&\
    /bin/sh ./build.sh --parallel --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --use_tensorrt --tensorrt_home /workspace/tensorrt --config Release --build_wheel --update --build --cmake_extra_defines ONNXRUNTIME_VERSION=$(cat ./VERSION_NUMBER)

After the build, I simply install the wheel with pip (see the command below), start a Python shell, and import onnxruntime, which results in the following.
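For reference, the install step is just this (wheel path assuming the default build output directory produced by the Dockerfile above):

    pip install /code/onnxruntime/build/Linux/Release/dist/*.whl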

root@40902e4b21e6:/# python
Python 3.8.5 (default, Sep  4 2020, 07:30:14)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import onnxruntime
2021-04-28 19:12:04.372888000 [E:onnxruntime:Default, provider_bridge_ort.cc:566 Ensure] Failed to load library ./libonnxruntime_providers_shared.so with error: ./libonnxruntime_providers_shared.so: cannot open shared object file: No such file or directory
2021-04-28 19:12:04.372937700 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:2099 pybind11_init_onnxruntime_pybind11_state] Init provider bridge failed.
>>>

Any idea what is happening here?

Here's the supposedly missing library, which does in fact exist in the filesystem:

root@40902e4b21e6:/# find . -name libonnxruntime_providers_shared.so
./opt/miniconda/lib/python3.8/site-packages/onnxruntime/capi/libonnxruntime_providers_shared.so
./code/onnxruntime/build/Linux/Release/build/lib/onnxruntime/capi/libonnxruntime_providers_shared.so
./code/onnxruntime/build/Linux/Release/onnxruntime/capi/libonnxruntime_providers_shared.so
./code/onnxruntime/build/Linux/Release/libonnxruntime_providers_shared.so
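
The library itself seems loadable: a quick ctypes check along these lines can confirm the .so loads fine by absolute path, which would point at the relative "./" lookup as the culprit rather than the library itself.

    import ctypes

    # Absolute path taken from the find output above; if this succeeds,
    # the .so itself is fine and only the relative lookup is broken.
    lib = "/opt/miniconda/lib/python3.8/site-packages/onnxruntime/capi/libonnxruntime_providers_shared.so"
    ctypes.CDLL(lib)
    print("loaded OK")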

And here's the list of installed packages:

root@40902e4b21e6:/# pip list
Package                  Version
------------------------ -------------------
brotlipy                 0.7.0
certifi                  2020.6.20
cffi                     1.14.3
chardet                  3.0.4
conda                    4.9.2
conda-package-handling   1.7.2
cryptography             3.2.1
flatbuffers              1.12
idna                     2.10
numpy                    1.20.2
onnxruntime-gpu-tensorrt 1.7.0
pip                      21.0.1
protobuf                 3.15.8
pycosat                  0.6.3
pycparser                2.20
pyOpenSSL                19.1.0
PySocks                  1.7.1
requests                 2.24.0
ruamel-yaml              0.15.87
setuptools               50.3.1.post20201107
six                      1.15.0
tqdm                     4.51.0
urllib3                  1.25.11
wheel                    0.35.1
root@40902e4b21e6:/#

Please note that the container is started on macOS with no GPUs (although I intend to run it on GPUs later, of course).


Thank you,

S

weixingzhang commented 3 years ago

@souptc may know why?

souptc commented 3 years ago

@RyanUnderhill, it seems "./libonnxruntime_providers_shared.so" can't be resolved on this system, although the shared library has already been copied to the Python installation location. Do you have any idea what the correct way to resolve the path should be?

souptc commented 3 years ago

By the way, that message concerns some experimental features; for your usage, I think it should still work fine. Did you run into any issues when using onnxruntime?
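
For example, a minimal check along these lines (model.onnx is a placeholder for your own model, and the input shape is illustrative) should still run fine on the CPU provider:

    import numpy as np
    import onnxruntime as ort

    # "model.onnx" and the zero input are placeholders; the point is that
    # plain CPU inference works despite the provider-bridge warning.
    sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
    name = sess.get_inputs()[0].name
    print(sess.run(None, {name: np.zeros((1, 3), dtype=np.float32)}))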

mkserge commented 3 years ago

Testing with CPUExecutionProvider, it does work; however, I am seeing the following warnings when converting the (PyTorch) models to ONNX:

Warning: Unsupported operator LayerNormalization. No schema registered for this operator.
Warning: Unsupported operator Gelu. No schema registered for this operator.
[There are many of these, trimming for brevity]

I am doing the conversion through

    from pathlib import Path

    import transformers.convert_graph_to_onnx

    # args.model_in and args.pipeline come from my argparse arguments;
    # output is the destination path for the exported ONNX model.
    transformers.convert_graph_to_onnx.convert(
        framework="pt",
        model=args.model_in,
        output=Path(output),
        opset=11,
        pipeline_name=args.pipeline
    )

and later optimize it through

    # BertOptimizationOptions and optimize_model come from
    # onnxruntime.transformers (the exact import path varies by version)
    opt_options = BertOptimizationOptions('bert')
    opt_options.enable_embed_layer_norm = False
    opt_model = onnxruntime.transformers.optimizer.optimize_model(
        output,
        'bert',
        num_heads=16,
        hidden_size=1024,
        optimization_options=opt_options)
    opt_model.save_model_to_file(output)

Do you know what could be the reason for the above warnings? (I just noticed that there is an onnxruntime.transformers.convert_to_onnx as well.)

Will try on GPU soon.

snnn commented 3 years ago

@mkserge, would #7488 help?

mkserge commented 3 years ago

> @mkserge, would #7488 help?

I can confirm that a build from the chenta/fix_runtime_path branch resolves the issue. Thank you for your quick response!

BTW, I am also running into some missing dependencies in the compiled wheel: the packages coloredlogs and sympy are missing. I can open a separate issue if you prefer.

>>> from onnxruntime.transformers import optimizer
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/miniconda/lib/python3.8/site-packages/onnxruntime/transformers/optimizer.py", line 21, in <module>
    import coloredlogs
ModuleNotFoundError: No module named 'coloredlogs'

Also, any hints regarding

Warning: Unsupported operator LayerNormalization. No schema registered for this operator.
Warning: Unsupported operator Gelu. No schema registered for this operator.

warnings during the conversion?

Thanks again, much appreciated!

oliviajain commented 3 years ago

You can install the coloredlogs and sympy modules with pip. You can edit the Dockerfile by changing the line

    RUN pip install --upgrade pip numpy &&\

to

    RUN pip install --upgrade pip numpy coloredlogs sympy &&\

mkserge commented 3 years ago

> You can install the coloredlogs and sympy modules with pip. You can edit the Dockerfile by changing the line: RUN pip install --upgrade pip numpy &&\ to: RUN pip install --upgrade pip numpy coloredlogs sympy &&\

I know, of course 😄

But wouldn't you expect pip install onnxruntime to install its own dependencies? Currently, the only way to discover that these dependencies are missing is to crash with an exception at import time.
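
Just to illustrate what I mean, here is a minimal sketch of declaring these in the wheel's packaging metadata (assuming a setuptools-based setup.py; names and versions are illustrative):

    # setup.py (sketch): declaring runtime dependencies so that pip
    # pulls them in automatically when the wheel is installed.
    from setuptools import setup

    setup(
        name="onnxruntime-gpu-tensorrt",
        version="1.7.0",
        install_requires=[
            "numpy",
            "coloredlogs",  # used by onnxruntime.transformers.optimizer
            "sympy",
        ],
    )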

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.