microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Dockerfile does not work #20458

Open PredyDaddy opened 6 months ago

PredyDaddy commented 6 months ago

Describe the issue

I tried to build onnxruntime with the TensorRT provider using both onnxruntime-main/tools/ci_build/github/linux/docker/Dockerfile.ubuntu_cuda11_8_tensorrt8_6 and onnxruntime-main/dockerfile/Dockerfile.tensorrt.

Both attempts fail with the same result:


 > [final 3/7] RUN git clone --single-branch --branch main --recursive https://github.com/Microsoft/onnxruntime onnxruntime &&    /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh:
0.371 Cloning into 'onnxruntime'...
194.3 error: RPC failed; curl 56 GnuTLS recv error (-54): Error in the pull function.
194.3 fatal: the remote end hung up unexpectedly
194.3 fatal: early EOF
194.3 fatal: index-pack failed
------
Dockerfile.ubuntu_cuda11_8_tensorrt8_6:74
--------------------
  73 |     # Clone ORT repository with branch
  74 | >>> RUN git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime &&\
  75 | >>>     /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh
  76 |     
--------------------
ERROR: failed to solve: process "/bin/sh -c git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime &&    /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh" did not complete successfully: exit code: 128
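The curl 56 / GnuTLS recv error during a large recursive clone usually means the connection was cut mid-transfer. A possible mitigation (an assumption, not a confirmed fix) is to make the clone step in the Dockerfile smaller and more tolerant of a slow or flaky link, for example:

```dockerfile
# Sketch of a more network-tolerant clone step. The buffer size and
# low-speed timeout values are illustrative assumptions, not project defaults.
RUN git config --global http.postBuffer 524288000 &&\
    git config --global http.lowSpeedLimit 1000 &&\
    git config --global http.lowSpeedTime 600 &&\
    git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --depth 1 \
        --shallow-submodules --recursive ${ONNXRUNTIME_REPO} onnxruntime &&\
    /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh
```

Note that a shallow clone is incompatible with a later `git reset --hard` to an arbitrary commit, so `--depth 1` should be dropped if ONNXRUNTIME_COMMIT_ID is set.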

Then I entered the container and ran the build by hand, which hit the following error:

make: *** [Makefile:146: all] Error 2
Traceback (most recent call last):
  File "/app/install/onnxruntime/tools/ci_build/build.py", line 2962, in <module>
    sys.exit(main())
  File "/app/install/onnxruntime/tools/ci_build/build.py", line 2854, in main
    build_targets(args, cmake_path, build_dir, configs, num_parallel_jobs, args.target)
  File "/app/install/onnxruntime/tools/ci_build/build.py", line 1743, in build_targets
    run_subprocess(cmd_args, env=env)
  File "/app/install/onnxruntime/tools/ci_build/build.py", line 861, in run_subprocess
    return run(*args, cwd=cwd, capture_stdout=capture_stdout, shell=shell, env=my_env)
  File "/app/install/onnxruntime/tools/python/util/run.py", line 49, in run
    completed_process = subprocess.run(
  File "/usr/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/opt/cmake-3.27.3/bin/cmake', '--build', '/app/install/onnxruntime/build/Linux/Release', '--config', 'Release', '--', '-j40']' returned non-zero exit status 2.
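This traceback only shows that cmake exited non-zero; with `-j40` the actual compiler error is interleaved with 39 other jobs and scrolls far above the CalledProcessError. Re-running the same cmake command with a single job makes the first real failure readable. A small sketch (the command string below is copied from the log above; `serialize_build_cmd` is a hypothetical helper, not part of the build scripts):

```shell
# Rewrite any "-jN" parallelism flag to "-j1" so the failing compile
# command and its error message are printed adjacent to each other.
serialize_build_cmd() {
    printf '%s\n' "$1" | sed 's/-j[0-9][0-9]*/-j1/'
}

# Example against the command from the traceback above; run the printed
# command inside the container to see the underlying compiler error.
serialize_build_cmd "/opt/cmake-3.27.3/bin/cmake --build /app/install/onnxruntime/build/Linux/Release --config Release -- -j40"
```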

Urgency

No response

Target platform

1080Ti

Build script

# --------------------------------------------------------------
# Copyright (c) Microsoft Corporation. All rights reserved.
# Licensed under the MIT License.
# --------------------------------------------------------------
# Dockerfile to run ONNXRuntime with TensorRT integration

# Build base image with required system packages
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 AS base

# The local directory into which to build and install CMAKE
ARG ONNXRUNTIME_LOCAL_CODE_DIR=/code

ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/src/tensorrt/bin:${ONNXRUNTIME_LOCAL_CODE_DIR}/cmake-3.27.3-linux-x86_64/bin:/opt/miniconda/bin:${PATH}
ENV DEBIAN_FRONTEND=noninteractive

RUN apt-get update &&\
    apt-get install -y sudo git bash unattended-upgrades wget
RUN unattended-upgrade

# Install python3
RUN apt-get install -y --no-install-recommends \
    python3 \
    python3-pip \
    python3-dev \
    python3-wheel &&\
    cd /usr/local/bin &&\
    ln -s /usr/bin/python3 python &&\
    ln -s /usr/bin/pip3 pip;

RUN pip install --upgrade pip
RUN pip install setuptools>=68.2.2

# Install TensorRT
RUN v="8.6.1.6-1+cuda11.8" &&\
    apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/7fa2af80.pub &&\
    apt-get update &&\
    sudo apt-get install -y libnvinfer8=${v} libnvonnxparsers8=${v} libnvparsers8=${v} libnvinfer-plugin8=${v} libnvinfer-lean8=${v} libnvinfer-vc-plugin8=${v} libnvinfer-dispatch8=${v}\
        libnvinfer-headers-dev=${v} libnvinfer-headers-plugin-dev=${v} libnvinfer-dev=${v} libnvonnxparsers-dev=${v} libnvparsers-dev=${v} libnvinfer-plugin-dev=${v} libnvinfer-lean-dev=${v} libnvinfer-vc-plugin-dev=${v} libnvinfer-dispatch-dev=${v}\
        python3-libnvinfer=${v} libnvinfer-samples=${v} tensorrt-dev=${v} tensorrt-libs=${v}

# Compile trtexec
RUN cd /usr/src/tensorrt/samples/trtexec && make

# Install Valgrind
RUN apt-get install -y valgrind

# Build final image from base. Builds ORT.
FROM base as final
ARG BUILD_USER=onnxruntimedev
ARG BUILD_UID=1000
RUN adduser --gecos 'onnxruntime Build User' --disabled-password $BUILD_USER --uid $BUILD_UID
USER $BUILD_USER

# ONNX Runtime arguments

# URL to the github repo from which to clone ORT.
ARG ONNXRUNTIME_REPO=https://github.com/Microsoft/onnxruntime

# The local directory into which to clone ORT.
ARG ONNXRUNTIME_LOCAL_CODE_DIR=/code

# The git branch of ORT to checkout and build.
ARG ONNXRUNTIME_BRANCH=main

# Optional. The specific commit to pull and build from. If not set, the latest commit is used.
ARG ONNXRUNTIME_COMMIT_ID

# The supported CUDA architecture
ARG CMAKE_CUDA_ARCHITECTURES=75

WORKDIR ${ONNXRUNTIME_LOCAL_CODE_DIR}

# Clone ORT repository with branch
RUN git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime &&\
    /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh

WORKDIR ${ONNXRUNTIME_LOCAL_CODE_DIR}/onnxruntime

# Reset to a specific commit if specified by build args.
RUN if [ -z "$ONNXRUNTIME_COMMIT_ID" ] ; then echo "Building branch ${ONNXRUNTIME_BRANCH}" ;\
    else echo "Building branch ${ONNXRUNTIME_BRANCH} @ commit ${ONNXRUNTIME_COMMIT_ID}" &&\
    git reset --hard ${ONNXRUNTIME_COMMIT_ID} && git submodule update --recursive ; fi

# Build ORT
ENV CUDA_MODULE_LOADING "LAZY"
ARG PARSER_CONFIG=""
RUN /bin/sh build.sh ${PARSER_CONFIG} --parallel --build_shared_lib --cuda_home /usr/local/cuda --cudnn_home /usr/lib/x86_64-linux-gnu/ --use_tensorrt --tensorrt_home /usr/lib/x86_64-linux-gnu/ --config Release --build_wheel --skip_tests --skip_submodule_sync --cmake_extra_defines '"CMAKE_CUDA_ARCHITECTURES='${CMAKE_CUDA_ARCHITECTURES}'"'

# Switch to root to continue following steps of CI
USER root

# Install ORT wheel
RUN pip install ${ONNXRUNTIME_LOCAL_CODE_DIR}/onnxruntime/build/Linux/Release/dist/*.whl
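One detail worth checking given the stated target platform: the GTX 1080 Ti is CUDA compute capability 6.1, while this Dockerfile defaults CMAKE_CUDA_ARCHITECTURES to 75 (Turing), so even a successful build would not target that GPU. The architecture can be overridden at build time; a sketch (the image tag "onnxruntime-trt" is an arbitrary name chosen for this example):

```shell
# Build the image targeting the 1080 Ti (compute capability 6.1).
docker build -t onnxruntime-trt \
    --build-arg CMAKE_CUDA_ARCHITECTURES=61 \
    --build-arg ONNXRUNTIME_BRANCH=main \
    -f Dockerfile.tensorrt .
```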

Error / output


[final 3/7] RUN git clone --single-branch --branch main --recursive https://github.com/Microsoft/onnxruntime onnxruntime && /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh:
0.377 Cloning into 'onnxruntime'...
77.96 fatal: the remote end hung up unexpectedly
77.96 fatal: early EOF
77.96 fatal: index-pack failed

Dockerfile.ubuntu_cuda11_8_tensorrt8_5:74

  73 |     # Clone ORT repository with branch
  74 | >>> RUN git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime &&\
  75 | >>>     /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh
  76 |

ERROR: failed to solve: process "/bin/sh -c git clone --single-branch --branch ${ONNXRUNTIME_BRANCH} --recursive ${ONNXRUNTIME_REPO} onnxruntime && /bin/sh onnxruntime/dockerfiles/scripts/install_common_deps.sh" did not complete successfully: exit code: 128

Visual Studio Version

No response

GCC / Compiler Version

No response

jywu-msft commented 6 months ago

Are you still seeing the error? The failure message from git clone reads like a transient server-side issue.

PredyDaddy commented 6 months ago

Are you still seeing the error? The failure message from git clone reads like a transient server-side issue.

I still have the problem, so I fell back to using CUDAExecutionProvider.
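For anyone falling back like this, a quick way to confirm which execution providers the installed wheel actually exposes (assuming the wheel from the build above is installed in the active environment):

```shell
# Print the execution providers compiled into the installed onnxruntime wheel.
# A successful TensorRT build should list TensorrtExecutionProvider ahead of
# CUDAExecutionProvider and CPUExecutionProvider.
python -c "import onnxruntime as ort; print(ort.get_available_providers())"
```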