pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

docker build failed #851

Closed. Biaocsu closed this issue 2 years ago.

Biaocsu commented 2 years ago
git clone https://github.com/NVIDIA/Torch-TensorRT
cd Torch-TensorRT

docker build --build-arg BASE=21.11 -f docker/Dockerfile -t torch_tensorrt:latest .

which fails with the following error:
Sending build context to Docker daemon  29.61MB
Step 1/33 : ARG BASE=21.10
Step 2/33 : ARG BASE_IMG=nvcr.io/nvidia/pytorch:${BASE}-py3
Step 3/33 : FROM ${BASE_IMG} as base
 ---> 6eae00e8ee65
Step 4/33 : FROM base as torch-tensorrt-builder-base
 ---> 6eae00e8ee65
Step 5/33 : RUN rm -rf /opt/torch-tensorrt /usr/bin/bazel
 ---> Using cache
 ---> 407b606a69ba
Step 6/33 : ARG ARCH="x86_64"
 ---> Using cache
 ---> a47c16d2137b
Step 7/33 : ARG TARGETARCH="amd64"
 ---> Using cache
 ---> 2aa5a3eab761
Step 8/33 : ARG BAZEL_VERSION=4.2.1
 ---> Using cache
 ---> f21f368cf46b
Step 9/33 : RUN git config --global url."https://github.com.cnpmjs.org/".insteadOf https://github.com/
 ---> Using cache
 ---> 8b689f617bb2
Step 10/33 : RUN [[ "$TARGETARCH" == "amd64" ]] && ARCH="x86_64" || ARCH="${TARGETARCH}"  && wget -q https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/bazel-${BAZEL_VERSION}-linux-${ARCH} -O /usr/bin/bazel  && chmod a+x /usr/bin/bazel
 ---> Using cache
 ---> a3c8f7522040
Step 11/33 : RUN touch /usr/lib/$HOSTTYPE-linux-gnu/libnvinfer_static.a
 ---> Using cache
 ---> d21a2d4dff51
Step 12/33 : RUN rm -rf /usr/local/cuda/lib* /usr/local/cuda/include   && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux/lib /usr/local/cuda/lib64   && ln -sf /usr/local/cuda/targets/$HOSTTYPE-linux/include /usr/local/cuda/include
 ---> Using cache
 ---> 39ee2cf4915f
Step 13/33 : RUN apt-get update && apt-get install -y --no-install-recommends locales ninja-build && rm -rf /var/lib/apt/lists/* && locale-gen en_US.UTF-8
 ---> Using cache
 ---> 711e012e97fd
Step 14/33 : FROM torch-tensorrt-builder-base as torch-tensorrt-builder
 ---> 711e012e97fd
Step 15/33 : COPY . /workspace/torch_tensorrt/src
 ---> Using cache
 ---> 2ea5a90787b7
Step 16/33 : WORKDIR /workspace/torch_tensorrt/src
 ---> Using cache
 ---> b8e79eb37534
Step 17/33 : RUN cp ./docker/WORKSPACE.docker WORKSPACE
 ---> Using cache
 ---> 7a90e4a378d4
Step 18/33 : RUN ./docker/dist-build.sh
 ---> Running in 669eeb348f7c
running bdist_wheel
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
Loading:
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded
Analyzing: target //:libtorchtrt (1 packages loaded, 0 targets configured)
INFO: Analyzed target //:libtorchtrt (43 packages loaded, 2965 targets configured).
INFO: Found 1 target...
[0 / 10] [Prepa] Creating source manifest for @rules_pkg//:build_tar
[1,111 / 1,235] Compiling core/lowering/passes/remove_bn_dim_check.cpp; 3s processwrapper-sandbox ... (3 actions running)
[1,112 / 1,235] Compiling core/lowering/passes/remove_bn_dim_check.cpp; 7s processwrapper-sandbox ... (4 actions, 3 running)
[1,115 / 1,235] Compiling core/lowering/passes/linear_to_addmm.cpp; 8s processwrapper-sandbox ... (4 actions running)
[1,118 / 1,235] Compiling core/lowering/passes/exception_elimination.cpp; 6s processwrapper-sandbox ... (4 actions running)
[1,121 / 1,235] Compiling core/conversion/converters/impl/squeeze.cpp; 10s processwrapper-sandbox ... (4 actions running)
[1,122 / 1,235] Compiling core/conversion/converters/impl/interpolate.cpp; 13s processwrapper-sandbox ... (4 actions running)
[1,125 / 1,235] Compiling core/conversion/converters/impl/lstm_cell.cpp; 11s processwrapper-sandbox ... (4 actions, 3 running)
[1,129 / 1,235] Compiling cpp/bin/torchtrtc/main.cpp; 8s processwrapper-sandbox ... (4 actions, 3 running)
[1,133 / 1,235] Compiling cpp/bin/torchtrtc/main.cpp; 21s processwrapper-sandbox ... (4 actions, 3 running)
[1,142 / 1,235] Compiling core/conversion/converters/Weights.cpp; 7s processwrapper-sandbox ... (4 actions, 3 running)
[1,147 / 1,235] Compiling core/conversion/converters/impl/topk.cpp; 12s processwrapper-sandbox ... (4 actions, 3 running)
[1,155 / 1,235] Compiling core/conversion/converters/impl/cast.cpp; 16s processwrapper-sandbox ... (4 actions, 3 running)
[1,163 / 1,235] Compiling core/conversion/converters/impl/layer_norm.cpp; 15s processwrapper-sandbox ... (4 actions, 3 running)
[1,176 / 1,235] Compiling cpp/src/ptq.cpp; 8s processwrapper-sandbox ... (4 actions, 3 running)
[1,187 / 1,235] Compiling core/conversion/evaluators/aten.cpp; 17s processwrapper-sandbox ... (4 actions running)
ERROR: /workspace/torch_tensorrt/src/core/conversion/evaluators/BUILD:10:11: Compiling core/conversion/evaluators/eval_util.cpp failed: (Exit 1): gcc failed: error executing command /usr/bin/gcc -U_FORTIFY_SOURCE -fstack-protector -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 '-D_FORTIFY_SOURCE=1' -DNDEBUG -ffunction-sections ... (remaining 60 argument(s) skipped)

Use --sandbox_debug to see verbose messages from the sandbox
core/conversion/evaluators/eval_util.cpp: In function 'at::Tensor torch_tensorrt::core::conversion::evaluators::createTensorFromList(const c10::IValue&, const c10::IValue&, const c10::IValue&)':
core/conversion/evaluators/eval_util.cpp:241:67: error: invalid initialization of reference of type 'const c10::Type&' from expression of type 'std::shared_ptr<c10::Type>'
  241 |   at::ScalarType initial_scalar_type = c10::scalarTypeFromJitType(elem_type);
      |                                                                   ^~~~~~~~~
In file included from bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/List_inl.h:362,
                 from bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/List.h:480,
                 from core/conversion/evaluators/eval_util.cpp:2:
bazel-out/k8-opt/bin/external/libtorch/_virtual_includes/ATen/ATen/core/jit_type.h:1640:57: note: in passing argument 1 of 'c10::ScalarType c10::scalarTypeFromJitType(const c10::Type&)'
 1640 | inline at::ScalarType scalarTypeFromJitType(const Type& type) {
      |                                             ~~~~~~~~~~~~^~~~
Target //:libtorchtrt failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 255.415s, Critical Path: 26.60s
INFO: 1195 processes: 1122 internal, 73 processwrapper-sandbox.
FAILED: Build did NOT complete successfully
FAILED: Build did NOT complete successfully
using CXX11 ABI build
building libtorchtrt
The command '/bin/sh -c ./docker/dist-build.sh' returned a non-zero code: 1

So what should I do, or where did I go wrong?
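
For anyone debugging a similar failure, the Bazel output above already names the flags that surface the full failing command. A minimal sketch of re-running just the failing target from inside the builder stage (where Bazel and the copied source already live, per Steps 14-17 above):

cd /workspace/torch_tensorrt/src
# Ask Bazel to print the full gcc command line and keep the sandbox output for inspection
bazel build //:libtorchtrt --verbose_failures --sandbox_debug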

narendasan commented 2 years ago

It seems like even though you tried to specify the 21.11 base, it is still using 21.10. The API changed around then, which is what's causing the failure. Are you trying to build master or a tagged release?
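
One quick way to see which base the build is actually resolving (a sketch; the commands are assumptions, the tag comes from the Dockerfile quoted above):

# "Step 1/33 : ARG BASE=21.10" in the log echoes the Dockerfile line as written,
# not the value supplied with --build-arg, so check the file and the image directly
grep -n "ARG BASE" docker/Dockerfile
docker pull nvcr.io/nvidia/pytorch:21.11-py3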

Biaocsu commented 2 years ago

> It seems like even though you tried to specify the 21.11 base, it is still using 21.10. The API changed around then, which is what's causing the failure. Are you trying to build master or a tagged release?

I'm building on master. I also changed BASE to 21.11 in the Dockerfile itself, but I still get the same error.
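
If it helps to rule anything out: the log above shows "Using cache" for most of the early steps, so one sanity check (a sketch, not a confirmed fix for this error) is a clean rebuild that forces the edited ARG/FROM to take effect:

# Rebuild from scratch so no cached layer based on the old base image is reused
docker build --no-cache --build-arg BASE=21.11 -f docker/Dockerfile -t torch_tensorrt:latest .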

narendasan commented 2 years ago

@peri044 What is the PyTorch version in the 21.11 container? I think it might have switched to 1.11 at that point, and we don't have the 1.11 changes in master yet.
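
One way to answer that directly (a sketch; the tag is the NGC image referenced above):

# Print the PyTorch version shipped in the 21.11 NGC container
docker run --rm nvcr.io/nvidia/pytorch:21.11-py3 python -c "import torch; print(torch.__version__)"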

github-actions[bot] commented 2 years ago

This issue has not seen activity for 90 days. Remove the stale label or comment, otherwise it will be closed in 10 days.