❓ undefined reference when Building Torch-TensorRT

nicholasguimaraes commented 10 months ago

❓ Question

What you have already tried

I'm trying to build Torch-TensorRT version 2.3.0a0. I successfully built Torch 2.3.0.dev.

When building Torch-TensorRT, if I comment out the http_archive entries for libtorch and libtorch_pre_cxx11_abi and use new_local_repository for both instead, I get an undefined reference error when running sudo PYTHONPATH=$PYTHONPATH python3 setup.py install

If I leave the default http_archive entries for libtorch and libtorch_pre_cxx11_abi in place, I can "successfully" build Torch-TensorRT, but importing it in any Python code fails with:

ImportError: /home/nick/.local/lib/python3.8/site-packages/torch_tensorrt/lib/libtorchtrt.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs

In the pyproject.toml file I can see that torch 2.3.0 is required for building Torch-TensorRT, and that is the version of torch installed and running in my environment.

I'm not sure how to proceed, since it seems I have all the required packages installed.
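
A note on reading the symbol in the ImportError above: under the Itanium C++ ABI used by GCC, the trailing RKSs stands for const std::string& in the pre-C++11 libstdc++ layout, whereas the newer ABI spells the type out inside the __cxx11 namespace. The loader is therefore looking for an old-ABI symbol, which would be consistent with libtorchtrt.so and the installed torch having been built with different ABI settings. Below is a minimal Python sketch of that check. guess_string_abi is a hypothetical helper, and the substring test is only a heuristic, not a real demangler (c++filt is the proper tool for that).

```python
def guess_string_abi(mangled: str) -> str:
    """Guess which libstdc++ std::string ABI a mangled symbol refers to.

    Heuristic only (an assumption of this sketch, not a demangler):
    - "__cxx11" appears when the new-ABI std::string is part of the signature
    - "Ss" is the Itanium abbreviation for the old-ABI std::string
    Symbols without any std::string parameter carry no ABI hint at all.
    """
    if "__cxx11" in mangled:
        return "cxx11"
    if "Ss" in mangled:
        # Caveat: a plain substring match could misfire on unusual identifiers.
        return "pre-cxx11"
    return "unknown"


if __name__ == "__main__":
    # Symbol copied from the ImportError reported in this issue.
    symbol = "_ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs"
    print(guess_string_abi(symbol))
```

A result of "unknown" does not rule out an ABI problem; it only means the symbol has no std::string in its signature, so a plain version mismatch between torch and Torch-TensorRT is just as possible.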

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

PyTorch Version (e.g., 1.0): 2.3.0a0+git4aa1f99
OS (e.g., Linux): Ubuntu 20.04
How you installed PyTorch (conda, pip, libtorch, source): source
Build command you used (if compiling from source): sudo python3 setup.py build develop
Are you using local sources or building from archives: local
Python version: 3.8
CUDA version: 12.1
GPU models and configuration: 2080 ti

Additional context

nicholasguimaraes commented 10 months ago

I am trying to understand whether any of the required library/API versions is incorrect.

Pytorch was compiled from the main repo with version 2.3.0a0+git4aa1f99

Cuda is on version 12.1

TensorRT is on version 8.6.1.6

The libtorch fetched via http_archive is the CUDA 12.1 build, which exactly matches the CUDA installed on my system, but I still get the undefined symbol error when I import torch_tensorrt.

On the other hand, if I use new_local_repository to point libtorch at the location where my Torch 2.3.0 dev build is installed, I cannot finish the Torch-TensorRT compilation because of undefined references.

Which specific versions of torch and libtorch must be used?

narendasan commented 10 months ago

Did you edit the WORKSPACE file to use your custom PyTorch version? Otherwise it will pull the latest nightly.

nicholasguimaraes commented 9 months ago

> Did you edit the WORKSPACE file to use your custom pytorch version? otherwise it will pull latest nightly

Yes, I did. When trying to build Torch-TensorRT against my own compiled torch 2.3.0.dev, I edited the WORKSPACE file like this:

new_local_repository(
    name = "libtorch",
    path = "/home/nick/Documents/pytorch/torch",
    build_file = "third_party/libtorch/BUILD"
)

new_local_repository(
    name = "libtorch_pre_cxx11_abi",
    path = "/home/nick/Documents/pytorch/torch",
    build_file = "third_party/libtorch/BUILD"
)
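
Before building against new_local_repository entries like the ones above, it can help to confirm that the directory given as path actually looks like a libtorch root. The sketch below is only that: the list of expected entries (include, lib/libtorch.so, lib/libc10.so) is an assumption about what third_party/libtorch/BUILD needs, and missing_libtorch_parts is a hypothetical helper name.

```python
from pathlib import Path

# Assumed minimal layout of a libtorch root; adjust to match the BUILD file.
EXPECTED_ENTRIES = ["include", "lib/libtorch.so", "lib/libc10.so"]


def missing_libtorch_parts(root) -> list:
    """Return the expected entries that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]


if __name__ == "__main__":
    # Path taken from the WORKSPACE snippet in this comment.
    missing = missing_libtorch_parts("/home/nick/Documents/pytorch/torch")
    print("missing:", missing if missing else "nothing")
```

An empty result only says the files exist; it says nothing about whether they were built with the ABI that Torch-TensorRT is being compiled for.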

Torch 2.3.0 dev compiles and runs like a charm, but when I try to build Torch-TensorRT I get undefined reference errors from libtorchtrt.so

Similarly, if I comment out new_local_repository for libtorch and libtorch_pre_cxx11_abi and use http_archive instead, I can build Torch-TensorRT, but importing it in a Python script fails with ImportError: /home/nick/.local/lib/python3.8/site-packages/torch_tensorrt/lib/libtorchtrt.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs

This is how I built Torch dev:

git clone https://github.com/pytorch/pytorch.git  

sudo python3 setup.py build develop

narendasan commented 9 months ago

Oh, if you built PyTorch locally you might need to pass --use-cxx11-abi as a flag to setup.py, since by default PyTorch releases use the old ABI but source builds use the new one.

Something like sudo python3 setup.py develop --use-cxx11-abi should work
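
To see which ABI an installed torch was actually built with, torch.compiled_with_cxx11_abi() can be queried directly. The sketch below wraps that in a small lookup; torchtrt_abi_hint is a hypothetical helper, and the mapping it returns (new ABI goes with the libtorch repository and --use-cxx11-abi, old ABI with libtorch_pre_cxx11_abi and no flag) is an assumption based on the advice in this thread, not a statement of how setup.py is implemented.

```python
def torchtrt_abi_hint(cxx11_abi: bool) -> dict:
    """Map torch's ABI setting to the WORKSPACE repo and setup.py flag to try.

    The mapping is an assumption drawn from this thread.
    """
    if cxx11_abi:
        return {"workspace_repo": "libtorch", "setup_flag": "--use-cxx11-abi"}
    return {"workspace_repo": "libtorch_pre_cxx11_abi", "setup_flag": ""}


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not importable in this environment")
    else:
        # True for builds compiled with _GLIBCXX_USE_CXX11_ABI=1
        print(torchtrt_abi_hint(torch.compiled_with_cxx11_abi()))
```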

nicholasguimaraes commented 9 months ago

The error persists.

I tried installing the nightly with pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu121

That installs torch 2.3.0.dev20240219+cu121, and I tried all possible WORKSPACE configurations for libtorch, libtorch_pre_cxx11_abi, cudnn, and tensorrt.

Torch works fine and torch.cuda.is_available() returns True.

With the pip-installed torch dev build, even when compilation succeeds I get an undefined symbol error when importing torch_tensorrt: ImportError: /home/nick/Documents/tracker_mvit2_s3d/torch-tensorrt/TensorRT/build/lib.linux-x86_64-cpython-38/torch_tensorrt/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit8toIValueEN8pybind116handleERKN3c104Type24SingletonOrSharedTypePtrIS4_EENS3_8optionalIiEE

My latest attempt was pointing both libtorch and libtorch_pre_cxx11_abi to the pip-installed torch 2.3.0.dev package, but during compilation I got:

ERROR: /home/nick/Documents/tracker_mvit2_s3d/torch-tensorrt/TensorRT/cpp/bin/torchtrtc/BUILD:12:10: Linking cpp/bin/torchtrtc/torchtrtc failed: (Exit 1): gcc failed: error executing command (from target //cpp/bin/torchtrtc:torchtrtc) /usr/bin/gcc @bazel-out/k8-opt/bin/cpp/bin/torchtrtc/torchtrtc-2.params

It seems that regardless of whether torch 2.3.0 dev is installed via pip or compiled from source, and regardless of the --use-cxx11-abi flag, I can neither compile nor import torch_tensorrt.

As a reminder, I'm on Ubuntu 18, NVIDIA driver version 535.54.03, and CUDA toolkit 12.1.

narendasan commented 9 months ago

What build command are you using for torch-tensorrt?

matthost commented 4 months ago

Did you solve this? I'm running into this error. I think I have everything set up for Torch 2.3.0, TensorRT 10.0.1, torch-tensorrt 2.3.0, all compiled with CUDA 11.8. Also on Python 3.8.

Though everywhere else I've seen this error, it was attributed to mismatched deps.

matthost commented 4 months ago

I do use a source build of torch rather than a distribution, while trying to use a distribution for tensorrt, so I wonder if it's the use-cxx11-abi thing...

matthost commented 4 months ago

I'm going to try moving to all prebuilt distros, which seem to be working.

woshizouguo commented 1 month ago

@matthost can you clarify how to fix this issue? I am using torch 2.4.1 and get the same error.

matthost commented 1 month ago

Either build every torch-related package from source, or use all prebuilt wheels from their download page.

avdhoeke commented 1 week ago

I used the following installation guide under JetPack 6.1 with:

CUDA: 12.6.68
cuDNN: 9.3.0.75
TensorRT: 10.3.0.30
Python 3.10.12
torch-2.5.0
torchvision-0.20

Both torch and torchvision were fetched from here. Works like a charm. Just make sure to build Torch-TensorRT (the pytorch/TensorRT repo) from the correct branch (in this case release/2.5).
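
As a rough illustration of picking the branch from the installed torch version, the sketch below derives a release/<major>.<minor> name from a version string. It generalizes from the single data point in this comment (torch 2.5.0 with release/2.5); that the same naming holds for other versions is an assumption, and release_branch is a hypothetical helper, so check that the branch exists before relying on it.

```python
import re


def release_branch(torch_version: str) -> str:
    """Derive a 'release/<major>.<minor>' branch name from a torch version.

    Assumes the branch naming seen in this thread (2.5.0 -> release/2.5).
    """
    match = re.match(r"(\d+)\.(\d+)", torch_version)
    if match is None:
        raise ValueError(f"unrecognized torch version: {torch_version!r}")
    return f"release/{match.group(1)}.{match.group(2)}"


if __name__ == "__main__":
    print(release_branch("2.5.0"))
```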
