Open nicholasguimaraes opened 10 months ago
I am trying to understand if any of the required lib/api versions is incorrect.
PyTorch was compiled from the main repo, version 2.3.0a0+git4aa1f99.
CUDA is version 12.1.
TensorRT is version 8.6.1.6.
When using http_archive, LibTorch is downloaded for CUDA 12.1, which exactly matches the CUDA installed on my system, but I get an undefined symbol error when I import torch_tensorrt.
On the other hand, if I build against a local LibTorch using new_local_repository pointing at the directory where Torch 2.3.0 dev is installed, I cannot finish the torch-tensorrt compilation because of undefined references.
What specific version of torch and libtorch must be used?
Did you edit the WORKSPACE file to use your custom pytorch version? otherwise it will pull latest nightly
Yes I did, when trying to build Torch-TensorRT using my own compiled 2.3.0.dev torch I edited the WORKSPACE file like this:
new_local_repository(
    name = "libtorch",
    path = "/home/nick/Documents/pytorch/torch",
    build_file = "third_party/libtorch/BUILD"
)
new_local_repository(
    name = "libtorch_pre_cxx11_abi",
    path = "/home/nick/Documents/pytorch/torch",
    build_file = "third_party/libtorch/BUILD"
)
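For anyone adapting the path entries above to a pip-installed torch, here is a minimal sketch for finding the directory a package lives in without importing it. The helper name package_dir is made up for illustration, and whether that directory is the right value for path depends on the layout third_party/libtorch/BUILD expects, which is not verified here.

```python
import importlib.util
import os
from typing import Optional


def package_dir(name: str) -> Optional[str]:
    """Return the directory of an importable package, or None if not found.

    Hypothetical helper: it only locates the package on disk and does not
    check that the directory has the layout a LibTorch BUILD file expects.
    """
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None or spec.origin is None:
        return None
    return os.path.dirname(spec.origin)


if __name__ == "__main__":
    # Prints something like .../site-packages/torch, or None if torch is absent.
    print(package_dir("torch"))
```

Running it with the same python3 that runs setup.py matters, since a different interpreter may resolve a different torch install.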
Torch dev 2.3.0 compiles and runs like a charm, but when I try building Torch-TensorRT I get undefined references from libtorchtrt.so.
Similarly, if I comment out new_local_repository for libtorch and libtorch_pre_cxx11_abi and use http_archive instead, I can build Torch-TensorRT, but when importing it in a Python script I get ImportError: /home/nick/.local/lib/python3.8/site-packages/torch_tensorrt/lib/libtorchtrt.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs
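A note on reading that symbol: in the Itanium C++ mangling, Ss is the abbreviation for the old std::string, while the dual-ABI libstdc++ places the new std::string in the inline namespace __cxx11, which shows up in the mangled name. So a missing symbol ending in RKSs suggests libtorchtrt.so expects a pre-cxx11 ABI torch. Below is a rough sketch of that check; guess_string_abi is a made-up helper and the substring test is only a heuristic, not a real demangler.

```python
def guess_string_abi(mangled: str) -> str:
    """Guess which std::string ABI a mangled symbol was compiled against.

    Heuristic only: looks for the "__cxx11" inline namespace (new ABI) or
    the "Ss" std::string abbreviation (old ABI). Symbols that do not
    involve std::string at all come back as "unknown".
    """
    if "__cxx11" in mangled:
        return "cxx11"
    if "Ss" in mangled:
        return "pre-cxx11"
    return "unknown"


if __name__ == "__main__":
    # Symbol from the ImportError in this thread.
    missing = "_ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs"
    print(guess_string_abi(missing))

    # What the same function is expected to look like under the new ABI.
    new_abi = (
        "_ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_"
        "RKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE"
    )
    print(guess_string_abi(new_abi))
```

If the symbol has no std::string in its signature, the heuristic says nothing either way, and the cause may be a plain version mismatch instead.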
This is how I built Torch dev:
git clone https://github.com/pytorch/pytorch.git
sudo python3 setup.py build develop
Oh, if you built pytorch locally you might need to add --use-cxx11-abi as a flag to setup.py, since by default pytorch releases use the old abi but source builds use the new one. Something like sudo python3 setup.py develop --use-cxx11-abi should work.
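To see which ABI the installed torch actually uses before picking a flag, something like the following may help. It is a small sketch that relies on torch.compiled_with_cxx11_abi(); the wrapper name torch_build_info is made up, and it returns None when torch cannot be imported.

```python
from typing import Optional


def torch_build_info() -> Optional[dict]:
    """Report version, CUDA version, and C++ ABI of the importable torch.

    Returns None if torch is not installed in this interpreter.
    """
    try:
        import torch
    except ImportError:
        return None
    return {
        "version": torch.__version__,
        "cuda": torch.version.cuda,  # None for CPU-only builds
        "cxx11_abi": torch.compiled_with_cxx11_abi(),
    }


if __name__ == "__main__":
    print(torch_build_info())
```

If cxx11_abi is True, the torch in this environment was built with the new ABI, and the Torch-TensorRT build would need to match it.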
The error persists.
I tried pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cu121, which installs torch 2.3.0.dev20240219+cu121, and tried every possible WORKSPACE configuration of libtorch, libtorch_pre_cxx11_abi, cudnn, and tensorrt.
Torch works fine and torch.cuda.is_available() returns True.
With the pip-installed torch dev, even when compilation succeeds, I get the undefined symbol error when importing torch_tensorrt:
ImportError: /home/nick/Documents/tracker_mvit2_s3d/torch-tensorrt/TensorRT/build/lib.linux-x86_64-cpython-38/torch_tensorrt/_C.cpython-38-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit8toIValueEN8pybind116handleERKN3c104Type24SingletonOrSharedTypePtrIS4_EENS3_8optionalIiEE
My latest attempt was pointing both libtorch and libtorch_pre_cxx11_abi to the pip installed torch.dev.2.3.0 package but during compilation I got:
ERROR: /home/nick/Documents/tracker_mvit2_s3d/torch-tensorrt/TensorRT/cpp/bin/torchtrtc/BUILD:12:10: Linking cpp/bin/torchtrtc/torchtrtc failed: (Exit 1): gcc failed: error executing command (from target //cpp/bin/torchtrtc:torchtrtc) /usr/bin/gcc @bazel-out/k8-opt/bin/cpp/bin/torchtrtc/torchtrtc-2.params
It seems that regardless of whether torch dev 2.3.0 is installed via pip or compiled from source, and regardless of the --use-cxx11-abi flag, I can neither compile nor import torch_tensorrt.
As a reminder, I'm on Ubuntu 18, NVIDIA driver version 535.54.03, and CUDA toolkit 12.1.
What build command are you using for torch-tensorrt?
You solve this? Running into this error. Think I have everything setup for Torch-2.3.0, TensorRT 10.0.1, torch-tensorrt 2.3.0, all compiled with cuda 11.8. Also on Python 3.8.
Though everywhere else I've seen this error implies this is due to mismatched deps.
I do use a source build of torch rather than a distribution, while trying to use a distribution for tensorrt, so I wonder if it's the use-cxx11-abi thing...
I'm going to try moving to all prebuilt distros which seem to be working
@matthost can you help clarify how to fix this issue? I am using torch 2.4.1 and get the same error.
Either build every torch related package from source or use all prebuilt wheels from their download page
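A quick way to spot a mixed setup is to look at the version strings. In this thread the source build reports 2.3.0a0+git4aa1f99 and the nightly wheel reports 2.3.0.dev20240219+cu121, so the local tag after the + is a usable hint. The sketch below is based only on those two examples; the helper names and the distribution names passed in are assumptions.

```python
from importlib import metadata  # Python 3.8+
from typing import Optional


def installed_version(*names: str) -> Optional[str]:
    """Return the version of the first installed distribution among names."""
    for name in names:
        try:
            return metadata.version(name)
        except metadata.PackageNotFoundError:
            continue
    return None


def classify_torch_build(version: str) -> str:
    """Classify a torch version string by its local tag (heuristic)."""
    local = version.partition("+")[2]
    if local.startswith("git"):
        return "source build"
    if local.startswith("cu") or local == "cpu" or local.startswith("rocm"):
        return "prebuilt wheel"
    return "unknown"


if __name__ == "__main__":
    # Distribution names are assumptions; both spellings are tried.
    for names in (("torch",), ("torch-tensorrt", "torch_tensorrt"), ("tensorrt",)):
        version = installed_version(*names)
        kind = classify_torch_build(version) if version else "not installed"
        print(names[0], version, kind)
```

A source-built torch next to a prebuilt torch_tensorrt (or the reverse) is the combination this thread keeps running into.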
I used the following installation guide under Jetpack: 6.1 with:
Both torch and torchvision were fetched from here. Works like a charm. Just make sure to build TensorRT from the correct branch (in this case release/2.5).
❓ Question
What you have already tried
I'm trying to build Torch-TensorRT version 2.3.0a0. I successfully built Torch 2.3.0.dev.
When building Torch-TensorRT, if I comment out http_archive for libtorch and libtorch_pre_cxx11_abi and use new_local_repository for both of them, I get an undefined reference error when running sudo PYTHONPATH=$PYTHONPATH python3 setup.py install
If I leave http_archive for libtorch and libtorch_pre_cxx11_abi as default, I can "successfully" build Torch-TensorRT, but when trying to import it in any Python code I get:
ImportError: /home/nick/.local/lib/python3.8/site-packages/torch_tensorrt/lib/libtorchtrt.so: undefined symbol: _ZN3c106detail23torchInternalAssertFailEPKcS2_jS2_RKSs
In the pyproject.toml file I can see that Torch 2.3.0 is mandatory for building Torch-TensorRT, and that is the version of torch installed and running in my environment.
Not sure how to proceed, since it seems I have all the required packages installed.
Environment
PyTorch installed via (conda, pip, libtorch, source): source
Additional context