pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

🐛 [Bug] The version for CUDA 11 depends on CUDA 12 libraries #3102

Closed bryant1410 closed 2 weeks ago

bryant1410 commented 3 weeks ago

Bug Description

The version of this package for CUDA 11.8 depends on CUDA 12 libraries:

https://github.com/pytorch/TensorRT/blob/4aa6e7903188acaf6678d7d04afa34d1d2f037b8/pyproject.toml#L46-L47

This can cause issues on a system that doesn't support CUDA 12, because this or another library may pick up those CUDA 12 libraries.

To Reproduce

You can download, for example, https://download.pytorch.org/whl/cu118/torch_tensorrt-2.4.0%2Bcu118-cp310-cp310-linux_x86_64.whl and inspect its dependencies.
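One way to inspect those dependencies without installing the wheel is to read the Requires-Dist entries from the METADATA file inside the archive. This is a sketch (the function name and the local wheel path are assumptions, not part of the report); point it at the downloaded cu118 wheel to see which CUDA-12-suffixed packages it pulls in:

```python
# Sketch: list a wheel's declared dependencies without installing it,
# by parsing Requires-Dist from the METADATA file inside the archive.
import zipfile
from email.parser import Parser


def wheel_requires(wheel_path: str) -> list[str]:
    """Return the Requires-Dist lines from a wheel's METADATA."""
    with zipfile.ZipFile(wheel_path) as whl:
        # METADATA lives under <name>-<version>.dist-info/ inside the wheel.
        meta_name = next(
            n for n in whl.namelist() if n.endswith(".dist-info/METADATA")
        )
        meta = Parser().parsestr(whl.read(meta_name).decode("utf-8"))
    return meta.get_all("Requires-Dist") or []


# Assumed local file name after downloading the wheel linked above:
# for req in wheel_requires("torch_tensorrt-2.4.0+cu118-cp310-cp310-linux_x86_64.whl"):
#     print(req)
```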

Expected behavior

I guess the expected behavior would be to depend on the same dependencies, but on their CUDA 11 variants.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages

bryant1410 commented 3 weeks ago

And, if I understand correctly, that version should also depend on tensorrt-cu11 and not on tensorrt.

lanluo-nvidia commented 3 weeks ago

@bryant1410
If you download the tensorrt-10.1.0 wheel from PyPI, you will see that its requires.txt lists tensorrt-cu12 regardless of whether the CUDA version is cu121 or cu118. Hence, as long as we have tensorrt-10.1.0 as a dependency, it will always download the tensorrt-cu12 libraries.

What I did in the PR https://github.com/pytorch/TensorRT/pull/3105 is make sure the torch_tensorrt dependencies in pyproject.toml are correct. From the pipeline result, I can see this will download both tensorrt-cu11-10.1.0 and tensorrt-cu12-10.2.0 as dependencies for the cu118 version.

bryant1410 commented 3 weeks ago

Shouldn't the CUDA 11 version of this library (torch-tensorrt) depend only on tensorrt-cu11 (as opposed to also tensorrt)?

The problem with having both is that I have experienced my CUDA 11 PyTorch installation picking up the wrong CUDA libraries for some operations, causing issues. My workaround so far has been to skip installing Torch-TensorRT.

bryant1410 commented 3 weeks ago

The TensorRT Python meta-package currently defaults to CUDA 12, as per the docs:

You can append -cu11 or -cu12 to any Python module if you require a different CUDA major version. When unspecified, the TensorRT Python meta-packages default to the CUDA 12.x variants, the latest CUDA version supported by TensorRT.

Source: https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html#installing-pip

This may not be great for future compatibility (though I don't have a strong opinion), especially since they seem to change the default over time (e.g., see the release notes for TensorRT v10.0.1, where, if I understand correctly based on the previous release notes, they changed the default to CUDA 12 for the first time).

It seems to me, then, that the best and official solution is to make the CUDA 11 version depend only on tensorrt-cu11.
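In packaging terms, that suggestion could look like the following sketch of per-variant dependency stanzas. This is a hypothetical illustration, not the actual layout of pyproject.toml in the repo; the version pins are taken from the versions mentioned in this thread:

```toml
# Hypothetical: dependencies for the cu118 wheel variant, pinning the
# CUDA-11-suffixed TensorRT package instead of the "tensorrt"
# meta-package (which resolves to tensorrt-cu12 by default).
dependencies = [
    "tensorrt-cu11==10.1.0",
]
```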

lanluo-nvidia commented 3 weeks ago

@bryant1410 thanks for your input. With this fix https://github.com/pytorch/TensorRT/pull/3105 I have verified that it now installs only cu11-related dependencies for cu118:

Successfully installed nvidia-cuda-runtime-cu11-2022.4.25 nvidia-cuda-runtime-cu117-11.7.60 tensorrt-cu11-10.1.0 tensorrt-cu11-bindings-10.1.0 tensorrt-cu11-libs-10.1.0 torch-tensorrt-2.5.0.dev20240820+cu118

bryant1410 commented 3 weeks ago

Great!