pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

🐛 [Bug] TS test_compiled_modules failed #3081

Closed by zewenli98 2 months ago

zewenli98 commented 2 months ago

Bug Description

After the upgrade to TRT 10.3, the TorchScript test tests/cpp/test_compiled_modules.cpp fails with the following error:

[ FAILED ] CompiledModuleForwardIsCloseSuite/CppAPITests.CompiledModuleIsClose/3, where GetParam() = ("tests/modules/bert_base_uncased_traced.jit.pt", { { 1, 14 }, { 1, 14 } }, { Int, Int })
unknown file: Failure
C++ exception with description "[Error thrown at core/util/trt_util.cpp:165] pos >= (-d.nbDims - 1) && pos <= d.nbDims ASSERT FAILED at core/util/trt_util.cpp:165, consider filing a bug: https://www.github.com/NVIDIA/Torch-TensorRT/issues
ERROR: Index to unsqueeze is out of bounds. Expected value in range [0, -1], but got 0
" thrown in the test body.
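For readers puzzling over the odd range "[0, -1]" in the message: under the condition quoted in the assertion, that range is what you get when the rank of the input is -1, in which case no index can pass, not even 0. The sketch below reproduces only that condition. The `Dims` struct and both function names are hypothetical stand-ins, not the actual code in core/util/trt_util.cpp, and the thread does not say why the rank ended up as -1.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a TensorRT-style dims struct (not the real type).
struct Dims {
  int nbDims;  // rank; assumed here to be -1 when the shape is unknown/invalid
  int d[8];    // extents; only the first nbDims entries are meaningful
};

// The bounds condition exactly as quoted in the assertion message.
bool unsqueeze_index_in_bounds(const Dims& d, int pos) {
  return pos >= (-d.nbDims - 1) && pos <= d.nbDims;
}

// Illustrative unsqueeze: inserts an extent of 1 at `pos`.
// Negative `pos` counts from the end of the *output* shape.
Dims unsqueeze(const Dims& d, int pos) {
  if (!unsqueeze_index_in_bounds(d, pos)) {
    throw std::out_of_range("Index to unsqueeze is out of bounds");
  }
  assert(d.nbDims < 8);  // the output must still fit in the fixed-size array
  if (pos < 0) {
    pos += d.nbDims + 1;
  }
  Dims out{};
  out.nbDims = d.nbDims + 1;
  for (int i = 0, j = 0; i < out.nbDims; ++i) {
    out.d[i] = (i == pos) ? 1 : d.d[j++];
  }
  return out;
}
```

With `nbDims == -1` the condition becomes `pos >= 0 && pos <= -1`, which is never true, so the "but got 0" in the log is expected for any input whose rank was not resolved.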

To Reproduce

bazel test //tests/cpp:test_compiled_modules --compilation_mode=dbg --test_output=all --test_timeout=8000 --jobs 4 --config=pre_cxx11_abi --cxxopt='-std=c++17'

narendasan commented 2 months ago

Fixed in main

zewenli98 commented 2 months ago

@narendasan The BERT error is fixed, but a ViT error now pops up:

error loading the model
tests/cpp/cpp_api_test.h:21: Failure
Value of: false
  Actual: false
Expected: true

[  FAILED  ] CompiledModuleForwardIsCloseSuite/CppAPITests.CompiledModuleIsClose/3, where GetParam() = ("tests/modules/vit_scripted.jit.pt", { { 1, 3, 224, 224 } }, { Float }) (0 ms)

I guess it's due to this change: https://github.com/pytorch/TensorRT/commit/9bbba4fec4882b226ca7a349416b8417e9917758#diff-486a448835bdc93716a54a08b4aaa012157fa1529caa329f5b2af4c649f57aebR54-R57

zewenli98 commented 2 months ago

I downloaded the ViT model through hub.py, and the test then passed.
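Since the ViT failure went away once the model file was present, a pre-flight check for missing model files could make this kind of failure easier to diagnose than a bare "error loading the model". The following is a minimal sketch using C++17 `std::filesystem`; `missing_model_files` is a hypothetical helper and is not part of the Torch-TensorRT test suite.

```cpp
#include <filesystem>
#include <string>
#include <system_error>
#include <vector>

// Hypothetical helper: returns every path in `paths` that is not an
// existing regular file, so a test can report the missing models by name.
std::vector<std::string> missing_model_files(const std::vector<std::string>& paths) {
  std::vector<std::string> missing;
  for (const auto& p : paths) {
    std::error_code ec;  // non-throwing overload; any error counts as "missing"
    if (!std::filesystem::is_regular_file(p, ec)) {
      missing.push_back(p);
    }
  }
  return missing;
}
```

A test fixture could call this with paths such as "tests/modules/vit_scripted.jit.pt" before loading anything and print the returned list, pointing the reader at regenerating the models rather than at the compiler.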