pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

🐛 [Bug] torch_tensorrt.compile does not work anymore with torchscript - v1.2.0 #1383

Closed SM1991CODES closed 1 year ago

SM1991CODES commented 2 years ago

Bug Description

Compiling TorchScript modules to TensorRT was working fine in the last release. Now the simple script below fails with the error message shown in the screenshot.

I have tried tracing, scripting, as well as passing a plain PyTorch module; same error in every case.

To Reproduce

```python
import torch
import torch_tensorrt as trt

import model_bevdetnet
import argo_settings


def export_pth_to_trt_fp32(path_trained_pth, path_save_ts_trt):
    """Export the trained model and weights to TensorRT using FP32 precision."""
    # instantiate the model on the GPU in eval mode
    model = model_bevdetnet.BevDetNetSimple(in_channels=argo_settings.N_CHANNELS_TRAIN_BEV,
                                            out_kp_channels=argo_settings.N_CHANNELS_PREDICTION_KP,
                                            scale_H=2, scale_W=2, predict_3d_center=True).cuda().eval()
    # load the trained weights
    model.load_state_dict(torch.load(path_trained_pth, map_location="cuda:0"))

    # trace the model with a representative input
    sample_input = torch.randn((2, 4, 384, 384)).float().cuda()
    traced_model = torch.jit.trace(model, sample_input)
    # scripted_model = torch.jit.script(model)
    # torch.jit.save(traced_model, "temp_fp32.ts")

    # compile the traced module to a TensorRT-embedded TorchScript module and save it
    # ts_model = torch.jit.load("temp_fp32.ts")
    trt_ts_module = trt.compile(traced_model, inputs=[sample_input], enabled_precisions={torch.float32})
    torch.jit.save(trt_ts_module, path_save_ts_trt)
```

Steps to reproduce the behavior:


(screenshot: error message raised by torch_tensorrt.compile)
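For reference, once compilation succeeds, this is how I would normally consume the saved TorchScript/TensorRT module (a minimal sketch; the file name is just illustrative):

```python
import torch
import torch_tensorrt  # registers the TensorRT runtime ops needed to deserialize the module

# load the compiled module saved by export_pth_to_trt_fp32 (illustrative path)
trt_ts_module = torch.jit.load("bevdetnet_fp32_trt.ts").cuda().eval()

sample_input = torch.randn((2, 4, 384, 384)).float().cuda()
with torch.no_grad():
    output = trt_ts_module(sample_input)
```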


msaroufim commented 2 years ago

Getting the same error; I believe this should be a high priority.

SM1991CODES commented 2 years ago

Dear Pytorch/ Nvidia community,

My comments may be ill-informed, but as a regular user of NVIDIA/PyTorch frameworks, I have a small piece of advice/a request.

Presently, to get PyTorch models running on NVIDIA hardware using TensorRT, I see several paths, each with its own shortcomings. I have tried to outline them based on observations from my use case (3D object detection):

1) Torch-TensorRT: good for Python and C++ inference since it allows Libtorch usage. However, it is unstable in general; FP16 accuracy is lower than with the torch2trt library (see the sketch after this list); PTQ INT8 quantization is practically unusable in terms of accuracy; QAT INT8 using fake quantization (pytorch-quantization) has incomplete documentation and works only via ONNX export; and the resulting speed was found to be worse than FP16.

2) PyTorch -> ONNX -> TensorRT: a multi-stage process that I have never tried fully, and the documentation is not great. Maybe Libtorch functions can still be used for data pre- and post-processing?

3) Triton Inference Server: yet another way! I have never tried it fully; it requires running additional Docker containers for the server, and it is not generally suitable for ...
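To make the FP16 point in (1) concrete, this is roughly how half precision is enabled in Torch-TensorRT (a minimal sketch, reusing the `traced_model` and `sample_input` from the script above; it is the accuracy of the resulting module that I found lower than torch2trt's):

```python
import torch
import torch_tensorrt

# minimal FP16 compile sketch; `traced_model` and `sample_input` are assumed to
# come from the FP32 reproduction script above
trt_fp16_module = torch_tensorrt.compile(
    traced_model,
    inputs=[sample_input],
    enabled_precisions={torch.float, torch.half},  # allow TensorRT to pick FP16 kernels
)
torch.jit.save(trt_fp16_module, "bevdetnet_fp16_trt.ts")  # illustrative file name
```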

Now, rather than several paths, would it make sense to pick one and make it really robust? I think the following aspects need to be addressed:

1) An easy interface for inputs and outputs in both Python and C/C++, typically 2D/3D arrays; Libtorch tensors are a great option.

2) Easy and robust conversion of a PyTorch model into a TRT engine (or similar) that can be consumed by both Python and C++ end uses. Also, please improve the error logging to be more useful.

3) Fixing INT8, FP16 and possibly FP8 quantization with end-to-end working examples.

4) Updated and working documentation.

From what I have seen, one way could be to use the ONNX pipeline and integrate it with Libtorch for input/output and other tensor math. Since ONNX also has its own quantization and optimization parts, those could be used as well.

I hope this helps make the framework better, more user-friendly, and more robust.

ZiyueWangUoB commented 1 year ago

Any update on this?

SM1991CODES commented 1 year ago

I haven't seen any progress. For now I have stopped using Torch-TensorRT and use the TensorRT C++ API instead. NVIDIA also advised using the TensorRT APIs directly, as that is the most stable flow.

ZiyueWangUoB commented 1 year ago

@SM1991CODES As in, you are using the torch-tensorrt package to convert directly to TensorRT, then using the C++ API?

SM1991CODES commented 1 year ago

No, I do not use Torch-TensorRT at all now. My flow is: PyTorch -> ONNX -> trtexec -> TRT engine -> TensorRT C++ API -> inference. Additionally, you can use Libtorch for pre- and post-processing.
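For anyone following the same route, the export step looks roughly like this (a minimal sketch; the model class, shapes, and file names are placeholders from my use case, and the trtexec call is shown as a comment):

```python
import torch

# export the trained PyTorch model to ONNX (MyModel and the input shape are placeholders)
model = MyModel().cuda().eval()
sample_input = torch.randn((2, 4, 384, 384)).float().cuda()

torch.onnx.export(
    model,
    sample_input,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    opset_version=13,
)

# then build the engine on the command line:
#   trtexec --onnx=model.onnx --saveEngine=model.engine --fp16
# the resulting model.engine is deserialized with the TensorRT C++ API for inference;
# Libtorch can still be used for pre- and post-processing tensors
```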

github-actions[bot] commented 1 year ago

This issue has not seen activity for 90 days. Remove the stale label or comment, or this will be closed in 10 days.