TS INT8 degradation in later versions

Open · seymurkafkas opened 1 month ago

Follow-up comment: I have also tried the script with the following dependencies to bisect the issue: Torch: 2.2.1, TensorRT: 8.6.1, torch_tensorrt: 2.2.0, Python: 3.11, CUDA: 12.1. With these dependencies, it also works as expected (good results).
Hi all, I see a degradation in results after INT8 quantization with TorchScript, after updating my torch_tensorrt, torch, and tensorrt versions. I have listed the dependencies for both cases below. Is this expected?
Earlier version (works well):
- Torch: 2.0.1
- CUDA: 11.8
- torch_tensorrt: 1.4.0
- TensorRT: 8.5.3.1
- GPU: A100
- Python: 3.9

Later version (degradation in results):
- Torch: 2.4.0
- CUDA: 12.1
- torch_tensorrt: 2.4.0
- TensorRT: 10.1.0
- GPU: A100
- Python: 3.11
Script (approximately, as I can't share the model):
Note: in the later version, the import needs to change from `import torch_tensorrt.ptq` to `import torch_tensorrt.ts.ptq`; the rest of the script is identical.

While the earlier versions work well (I get a quantized model that produces close-enough results to the original model), with the later version I get garbage outputs. I can see something is wrong with the calibration: the output tensor values always fall within a small range (0.18-0.21), whereas they should take any value between -1 and 1. I'm only posting the quantization script approximately; unfortunately I cannot share the model details, as the model is proprietary.
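For context, this is not the original script, but a minimal sketch of the TorchScript PTQ flow being described, based on the torch_tensorrt PTQ API (`DataLoaderCalibrator`, `CalibrationAlgo`, `torch_tensorrt.compile` with the TorchScript frontend). `MyModel`, `calib_dataloader`, and the input shape are placeholders, not details from the actual model:

```python
import torch
import torch_tensorrt
import torch_tensorrt.ptq  # becomes torch_tensorrt.ts.ptq on the later versions

# Placeholder model and calibration dataloader (the real ones are proprietary)
model = MyModel().eval().cuda()
scripted = torch.jit.script(model)

# INT8 calibrator fed from a representative dataloader
calibrator = torch_tensorrt.ptq.DataLoaderCalibrator(
    calib_dataloader,
    cache_file="./calibration.cache",
    use_cache=False,
    algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
    device=torch.device("cuda:0"),
)

# Compile via the TorchScript frontend with INT8 enabled
trt_model = torch_tensorrt.compile(
    scripted,
    ir="ts",
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],  # placeholder shape
    enabled_precisions={torch.int8},
    calibrator=calibrator,
)
```

One thing worth double-checking when bisecting a case like this is `use_cache`: a stale `calibration.cache` from a previous run can silently skip recalibration.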
I'd appreciate any form of help :). I'd also be happy to submit a fix for the underlying issue (if one is present).