lebionick opened this issue 3 months ago
Hello @lebionick In the example you've provided, it looks like you're using a torchscript model, but you're using our dynamo backend. We currently do not support INT8 in dynamo and plan to work on this feature in the coming weeks. The workflow would be very similar to the FP8 workflow you've tried, but it involves more changes than just dtypes.
I would like to quantize my model to INT8 precision and then compile it using torch_tensorrt. Unfortunately, it is a transformer-based vision model, and the default way to do it does not work.
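By "the default way" I mean roughly the documented PTQ path with a DataLoaderCalibrator and the TorchScript frontend. Below is a minimal sketch of that, with a toy stand-in model and placeholder shapes (my real model is a transformer-based vision model), so the exact arguments may differ from my actual script:

```python
import torch
import torch.nn as nn
import torch_tensorrt
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in model and calibration data; shapes are placeholders.
model = nn.Sequential(
    nn.Conv2d(3, 8, 3), nn.ReLU(), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10)
).cuda().eval()
calib_ds = TensorDataset(torch.randn(32, 3, 224, 224), torch.zeros(32, dtype=torch.long))
calib_loader = DataLoader(calib_ds, batch_size=8)

# INT8 calibrator built from the calibration dataloader.
calibrator = torch_tensorrt.ptq.DataLoaderCalibrator(
    calib_loader,
    cache_file="./calibration.cache",
    use_cache=False,
    algo_type=torch_tensorrt.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
    device=torch.device("cuda:0"),
)

# Trace the model and compile through the TorchScript frontend with INT8 enabled.
scripted = torch.jit.trace(model, torch.randn(1, 3, 224, 224).cuda())
trt_mod = torch_tensorrt.compile(
    scripted,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.int8},
    calibrator=calibrator,
    ir="ts",
)
```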
It gives output:
I also tried passing the plain torch model to the compile method (roughly as in the sketch below), and it gives output:
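For completeness, that second attempt looked roughly like this, continuing the sketch above but dropping the torch.jit.trace step (again a placeholder, not my exact code):

```python
# Continuing the sketch above: pass the plain nn.Module straight to compile,
# without tracing or scripting it first.
trt_mod = torch_tensorrt.compile(
    model,  # plain torch.nn.Module
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    enabled_precisions={torch.int8},
    calibrator=calibrator,
)
```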
After this, I found the torch_tensorrt.dynamo API, but it seems there is no INT8 support. I took the fresh https://github.com/pytorch/TensorRT/blob/main/examples/dynamo/vgg16_fp8_ptq.py example and tried to change FP8 to INT8 (because my 2080 Ti does not support FP8, according to the error), and torch.export breaks:
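Roughly, the change I made to that example looks like the sketch below, with a toy stand-in model instead of VGG16; the mtq.INT8_DEFAULT_CFG config name and the export_torch_mode context are how I understand the modelopt API used in the example, so treat this as an approximation of my script:

```python
import torch
import torch.nn as nn
import modelopt.torch.quantization as mtq
from modelopt.torch.quantization.utils import export_torch_mode

# Toy stand-in for my transformer-based vision model; shapes are placeholders.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 128), nn.ReLU(), nn.Linear(128, 10)).cuda().eval()
dummy_input = torch.randn(1, 3, 32, 32).cuda()

def calibrate_loop(m):
    # The real script iterates over a calibration dataloader here.
    m(dummy_input)

# The only intended change from the example: INT8_DEFAULT_CFG instead of FP8_DEFAULT_CFG.
mtq.quantize(model, mtq.INT8_DEFAULT_CFG, forward_loop=calibrate_loop)

with torch.no_grad():
    with export_torch_mode():
        # This torch.export step is where it breaks for me.
        exp_program = torch.export.export(model, (dummy_input,))
```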
I would like to see a recommended, complete example of quantizing a model and compiling it to INT8 precision with torch_tensorrt.

I am using:
CUDA 12.1
tensorrt==10.0.1
tensorrt-cu12==10.0.1
tensorrt-cu12-bindings==10.0.1
tensorrt-cu12-libs==10.0.1
torch_tensorrt==2.3.0
torch==2.3
nvidia-modelopt==0.13.0