pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

🐛 [Bug] Torch-TRT: QDQ nodes degrade perf vs. PTQ; in native TRT they do not #1323

Closed. ncomly-nvidia closed this issue 1 year ago.

ncomly-nvidia commented 2 years ago

Bug Description

When using the PyTorch QAT toolkit (pytorch-quantization), QAT inference is slower than PTQ under Torch-TRT; under native TRT this is not the case.
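
For context, the QAT path here relies on NVIDIA's pytorch-quantization toolkit, which swaps standard layers for quantized equivalents carrying the Q/DQ (fake-quantize) nodes in question. A minimal sketch of that setup (standard toolkit APIs; the fine-tuning step is elided and the model/input names are placeholders, not the notebook's exact code):

```python
import torch
import torchvision
from pytorch_quantization import quant_modules, nn as quant_nn

# Monkey-patch torch.nn layers (Conv2d, Linear, ...) with quantized
# equivalents that carry TensorQuantizer (Q/DQ) nodes.
quant_modules.initialize()

model = torchvision.models.mobilenet_v2(pretrained=True).cuda().eval()

# ... calibrate activation ranges and fine-tune (QAT) here ...

# Switch to torch.fake_quantize_* ops so the Q/DQ nodes survive
# TorchScript export for Torch-TRT (or ONNX export for native TRT).
quant_nn.TensorQuantizer.use_fb_fake_quant = True
scripted_qat = torch.jit.trace(model, torch.randn(1, 3, 224, 224).cuda())
```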

Torch-TRT:

| Model | Accuracy | Latency |
|---|---|---|
| Baseline MobileNetV2 | 75.56% | 11.92 ms |
| Base + TRT (FP32) | 75.59% | 6.78 ms |
| PTQ + TRT (INT8) | 71.41% | 1.57 ms |
| QAT + TRT (INT8) | 74.00% | 2.18 ms |

Native TRT:

| Model | Accuracy | Latency |
|---|---|---|
| Baseline MobileNetV2 | 71.11% | 11.92 ms |
| Base + TRT (FP32) | 71.13% | 5.95 ms |
| PTQ + TRT (INT8) | 68.11% | 1.59 ms |
| QAT + TRT (INT8) | 70.31% | 1.61 ms |

To Reproduce

Steps to reproduce the behavior:

  1. Torch-TRT notebook
  2. TRT notebook - reach out to @ncomly-nvidia
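
The latency columns above read as per-batch inference times; a rough sketch of how such numbers are typically measured (hypothetical helper, not the notebooks' benchmarking code):

```python
import time
import torch

def measure_latency(module, input_shape=(1, 3, 224, 224), warmup=50, iters=200):
    x = torch.randn(input_shape).cuda()
    with torch.no_grad():
        for _ in range(warmup):       # warm up kernels / caches
            module(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            module(x)
        torch.cuda.synchronize()      # wait for all GPU work to finish
    return (time.perf_counter() - start) / iters * 1000.0  # ms per iteration
```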

Expected behavior

The effect of QDQ nodes on performance should be the same between native TRT and Torch-TRT.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages.

DLFW 22.04: nvcr.io/nvidia/pytorch:22.04-py3

Additional context

github-actions[bot] commented 1 year ago

This issue has not seen activity for 90 days. Remove the stale label or add a comment, or this will be closed in 10 days.