pytorch / TensorRT

PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
https://pytorch.org/TensorRT
BSD 3-Clause "New" or "Revised" License

🐛 [Bug] Torch-TRT: QDQ nodes degrade perf vs. PTQ; in native TRT they do not #1323

Closed. ncomly-nvidia closed this issue 1 year ago.

ncomly-nvidia commented 2 years ago

Bug Description

When using the PyTorch QAT toolkit (pytorch-quantization), QAT inference is slower than PTQ under Torch-TRT; under native TRT this is not the case.
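
For context, the QAT path here relies on NVIDIA's pytorch-quantization toolkit, which swaps standard layers for quantized equivalents carrying the Q/DQ (fake-quantize) nodes in question. A minimal sketch of that setup (standard toolkit APIs; the fine-tuning step is elided and the model/input names are placeholders, not the notebook's exact code):

```python
import torch
import torchvision
from pytorch_quantization import quant_modules, nn as quant_nn

# Monkey-patch torch.nn layers (Conv2d, Linear, ...) with quantized
# equivalents that carry TensorQuantizer (Q/DQ) nodes.
quant_modules.initialize()

model = torchvision.models.mobilenet_v2(pretrained=True).cuda().eval()

# ... calibrate activation ranges and fine-tune (QAT) here ...

# Switch to torch.fake_quantize_* ops so the Q/DQ nodes survive
# TorchScript export for Torch-TRT (or ONNX export for native TRT).
quant_nn.TensorQuantizer.use_fb_fake_quant = True
scripted_qat = torch.jit.trace(model, torch.randn(1, 3, 224, 224).cuda())
```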

Torch-TRT:

| Model | Accuracy | Latency |
|---|---|---|
| Baseline MobileNetV2 | 75.56% | 11.92 ms |
| Base + TRT (FP32) | 75.59% | 6.78 ms |
| PTQ + TRT (INT8) | 71.41% | 1.57 ms |
| QAT + TRT (INT8) | 74.00% | 2.18 ms |

Native TRT:

| Model | Accuracy | Latency |
|---|---|---|
| Baseline MobileNetV2 | 71.11% | 11.92 ms |
| Base + TRT (FP32) | 71.13% | 5.95 ms |
| PTQ + TRT (INT8) | 68.11% | 1.59 ms |
| QAT + TRT (INT8) | 70.31% | 1.61 ms |

To Reproduce

Steps to reproduce the behavior:

  1. Torch-TRT notebook
  2. TRT notebook - reach out to @ncomly-nvidia
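
The latency columns above read as per-batch inference times; a rough sketch of how such numbers are typically measured (hypothetical helper, not the notebooks' benchmarking code):

```python
import time
import torch

def measure_latency(module, input_shape=(1, 3, 224, 224), warmup=50, iters=200):
    x = torch.randn(input_shape).cuda()
    with torch.no_grad():
        for _ in range(warmup):       # warm up kernels / caches
            module(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            module(x)
        torch.cuda.synchronize()      # wait for all GPU work to finish
    return (time.perf_counter() - start) / iters * 1000.0  # ms per iteration
```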

Expected behavior

The effect of QDQ nodes on performance should be the same between native TRT and Torch-TRT.

Environment

Build information about Torch-TensorRT can be found by turning on debug messages.

DLFW 22.04: nvcr.io/nvidia/pytorch:22.04-py3

Additional context

github-actions[bot] commented 1 year ago

This issue has not seen activity for 90 days. Remove the stale label or add a comment, or this will be closed in 10 days.