zkkli / I-ViT

[ICCV 2023] I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Apache License 2.0

latency result slower than tensorrt fp16 #2

Open zhanglei1172 opened 11 months ago

zhanglei1172 commented 11 months ago

Hi, I tried to replicate your speed experiment. I tested deit_tiny with batch size=1 on an RTX 3090, and even after a few days of autotuning, it is still slower than TensorRT FP16.

Here are the results of my experiment:

[screenshot: latency results]
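For reference, a minimal sketch of the warmup-then-median timing loop typically used for such latency comparisons (`measure_latency_ms` and the stand-in workload are hypothetical, not from this repo; in the real experiment the timed callable would be the compiled TVM module or TensorRT engine invocation, and GPU timing additionally requires device synchronization before reading the clock):

```python
import time

def measure_latency_ms(fn, warmup=50, iters=200):
    """Median wall-clock latency of fn() in milliseconds."""
    for _ in range(warmup):   # warm up caches/allocators before timing
        fn()
    samples = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - t0) * 1e3)
    samples.sort()
    return samples[len(samples) // 2]  # median is robust to outliers

# Hypothetical stand-in workload; replace with the deit_tiny inference call.
lat = measure_latency_ms(lambda: sum(range(10_000)))
print(f"{lat:.3f} ms")
```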
zkkli commented 11 months ago

Hi.

Our I-ViT TVM implementation is designed for the Turing Tensor Core (RTX 2080Ti), so there could be issues in an Ampere Tensor Core (RTX 3090) environment that lead to sub-optimal kernels.

We are also exploring solutions. If it's convenient, please let me know which versions of TVM and TensorRT you're using.

zhanglei1172 commented 11 months ago

Thanks for your reply, my development environment is

TVM: 0.14.dev0
TensorRT: 8.6.1

ktadgh commented 2 months ago

Hello, I just wanted to add that I saw the same issue on an A6000 with TensorRT 10. @zhanglei1172, when you tried TensorRT INT8, did you enable FP16 as a fallback? And @zkkli, were the times in your experiments measured using TensorRT? Thanks!
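The fallback distinction matters for a fair comparison: with `trtexec`, enabling both precisions lets layers without INT8 kernels run in FP16, whereas INT8 alone falls back to FP32. A sketch of the two invocations (the ONNX path is a hypothetical placeholder; a real INT8 build would also supply a calibration cache):

```shell
# INT8 with FP16 fallback: layers lacking INT8 support run in FP16.
trtexec --onnx=deit_tiny.onnx --int8 --fp16

# INT8 only: unsupported layers fall back to FP32 instead.
trtexec --onnx=deit_tiny.onnx --int8
```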