meituan / YOLOv6

YOLOv6: a single-stage object detection framework dedicated to industrial applications.
GNU General Public License v3.0

Quantized model inference time is higher than non-quantized model #963

Open BouchikhiYousra opened 1 year ago

BouchikhiYousra commented 1 year ago

Question

First I downloaded the pretrained weights of the yolov6n model and fine-tuned it on my custom dataset. For quantization, I followed the tutorial to reconstruct yolov6n with RepOptimizer using my custom data, specifying the pretrained yolov6n weights in the config file, and then followed the PTQ tutorial to quantize the model. Now when I run the quantized model on a specific video, its FPS is two times lower than that of the non-quantized model on the same video. Do you have any idea what may be causing this?

PS: my custom data is 3000 images from the "people" category of the COCO dataset
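To rule out video decoding and pre-processing as the source of the slowdown, it can help to time both checkpoints on identical frames. A minimal stdlib sketch (the `measure_fps` helper and the dummy `infer_fn` are hypothetical names, not part of YOLOv6):

```python
import time

def measure_fps(infer_fn, frames, warmup=5):
    """Average FPS of infer_fn over a list of frames (warmup runs excluded)."""
    # Warmup: first runs often include lazy initialization / kernel caching
    for f in frames[:warmup]:
        infer_fn(f)
    start = time.perf_counter()
    for f in frames:
        infer_fn(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Usage sketch: call with each model's inference function on the same frames,
# e.g. measure_fps(lambda f: model_fp32(f), frames) vs. the quantized model.
```

Comparing the two numbers on the same frame list isolates pure model time from I/O.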


Chilicyy commented 1 year ago

@BouchikhiYousra Hi, did you run inference with the PyTorch model checkpoint or with the TensorRT model?

BouchikhiYousra commented 1 year ago

I saved model_ptq from the partial_quant.py script in checkpoint format and then ran inference with it. I had to write a separate loading function for the quantized model using torch.load, and I kept the fusing function, because load_checkpoint wasn't working.
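The loader described above might look something like this sketch (the function name `load_quantized_checkpoint` is an assumption, not the actual code; the `switch_to_deploy` fusion call is based on YOLOv6's RepVGG-style blocks):

```python
import torch

def load_quantized_checkpoint(weights_path, device, fuse=True):
    # Hypothetical loader: torch.load returns the full pickled model object
    # (weights_only=False is needed on recent torch to unpickle nn.Module).
    ckpt = torch.load(weights_path, map_location=device, weights_only=False)
    model = ckpt['model'] if isinstance(ckpt, dict) and 'model' in ckpt else ckpt
    model = model.float().eval()
    if fuse:
        for m in model.modules():
            # Fuse re-parameterizable branches into a single conv for inference
            if hasattr(m, 'switch_to_deploy'):
                m.switch_to_deploy()
    return model
```

Note that a fake-quantized PyTorch checkpoint still carries quantize/dequantize nodes that add overhead on GPU, which is consistent with the slowdown you observed; the INT8 speedup generally only materializes after exporting the calibrated model to TensorRT.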