meituan / YOLOv6

YOLOv6: a single-stage object detection framework dedicated to industrial applications.
GNU General Public License v3.0

Quantized model inference time is higher than non-quantized model #963

Open BouchikhiYousra opened 1 year ago

BouchikhiYousra commented 1 year ago

Question

First I downloaded the pretrained weights of the yolov6n model and fine-tuned it on my custom dataset. For quantization, I followed the tutorial to reconstruct yolov6n with RepOptimizer using my custom data, specifying the pretrained yolov6n weights in the config file, and then followed the PTQ tutorial to quantize the model. Now when I run the quantized model on a specific video, its FPS is two times lower than that of the non-quantized model on the same video. Do you have any idea what may be causing this?

PS: my custom data is 3000 images from the "people" category of the COCO dataset
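To rule out video decoding and pre-processing as the source of the slowdown, it can help to time both checkpoints on identical frames. A minimal stdlib sketch (the `measure_fps` helper and the dummy `infer_fn` are hypothetical names, not part of YOLOv6):

```python
import time

def measure_fps(infer_fn, frames, warmup=5):
    """Average FPS of infer_fn over a list of frames (warmup runs excluded)."""
    # Warmup: first runs often include lazy initialization / kernel caching
    for f in frames[:warmup]:
        infer_fn(f)
    start = time.perf_counter()
    for f in frames:
        infer_fn(f)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed

# Usage sketch: call with each model's inference function on the same frames,
# e.g. measure_fps(lambda f: model_fp32(f), frames) vs. the quantized model.
```

Comparing the two numbers on the same frame list isolates pure model time from I/O.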


Chilicyy commented 1 year ago

@BouchikhiYousra Hi, did you run inference with the PyTorch model checkpoint or with the TensorRT model?

BouchikhiYousra commented 1 year ago

I saved model_ptq from the partial_quant.py script in checkpoint format and then ran inference with it. I had to write a separate loading function for the quantized model using torch.load, and I kept the fusing function, because load_checkpoint wasn't working.
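The loader described above might look something like this sketch (the function name `load_quantized_checkpoint` is an assumption, not the actual code; the `switch_to_deploy` fusion call is based on YOLOv6's RepVGG-style blocks):

```python
import torch

def load_quantized_checkpoint(weights_path, device, fuse=True):
    # Hypothetical loader: torch.load returns the full pickled model object
    # (weights_only=False is needed on recent torch to unpickle nn.Module).
    ckpt = torch.load(weights_path, map_location=device, weights_only=False)
    model = ckpt['model'] if isinstance(ckpt, dict) and 'model' in ckpt else ckpt
    model = model.float().eval()
    if fuse:
        for m in model.modules():
            # Fuse re-parameterizable branches into a single conv for inference
            if hasattr(m, 'switch_to_deploy'):
                m.switch_to_deploy()
    return model
```

Note that a fake-quantized PyTorch checkpoint still carries quantize/dequantize nodes that add overhead on GPU, which is consistent with the slowdown you observed; the INT8 speedup generally only materializes after exporting the calibrated model to TensorRT.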