Yolo and Tiny-Yolo are reference models, while Compressed Tiny-Yolo (with the 12th and 13th layers removed) and Tiny-Darknet (based on SqueezeNet) were trained from scratch. One observation (note that processing times were measured on a phone): although Tiny-Darknet reduces the weight file size, this does not translate into a significant reduction in processing (inference) time.
I am looking into other techniques such as quantization. Has anyone tried something similar before?
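For anyone unfamiliar with the idea, here is a minimal sketch of post-training linear (affine) quantization in plain Python: float32 weights are mapped to int8 with a scale and zero-point, which is why the weight file shrinks roughly 4x (1 byte per weight instead of 4). This is only illustrative; real toolchains (e.g. TFLite or PyTorch) do this per-channel with calibration data, and the function names below are my own, not from any library.

```python
def quantize_int8(weights):
    """Map float weights to int8 using a single scale and zero-point."""
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / 255.0 or 1.0  # avoid div-by-zero for constant weights
    zero_point = round(-w_min / scale) - 128
    q = [max(-128, min(127, round(w / scale) + zero_point)) for w in weights]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float weights from the int8 values."""
    return [(qi - zero_point) * scale for qi in q]

# Toy example: quantize, dequantize, and check the round-trip error,
# which is bounded by half the scale (the quantization step size).
weights = [-0.42, 0.0, 0.13, 0.5, -0.08, 0.31]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
```

Note that this only shrinks storage; to also cut inference time, the runtime has to execute the convolutions in int8 arithmetic rather than dequantizing back to float, which is exactly the gap I hit with Tiny-Darknet (smaller file, same compute).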