After TF Lite quantization, the size of the YOLOv4-tiny model is indeed reduced, but the latency increases: up to 2-3x with dynamic-range quantization, and up to 4-5x with full-integer (int8) quantization. I tested it on desktop Linux (x86-64) and a Raspberry Pi 3 (armv7), with the same result on both. Could the problem be that the TF Lite optimizer doesn't support some of YOLOv4-tiny's layers?
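For reference, here is roughly the conversion path I followed, as a minimal sketch using the standard `tf.lite.TFLiteConverter` API. The `yolov4_tiny_saved_model` path, the 416x416 input shape, and the random calibration data are placeholders for my actual setup:

```python
import tensorflow as tf

# Dynamic-range quantization: weights are stored as int8,
# activations remain float at inference time.
converter = tf.lite.TFLiteConverter.from_saved_model("yolov4_tiny_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_dynamic = converter.convert()

# Full-integer (int8) quantization: requires a representative
# dataset so the converter can calibrate activation ranges.
def representative_dataset():
    for _ in range(100):
        # Placeholder calibration input; in practice, real images
        # matching the model's input shape (1, 416, 416, 3).
        yield [tf.random.uniform((1, 416, 416, 3), dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_saved_model("yolov4_tiny_saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_int8 = converter.convert()

with open("yolov4_tiny_int8.tflite", "wb") as f:
    f.write(tflite_int8)
```

Both converted models produce valid detections; only the latency is worse than the float32 baseline.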