Open snippler opened 3 years ago
This depends entirely on the model. Could you share more details?
I tested MobileFaceNet as implemented in this repository: https://github.com/wujiyang/Face_Pytorch. It is essentially a MobileNetV2.
Hi @snippler, I had similar problems with ResNet-50 models. Did you ever figure out the reason? @zhanghuanrong, any help would be highly appreciated. Thank you!
No progress on my side, @jay-karan. No idea what the reason could be.
Describe the bug
I had a full-precision onnxruntime session. Then I loaded my network and quantized it with:

```python
from onnxruntime.quantization import quantize, QuantizationMode

quantized_model = quantize(model, quantization_mode=QuantizationMode.IntegerOps)
```
The original model needed about 1 s (0.03 s) per inference on CPU, while the quantized model needs about 10 s (0.3 s).
What could be the reason, and how can I fix it?
I tested on both ARM and Intel processors (the times in parentheses above are from the Intel machine).
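To make the timing comparison above reproducible, here is a minimal benchmarking sketch. The `benchmark` helper, run count, model paths, and input name are hypothetical (not from the thread); the onnxruntime calls are shown as commented usage because they require the two model files.

```python
import time

def benchmark(fn, runs=20):
    """Return the average wall-clock seconds per call of fn()."""
    fn()  # warm-up call, excluded from the measurement
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs

# Hypothetical usage with onnxruntime (paths and input name are placeholders):
# import onnxruntime as ort
# sess_fp32 = ort.InferenceSession("model.onnx")
# sess_int8 = ort.InferenceSession("model.quant.onnx")
# feed = {"input": input_batch}  # a numpy array matching the model's input
# print("fp32:", benchmark(lambda: sess_fp32.run(None, feed)))
# print("int8:", benchmark(lambda: sess_int8.run(None, feed)))
```

Averaging over several runs after a warm-up call avoids measuring one-time session initialization, which can otherwise dominate a single-inference comparison.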