xiaoguaishoubaobao opened 1 year ago
@xiaoguaishoubaobao, it would help if you could specify at least a model architecture.
Also, ONNX Runtime 1.15.0 has been released. Can you try it and see whether you get the same result?
Thank you very much for your reply. onnx is 1.14.0 and onnxruntime is 1.15.0. I think the result may be affected by my server being virtualized with KVM: it is a cloud server, not a dedicated server.
For CPU, FP16 usually does not help performance (compared to FP32), since most CPUs have no native FP16 support, so FP16 operators have to be cast back to FP32.
Try int8 quantization instead for CPU.
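A minimal sketch of post-training dynamic int8 quantization with onnxruntime.quantization; the file names are placeholders, and you would typically start from an FP32 export (half=False) rather than the FP16 one:

```python
# Minimal dynamic int8 quantization sketch (file names are placeholders).
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model_fp32.onnx",    # FP32 export of the model
    model_output="model_int8.onnx",   # quantized output
    weight_type=QuantType.QInt8,      # store weights as int8
)
```

The quantized model can then be loaded with a normal InferenceSession on the CPUExecutionProvider.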
onnx version: 1.14.0
When I convert the weight file to .onnx (half=True) and run inference on the CPU, inference is about 1.5x faster than .pt on my own computer (i7-12700). Predicting 15 images: .pt: 6.50 s, .onnx: 4.8 s.
But when I put the same weights on the E5-2680 v4 server, the result is basically the same, or even slower:
.pt: 8.50 s, .onnx: 9.8 s
What is going on here? Does ONNX not support E3 and E5 CPUs?
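For reference, a rough CPU timing sketch with ONNX Runtime; the model path, input shape (1x3x640x640), and the 15-run loop are assumptions made to mirror the test above and should be adjusted to the actual model:

```python
# Rough CPU inference timing sketch (model path, input shape, and run count are assumptions).
import time
import numpy as np
import onnxruntime as ort

print(ort.get_available_providers())  # confirm which execution providers this build offers

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
dtype = np.float16 if "float16" in inp.type else np.float32
x = np.random.rand(1, 3, 640, 640).astype(dtype)  # dummy image batch

start = time.perf_counter()
for _ in range(15):  # mimic the 15-image test
    sess.run(None, {inp.name: x})
print(f"15 runs: {time.perf_counter() - start:.2f} s")
```

Comparing this number on the i7-12700 and on the E5-2680 v4 would at least separate ONNX Runtime itself from the rest of the pipeline (image loading and pre/post-processing).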