xiaoguaishoubaobao opened 1 year ago
@xiaoguaishoubaobao, it would help if you could specify at least a model architecture.
Also, ONNX Runtime 1.15.0 has been released. Can you try it and see whether you get the same result?
Thank you very much for your reply. onnx is 1.14.0 and onnxruntime is 1.15.0. I think the result may be affected by my server being virtualized with KVM: it is a cloud server, not a dedicated server.
For CPU, FP16 usually does not help performance (compared to FP32), since most CPUs have no native FP16 support, so FP16 operators have to be cast back to FP32.
Try int8 quantization instead for CPU.
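A minimal sketch of post-training dynamic int8 quantization with onnxruntime.quantization; the file names are placeholders, and you would typically start from an FP32 export (half=False) rather than the FP16 one:

```python
# Minimal dynamic int8 quantization sketch (file names are placeholders).
from onnxruntime.quantization import quantize_dynamic, QuantType

quantize_dynamic(
    model_input="model_fp32.onnx",    # FP32 export of the model
    model_output="model_int8.onnx",   # quantized output
    weight_type=QuantType.QInt8,      # store weights as int8
)
```

The quantized model can then be loaded with a normal InferenceSession on the CPUExecutionProvider.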
onnx version: 1.14.0
When I convert the weight file to .onnx (half=True) and run inference on the CPU, inference is about 1.5x faster than .pt on my own computer (i7-12700). Predicting 15 images: .pt: 6.50 s, .onnx: 4.8 s.
But when I put the same weights on the E5-2680 v4 server, the result is basically the same, or even slower:
.pt: 8.50 s, .onnx: 9.8 s
What is going on here? Does ONNX not support E3 and E5 CPUs?
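For reference, a rough CPU timing sketch with ONNX Runtime; the model path, input shape (1x3x640x640), and the 15-run loop are assumptions made to mirror the test above and should be adjusted to the actual model:

```python
# Rough CPU inference timing sketch (model path, input shape, and run count are assumptions).
import time
import numpy as np
import onnxruntime as ort

print(ort.get_available_providers())  # confirm which execution providers this build offers

sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]
dtype = np.float16 if "float16" in inp.type else np.float32
x = np.random.rand(1, 3, 640, 640).astype(dtype)  # dummy image batch

start = time.perf_counter()
for _ in range(15):  # mimic the 15-image test
    sess.run(None, {inp.name: x})
print(f"15 runs: {time.perf_counter() - start:.2f} s")
```

Comparing this number on the i7-12700 and on the E5-2680 v4 would at least separate ONNX Runtime itself from the rest of the pipeline (image loading and pre/post-processing).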