**Describe the bug**
I ran the official tutorial notebook for ONNX Runtime, https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/PyTorch_Bert-Squad_OnnxRuntime_CPU.ipynb, without changing any of the code.
My observations were:

PyTorch CPU inference time = 132.92 ms
OnnxRuntime CPU inference time = 115.03 ms

A difference of roughly 18 ms does not seem impressive compared to the effort needed to migrate to ONNX. Am I missing some other configuration, or can you suggest something?
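For context, here is a minimal sketch of how I would configure the session explicitly, in case a setting is what I am missing; the model path and thread count below are placeholders for my setup, not values taken from the notebook:

```python
import onnxruntime as ort

# Placeholder path: whatever the notebook exported on my machine.
MODEL_PATH = "bert-base-cased-squad.onnx"

so = ort.SessionOptions()
# ORT_ENABLE_ALL is already the default, but setting it explicitly rules
# out an accidentally downgraded graph optimization level.
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
# Pin intra-op parallelism to the physical core count (assumed 4 here);
# oversubscribing hyperthreads can hurt CPU latency.
so.intra_op_num_threads = 4

session = ort.InferenceSession(MODEL_PATH, so, providers=["CPUExecutionProvider"])
```

If the tutorial assumes a different combination of options (for example quantization), please point me to it.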
**System information**
OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu - CPU
**To Reproduce**
Run https://github.com/microsoft/onnxruntime/blob/master/onnxruntime/python/tools/transformers/notebooks/PyTorch_Bert-Squad_OnnxRuntime_CPU.ipynb
**Expected behavior**
17x faster performance