GAURAVSH19 opened this issue 1 year ago
@GAURAVSH19, could you try our benchmark tool in a conda environment with Python 3.8 or 3.9, like the following:
```
conda create --name ort python=3.8
conda activate ort
conda install pytorch torchvision torchaudio cpuonly -c pytorch
pip install onnx onnxruntime coloredlogs packaging psutil py3nvml numpy transformers py-cpuinfo sympy protobuf==3.20.1
python -m onnxruntime.transformers.benchmark -m bert-large-uncased --model_class AutoModel --provider CPU -p fp32 -o by_script -i 3 -t 100 -b 1 -s 128 -n 8 16 32 64 128
python -m onnxruntime.transformers.machine_info
```
The benchmark output will report the performance at each thread count.
The last command prints your package and system info. Use it to confirm that both machines have the same package versions so the comparison is apples-to-apples, and check the CPU flags to see whether avx512* is present.
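If you just want the version/flag check without the full report, a minimal sketch like the following (using py-cpuinfo and onnxruntime, both installed by the pip command above) can be run on each machine and the output diffed:

```python
# Minimal sketch: compare package versions and CPU flags across machines.
import cpuinfo       # py-cpuinfo
import onnxruntime

info = cpuinfo.get_cpu_info()
print("onnxruntime:", onnxruntime.__version__)
print("CPU:", info.get("brand_raw"))
# AVX512 kernels are only selected when the CPU reports the matching flags.
print("AVX512 flags:", sorted(f for f in info.get("flags", []) if f.startswith("avx512")))
```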
Describe the issue
Hello,
I have been using ONNX Runtime on both Linux and Windows. When I run a BERT model for inference, I observe a huge performance difference: on the Linux machine, performance is almost 80% higher than on the Windows machine. Here is the system configuration I am using.
Inference type: CPU inference. Processor: AMD Ryzen Threadripper PRO 3995WX, 64 cores. RAM: 512 GB.
To reproduce
1. Build onnxruntime from source.
2. Install the .whl file.
3. Run an inference test for the BERT-large-uncased model on both OSes (a minimal sketch follows).
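The original test uses the C++ API, but a Python equivalent is easier to share as a repro. This is only a sketch: the model path and the input names (input_ids, attention_mask, token_type_ids) are assumptions about the export and should be adjusted to match yours. Pinning intra_op_num_threads makes the Linux/Windows runs comparable.

```python
# Minimal sketch of the inference test, run identically on both OSes.
# Assumes a BERT-large ONNX export at "bert_large_uncased.onnx" with the
# input names below (hypothetical; adjust to your actual export).
import time
import numpy as np
import onnxruntime as ort

so = ort.SessionOptions()
so.intra_op_num_threads = 8  # pin the thread count for an apples-to-apples run
sess = ort.InferenceSession("bert_large_uncased.onnx", so,
                            providers=["CPUExecutionProvider"])

batch, seq = 1, 128
feed = {
    "input_ids": np.ones((batch, seq), dtype=np.int64),
    "attention_mask": np.ones((batch, seq), dtype=np.int64),
    "token_type_ids": np.zeros((batch, seq), dtype=np.int64),
}

for _ in range(3):  # warm-up iterations
    sess.run(None, feed)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    sess.run(None, feed)
print(f"avg latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")
```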
Urgency
No response
Platform
Windows
OS Version
10.0.19044 Build 19044
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.10.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
*
Is this a quantized model?
No