microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[Performance] Performance issue on Linux vs Windows for BERT model. #13224

Open GAURAVSH19 opened 1 year ago

GAURAVSH19 commented 1 year ago

Describe the issue

Hello,

I have been using onnxruntime on both Linux and Windows, and I observe a huge performance difference when running a BERT model for inference: on the Linux machine, performance is almost 80% higher than on the Windows machine. Here is the system configuration I am using.

- Inference type: CPU inference
- Processor: AMD Ryzen Threadripper PRO 3995WX, 64 cores
- RAM: 512 GB

To reproduce

1. Build onnxruntime from source.
2. Install the .whl file.
3. Run an inference test for the BERT large uncased model (a minimal sketch of such a test is shown below).
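For reference, here is a minimal sketch of the kind of CPU inference test described in step 3. The model path, input names, and shapes are assumptions for illustration (a typical BERT-large-uncased export with batch size 1 and sequence length 128), not the reporter's exact harness:

```python
import time
import numpy as np
import onnxruntime as ort

# "bert_large_uncased.onnx" is a placeholder path, not from the original report.
session = ort.InferenceSession("bert_large_uncased.onnx",
                               providers=["CPUExecutionProvider"])

# Dummy inputs; names assume a standard Hugging Face transformers export.
batch, seq_len = 1, 128
inputs = {
    "input_ids": np.ones((batch, seq_len), dtype=np.int64),
    "attention_mask": np.ones((batch, seq_len), dtype=np.int64),
    "token_type_ids": np.zeros((batch, seq_len), dtype=np.int64),
}

# Warm up, then time repeated runs.
for _ in range(3):
    session.run(None, inputs)

n_runs = 100
start = time.perf_counter()
for _ in range(n_runs):
    session.run(None, inputs)
elapsed = time.perf_counter() - start
print(f"avg latency: {elapsed / n_runs * 1000:.2f} ms")
```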

Urgency

No response

Platform

Windows

OS Version

10.0.19044 Build 19044

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.10.0

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

Model File

*

Is this a quantized model?

No

tianleiwu commented 1 year ago

@GAURAVSH19, could you try our benchmark tools in a conda environment with Python 3.8 or 3.9, like the following:

```shell
conda create --name ort python=3.8
conda activate ort
conda install pytorch torchvision torchaudio cpuonly -c pytorch
pip install onnx onnxruntime coloredlogs packaging psutil py3nvml numpy transformers py-cpuinfo sympy protobuf==3.20.1
python -m onnxruntime.transformers.benchmark -m bert-large-uncased --model_class AutoModel --provider CPU -p fp32 -o by_script -i 3 -t 100 -b 1 -s 128 -n 8 16 32 64 128
python -m onnxruntime.transformers.machine_info
```

The benchmark output will report the performance for different numbers of threads.
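If you want to pin the thread configuration in your own test so the Linux and Windows runs are directly comparable, something along these lines should work (a sketch; the thread counts and model path here are placeholders, not recommendations):

```python
import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 8   # threads used to parallelize individual operators
opts.inter_op_num_threads = 1   # threads used to run independent operators concurrently

# "bert_large_uncased.onnx" is a placeholder path.
session = ort.InferenceSession("bert_large_uncased.onnx",
                               sess_options=opts,
                               providers=["CPUExecutionProvider"])
```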

The last command will print your package and system info. Make sure the package versions are the same on both machines so that the comparison is apples to apples, and check the CPU flags to see whether AVX-512 (avx512*) instructions are available.
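For example, since py-cpuinfo is already in the pip install list above, a quick way to check the AVX-512 flags on each machine is (a sketch):

```python
import cpuinfo  # provided by the py-cpuinfo package

# List any avx512* flags reported by the CPU; an empty list means no AVX-512.
flags = cpuinfo.get_cpu_info()["flags"]
print(sorted(f for f in flags if f.startswith("avx512")))
```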