GAURAVSH19 opened this issue 1 year ago
@GAURAVSH19, could you try our benchmark tool in a conda environment with Python 3.8 or 3.9, like the following:
```
conda create --name ort python=3.8
conda activate ort
conda install pytorch torchvision torchaudio cpuonly -c pytorch
pip install onnx onnxruntime coloredlogs packaging psutil py3nvml numpy transformers py-cpuinfo sympy protobuf==3.20.1
python -m onnxruntime.transformers.benchmark -m bert-large-uncased --model_class AutoModel --provider CPU -p fp32 -o by_script -i 3 -t 100 -b 1 -s 128 -n 8 16 32 64 128
python -m onnxruntime.transformers.machine_info
```
The benchmark output will report the performance at each thread count.
The last command prints your package and system info. Use it to confirm that both machines have the same package versions so the comparison is apples-to-apples, and check the CPU flags to see whether avx512* is present.
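If you just want the version/flag check without the full report, a minimal sketch like the following (using py-cpuinfo and onnxruntime, both installed by the pip command above) can be run on each machine and the output diffed:

```python
# Minimal sketch: compare package versions and CPU flags across machines.
import cpuinfo       # py-cpuinfo
import onnxruntime

info = cpuinfo.get_cpu_info()
print("onnxruntime:", onnxruntime.__version__)
print("CPU:", info.get("brand_raw"))
# AVX512 kernels are only selected when the CPU reports the matching flags.
print("AVX512 flags:", sorted(f for f in info.get("flags", []) if f.startswith("avx512")))
```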
Describe the issue
Hello,
I have been using ONNX Runtime on both Linux and Windows. When I run a BERT model for inference, I observe a huge performance difference: on the Linux machine, performance is almost 80% higher than on the Windows machine. Here is the system configuration I am using.
Inference type: CPU inference. Processor: AMD Ryzen Threadripper PRO 3995WX, 64 cores. RAM: 512 GB.
To reproduce
1. Build onnxruntime from source.
2. Install the .whl file.
3. Run an inference test for the BERT-large-uncased model on both OSes (a minimal sketch follows).
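The original test uses the C++ API, but a Python equivalent is easier to share as a repro. This is only a sketch: the model path and the input names (input_ids, attention_mask, token_type_ids) are assumptions about the export and should be adjusted to match yours. Pinning intra_op_num_threads makes the Linux/Windows runs comparable.

```python
# Minimal sketch of the inference test, run identically on both OSes.
# Assumes a BERT-large ONNX export at "bert_large_uncased.onnx" with the
# input names below (hypothetical; adjust to your actual export).
import time
import numpy as np
import onnxruntime as ort

so = ort.SessionOptions()
so.intra_op_num_threads = 8  # pin the thread count for an apples-to-apples run
sess = ort.InferenceSession("bert_large_uncased.onnx", so,
                            providers=["CPUExecutionProvider"])

batch, seq = 1, 128
feed = {
    "input_ids": np.ones((batch, seq), dtype=np.int64),
    "attention_mask": np.ones((batch, seq), dtype=np.int64),
    "token_type_ids": np.zeros((batch, seq), dtype=np.int64),
}

for _ in range(3):  # warm-up iterations
    sess.run(None, feed)

runs = 100
start = time.perf_counter()
for _ in range(runs):
    sess.run(None, feed)
print(f"avg latency: {(time.perf_counter() - start) / runs * 1000:.2f} ms")
```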
Urgency
No response
Platform
Windows
OS Version
10.0.19044 Build 19044
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
1.10.0
ONNX Runtime API
C++
Architecture
X64
Execution Provider
Default CPU
Execution Provider Library Version
No response
Model File
*
Is this a quantized model?
No