huggingface / optimum

🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
https://huggingface.co/docs/optimum/main/
Apache License 2.0

How to tell whether the backend of the ONNXRuntime accelerator is Intel OpenVINO? #1754

Closed Ywandung-Lyou closed 7 months ago

Ywandung-Lyou commented 8 months ago

According to the wiki, OpenVINO is one of ONNXRuntime's execution providers.

I am deploying a model on an Intel Xeon Gold server, which supports AVX-512 and is compatible with Intel OpenVINO. How can I tell whether the execution provider being used is the default CPU one or OpenVINO?

from optimum.onnxruntime import ORTModelForCustomTasks
from transformers import AutoTokenizer

checkpoint = "Geotrend/distilbert-base-zh-cased"
save_directory = "onnx_models"  # output directory (placeholder value)

# Export the checkpoint to ONNX and load it with ONNX Runtime.
ort_model = ORTModelForCustomTasks.from_pretrained(checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

ort_model.save_pretrained(save_directory + "/" + checkpoint)
tokenizer.save_pretrained(save_directory + "/" + checkpoint)
Framework not specified. Using pt to export to ONNX.
Using the export variant default. Available variants are:
    - default: The default ONNX variant.
Using framework PyTorch: 2.1.2.post300
IlyasMoutawwakil commented 8 months ago

Hi!

For OpenVINO we have optimum-intel, which offers a simple OVModelForXxx API that works the same way.
repo: https://github.com/huggingface/optimum-intel?tab=readme-ov-file#openvino
docs: https://huggingface.co/docs/optimum/main/en/intel/inference

cc @echarlaix
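
For illustration, a minimal sketch of the optimum-intel path for the same checkpoint (assuming optimum-intel is installed with the openvino extra; OVModelForFeatureExtraction is chosen here because the checkpoint is a DistilBERT encoder):

from optimum.intel import OVModelForFeatureExtraction
from transformers import AutoTokenizer

checkpoint = "Geotrend/distilbert-base-zh-cased"

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly.
ov_model = OVModelForFeatureExtraction.from_pretrained(checkpoint, export=True)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

inputs = tokenizer("你好,世界", return_tensors="pt")
outputs = ov_model(**inputs)  # inference runs on the OpenVINO runtime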

fxmarty commented 7 months ago

ORTModel defaults to CPUExecutionProvider; to use a different execution provider, you can refer to https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained.provider.
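
As a sketch (assuming the onnxruntime-openvino package is installed so that the OpenVINO execution provider is available), you can list the providers supported by your ONNX Runtime build and request OpenVINO explicitly through the documented provider argument:

import onnxruntime
from optimum.onnxruntime import ORTModelForCustomTasks

# Execution providers supported by the installed ONNX Runtime build.
print(onnxruntime.get_available_providers())

# Without an explicit provider, ORTModel falls back to CPUExecutionProvider.
ort_model = ORTModelForCustomTasks.from_pretrained(
    "Geotrend/distilbert-base-zh-cased",
    export=True,
    provider="OpenVINOExecutionProvider",
)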

ShrishShankar commented 4 months ago

OVModelForSeq2SeqLM is much slower than ORTModelForSeq2SeqLM for the mT5 model. The latter also provides an option to choose an instruction set, e.g. --avx2 or --avx512. On my dataset the OpenVINO model is 4 times slower than the ONNX Runtime one, even after excluding the first inference from the mean latency calculation.
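
For context, a rough sketch of the kind of latency comparison described above (hypothetical checkpoint and dummy input, warm-up runs excluded from the mean; not the exact benchmark used here):

import time
from statistics import mean
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForSeq2SeqLM
from optimum.intel import OVModelForSeq2SeqLM

checkpoint = "google/mt5-small"  # hypothetical mT5 checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
inputs = tokenizer("Hello world", return_tensors="pt")

def mean_latency(model, n_runs=20, warmup=2):
    # Warm-up runs are excluded so graph compilation does not skew the mean.
    for _ in range(warmup):
        model.generate(**inputs, max_new_tokens=32)
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        model.generate(**inputs, max_new_tokens=32)
        timings.append(time.perf_counter() - start)
    return mean(timings)

ort_model = ORTModelForSeq2SeqLM.from_pretrained(checkpoint, export=True)
ov_model = OVModelForSeq2SeqLM.from_pretrained(checkpoint, export=True)
print("onnxruntime:", mean_latency(ort_model))
print("openvino:", mean_latency(ov_model))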

fxmarty commented 4 months ago

@ShrishShankar Thank you for the report. Could you open an issue in the optimum-intel repository (https://github.com/huggingface/optimum-intel/issues)?