Hi! For OpenVINO we have optimum-intel, which offers a simple OVModelForxxx API that works the same way.
repo: https://github.com/huggingface/optimum-intel?tab=readme-ov-file#openvino
docs: https://huggingface.co/docs/optimum/main/en/intel/inference
cc @echarlaix
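A minimal sketch of what that looks like, assuming optimum-intel is installed with its OpenVINO extra; the mT5 checkpoint below is only a placeholder:

```python
# Sketch only: swapping ORTModelForSeq2SeqLM for OVModelForSeq2SeqLM keeps the
# familiar transformers-style workflow.
from optimum.intel import OVModelForSeq2SeqLM
from transformers import AutoTokenizer

model_id = "google/mt5-small"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForSeq2SeqLM.from_pretrained(model_id, export=True)

inputs = tokenizer("summarize: OpenVINO inference with optimum-intel.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```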
ORTModel defaults to CPUExecutionProvider; for choosing a different provider, you can refer to https://huggingface.co/docs/optimum/main/en/onnxruntime/package_reference/modeling_ort#optimum.onnxruntime.ORTModel.from_pretrained.provider.
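For example, a minimal sketch of that provider argument; the checkpoint id is only a placeholder, and OpenVINOExecutionProvider is only available if the onnxruntime-openvino build is installed:

```python
# Sketch: the provider argument documented above selects the execution provider;
# omitting it leaves ORTModel on CPUExecutionProvider.
from optimum.onnxruntime import ORTModelForSeq2SeqLM

model = ORTModelForSeq2SeqLM.from_pretrained(
    "google/mt5-small",                    # placeholder checkpoint
    export=True,                           # export the PyTorch weights to ONNX on the fly
    provider="OpenVINOExecutionProvider",  # requires the onnxruntime-openvino build
)
```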
The OVModelForSeq2Seq is much slower than ORTModelForSeq2Seq for the mT5 model. The latter also provides an option to choose an instruction set, e.g. --avx2 or --avx512. On my dataset the OpenVINO model is 4 times slower than the ONNX Runtime one, and this is after excluding the first inference from the mean latency calculation.
@ShrishShankar Thank you for the report. Could you open an issue at https://github.com/huggingface/optimum-intel/issues?
According to the wiki, OpenVINO is one of ONNX Runtime's execution providers.
I am deploying the model on an Intel Xeon Gold server, which supports AVX512 and is compatible with Intel OpenVINO. How can I tell whether the accelerator is the default CPU or OpenVINO?
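One way to check, sketched below under the assumption that the relevant onnxruntime build is installed; the checkpoint id is a placeholder, and the providers attribute on the optimum wrapper may differ by version, so the code only inspects what is exposed:

```python
# Sketch: list the execution providers compiled into the installed onnxruntime,
# and (if exposed) the providers the loaded optimum model is actually using.
import onnxruntime as ort
from optimum.onnxruntime import ORTModelForSeq2SeqLM

# "OpenVINOExecutionProvider" shows up here only with the onnxruntime-openvino package;
# a plain onnxruntime install typically lists just CPUExecutionProvider.
print(ort.get_available_providers())

model = ORTModelForSeq2SeqLM.from_pretrained("google/mt5-small", export=True)  # placeholder id
# The providers attribute is an assumption about the optimum version in use,
# hence the defensive getattr.
print(getattr(model, "providers", "providers attribute not exposed in this version"))
```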