huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate

[feature-request] Add OpenVINO as an inference-only backend #2306

Vipitis commented 9 months ago

**What is OpenVINO?** OpenVINO enables inference optimizations for a range of devices. Hugging Face already provides OpenVINO inference optimizations through optimum-intel.
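For context, here is roughly what the existing optimum-intel route looks like (a sketch, assuming `optimum[openvino]` is installed; the model id is just an example):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # example checkpoint; any supported causal LM works
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, OpenVINO!", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```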

**Why support it?** I am using accelerate mainly to write device-agnostic code in combination with various HF libraries; `Accelerator.device` is the most useful feature next to `accelerate launch`, for example with the evaluation harness. There are reports of OpenVINO working with AMD GPUs on Windows, and it is also much simpler to set up than IPEX for inference on Intel GPUs. Models in the OpenVINO Intermediate Representation (IR) could be supported, including various quantized variants.
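For illustration, a minimal sketch of the device-agnostic pattern (the tiny model and random inputs are placeholders):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# accelerator.device resolves to whatever backend is available
# (cuda, mps, xpu, ...); an OpenVINO device could slot into the same mechanism.
model = torch.nn.Linear(8, 2).to(accelerator.device)
inputs = torch.randn(4, 8, device=accelerator.device)
outputs = model(inputs)
```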

**Limitations** OpenVINO is inference-only. Its main targets are edge devices and vision models.

Even if this is not a good fit, I simply wanted to put the idea out there for others to find. Accelerate already supports onnxruntime, which does a lot of similar things and can itself use OpenVINO as an execution provider on supported devices (namely the Intel NPU).
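To illustrate the onnxruntime route: ONNX Runtime selects OpenVINO through its execution-provider list (a sketch; requires the onnxruntime-openvino build, and the model path and input shape here are placeholders):

```python
import numpy as np
import onnxruntime as ort

# The OpenVINO execution provider is tried first and falls back to CPU
# if the hardware is unsupported.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to an exported ONNX model
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
outputs = session.run(None, {input_name: dummy})
```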

muellerzr commented 9 months ago

We can look into this, though I don't think we support onnxruntime? Where did you see that? 🤔

Vipitis commented 9 months ago

> [...] I don't think we support onnxruntime? Where did you see that? 🤔

Via the torch dynamo backends, mentioned here: https://github.com/huggingface/accelerate/blob/68b3dbf666155a67925b9243c94f71c81fd557f4/src/accelerate/utils/dataclasses.py#L366
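For reference, that enum backs the `dynamo_backend` argument of `Accelerator`, so the onnxruntime path looks roughly like this (a sketch, assuming a torch build where the "onnxrt" dynamo backend is registered):

```python
import torch
from accelerate import Accelerator

# DynamoBackend.ONNXRT corresponds to torch.compile(backend="onnxrt"),
# which lowers captured graphs to ONNX Runtime.
accelerator = Accelerator(dynamo_backend="onnxrt")
model = accelerator.prepare(torch.nn.Linear(8, 2))
```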