huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate

[feature-request] Add OpenVINO as an inference-only backend #2306

Vipitis commented 9 months ago

**What is OpenVINO?** OpenVINO enables inference optimizations for a range of devices. Hugging Face already provides OpenVINO inference optimizations through optimum-intel.
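For context, here is roughly what the existing optimum-intel route looks like (a sketch, assuming `optimum[openvino]` is installed; the model id is just an example):

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "gpt2"  # example checkpoint; any supported causal LM works
# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Hello, OpenVINO!", return_tensors="pt")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
```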

**Why support it?** I am using accelerate mainly to write device-agnostic code in combination with various HF libraries; `Accelerator.device` is the most useful feature next to `accelerate launch`, for example with the evaluation harness. There are reports of OpenVINO working with AMD GPUs on Windows, and it is also much simpler to set up than IPEX for inference on Intel GPUs. Models in the OpenVINO Intermediate Representation (IR) could be supported, including various quantized variants.
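For illustration, a minimal sketch of the device-agnostic pattern (the tiny model and random inputs are placeholders):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# accelerator.device resolves to whatever backend is available
# (cuda, mps, xpu, ...); an OpenVINO device could slot into the same mechanism.
model = torch.nn.Linear(8, 2).to(accelerator.device)
inputs = torch.randn(4, 8, device=accelerator.device)
outputs = model(inputs)
```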

**Limitations** OpenVINO is inference-only. Its main targets are edge devices and vision models.

Even if this is not a good fit, I simply wanted to put the idea out there for others to find. Accelerate already supports onnxruntime, which does a lot of similar things and can itself use OpenVINO as an execution provider on supported devices (namely the Intel NPU).
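To illustrate the onnxruntime route: ONNX Runtime selects OpenVINO through its execution-provider list (a sketch; requires the onnxruntime-openvino build, and the model path and input shape here are placeholders):

```python
import numpy as np
import onnxruntime as ort

# The OpenVINO execution provider is tried first and falls back to CPU
# if the hardware is unsupported.
session = ort.InferenceSession(
    "model.onnx",  # placeholder path to an exported ONNX model
    providers=["OpenVINOExecutionProvider", "CPUExecutionProvider"],
)
input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder shape
outputs = session.run(None, {input_name: dummy})
```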

muellerzr commented 9 months ago

We can look into this, though I don't think we support onnxruntime? Where did you see that? 🤔

Vipitis commented 9 months ago

> [...] I don't think we support onnxruntime? Where did you see that? 🤔

Via the torch dynamo backends, mentioned here: https://github.com/huggingface/accelerate/blob/68b3dbf666155a67925b9243c94f71c81fd557f4/src/accelerate/utils/dataclasses.py#L366
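For reference, that enum backs the `dynamo_backend` argument of `Accelerator`, so the onnxruntime path looks roughly like this (a sketch, assuming a torch build where the "onnxrt" dynamo backend is registered):

```python
import torch
from accelerate import Accelerator

# DynamoBackend.ONNXRT corresponds to torch.compile(backend="onnxrt"),
# which lowers captured graphs to ONNX Runtime.
accelerator = Accelerator(dynamo_backend="onnxrt")
model = accelerator.prepare(torch.nn.Linear(8, 2))
```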