Request Description
Llama.cpp is a very popular LLM/VLM inference and deployment framework, implemented in pure C/C++ with no external dependencies and available cross-platform. Through its SYCL and Vulkan backends it can accelerate inference on some Intel integrated GPUs, but those backends have many compatibility issues and there is no NPU support at all. Could Intel provide an OpenVINO backend for this project?
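For context, OpenVINO GenAI already exposes an LLM pipeline that can target the NPU directly on recent OpenVINO releases. A minimal Python sketch of that existing API (the model directory name below is just a placeholder for a model exported to OpenVINO IR, e.g. via optimum-intel) illustrates the device coverage an OpenVINO backend could bring to llama.cpp:

```python
# Minimal sketch of the existing OpenVINO GenAI API (not llama.cpp code):
# load an LLM exported to OpenVINO IR and run generation on the NPU.
# "TinyLlama-1.1B-Chat-ov" is a placeholder directory for the exported model.
import openvino_genai as ov_genai

pipe = ov_genai.LLMPipeline("TinyLlama-1.1B-Chat-ov", "NPU")  # device: "CPU", "GPU", or "NPU"
print(pipe.generate("What is llama.cpp?", max_new_tokens=64))
```

Exposing the same runtime behind a llama.cpp backend would let users reach Intel GPUs and the NPU without leaving the llama.cpp tooling and GGUF ecosystem.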
Feature Use Case
No response
Issue submission checklist
[X] The feature request or improvement must be related to OpenVINO