xiangyang-95 opened 7 months ago
I am trying to do the same. This would be incredibly useful with the latest Intel Core Ultra chips as well, since they have a unified CPU + GPU architecture. With this, any user running Jan or Nitro directly on the latest Intel Core Ultra CPUs would get a boost from the integrated GPU.
Options:
- Intel acceleration support for llama.cpp using ipex-llm as a backend: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html
- Direct compilation of llama.cpp with SYCL bindings on devices that support SYCL (see the device-check sketch after this list): https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md
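For the SYCL route, a quick sanity check before building is to confirm the oneAPI runtime actually sees the Intel iGPU/dGPU. This is not from the thread, just a minimal SYCL 2020 sketch; the build flag mentioned in the comment (`-DLLAMA_SYCL=ON`) reflects llama.cpp's SYCL README at the time and may differ in newer releases:

```cpp
// Enumerate SYCL platforms/devices to verify an Intel GPU is visible
// before building llama.cpp with the SYCL backend (e.g. -DLLAMA_SYCL=ON).
// Compile with the oneAPI compiler: icpx -fsycl list_devices.cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    for (const auto &platform : sycl::platform::get_platforms()) {
        std::cout << "Platform: "
                  << platform.get_info<sycl::info::platform::name>() << "\n";
        for (const auto &device : platform.get_devices()) {
            std::cout << "  Device: "
                      << device.get_info<sycl::info::device::name>()
                      << (device.is_gpu() ? "  [GPU]" : "") << "\n";
        }
    }
    return 0;
}
```

If no `[GPU]` entry shows up here, llama.cpp's SYCL backend will not find one either, so this is a cheap way to rule out driver/runtime issues first.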
@rahulunair I have integrated the Nitro server with Intel oneAPI SYCL optimizations for llama.cpp. Stay tuned for the pull request.
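The pull request itself isn't shown in this thread, but for context, offloading in llama.cpp-based servers like Nitro is driven by the stock `n_gpu_layers` model parameter; with a SYCL build, those layers land on the Intel GPU. A hedged sketch using the llama.cpp C API (the model path and layer count are illustrative assumptions, and older releases used `llama_backend_init(bool numa)` instead of the no-argument form):

```cpp
// Sketch: load a GGUF model with GPU layer offload enabled.
// With a SYCL build of llama.cpp, offloaded layers run on the Intel GPU.
#include "llama.h"
#include <cstdio>

int main() {
    llama_backend_init();  // one-time backend setup

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;  // offload as many layers as fit on the GPU

    llama_model *model = llama_load_model_from_file("model.gguf", mparams);
    if (!model) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // ... create a context and run inference as usual ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```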
closing as dupe of #677
I'm re-opening this, given our discussions with Intel. We should evaluate the following possibilities:
- SYCL (preferred)
**Original Post**
**Problem:** Unable to use Intel integrated and discrete GPUs to offload model inferencing.

**Success Criteria:** Able to use Intel integrated and discrete GPUs to offload model inferencing.

**Additional context:** I can contribute documentation on adding Intel GPU support to the Nitro inference server.