janhq / cortex.cpp

Local AI API Platform
https://cortex.so
Apache License 2.0

hardware: Intel iGPU, dGPU and NPU support #470

Open · xiangyang-95 opened this issue 7 months ago

xiangyang-95 commented 7 months ago

Original Post

Problem: Unable to use Intel integrated and discrete GPUs to offload model inference.

Success Criteria: Able to use Intel integrated and discrete GPUs to offload model inference.

Additional context: I can contribute documentation on adding Intel GPU support to the nitro inference server.

rahulunair commented 7 months ago

I am trying to do the same. This would be incredibly useful on the latest Intel Core Ultra chips as well, since they have a unified CPU + GPU architecture.

With this, any user running Jan or nitro directly on the latest Intel Core Ultra CPUs would get a boost from the GPU.

Options:

  1. Intel acceleration support for llama.cpp using ipex-llm as a backend: https://ipex-llm.readthedocs.io/en/latest/doc/LLM/Quickstart/llama_cpp_quickstart.html

  2. Direct compilation of llama.cpp with SYCL bindings on devices that support SYCL: https://github.com/ggerganov/llama.cpp/blob/master/README-sycl.md (see the loading sketch after this list)
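For context on what option 2 buys you: once llama.cpp is built with its SYCL backend (-DGGML_SYCL=ON, or -DLLAMA_SYCL=ON in older trees), offloading is driven by the same n_gpu_layers knob as the other GPU backends. Below is a minimal sketch of that loading path, assuming the llama.h C API as of mid-2024; function names such as llama_load_model_from_file have shifted between versions, so adjust for your checkout.

```cpp
// Minimal sketch: load a GGUF model with layers offloaded to the GPU.
// Assumes llama.cpp was built with the SYCL backend and that the llama.h
// C API matches mid-2024 naming; adjust for your tree.
#include "llama.h"
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s <model.gguf>\n", argv[0]);
        return 1;
    }

    llama_backend_init();  // initializes ggml backends, including SYCL if compiled in

    llama_model_params mparams = llama_model_default_params();
    mparams.n_gpu_layers = 99;  // offload as many layers as fit on the Intel GPU

    llama_model* model = llama_load_model_from_file(argv[1], mparams);
    if (!model) {
        fprintf(stderr, "failed to load model\n");
        return 1;
    }

    // ... create a context and run inference as usual ...

    llama_free_model(model);
    llama_backend_free();
    return 0;
}
```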

xiangyang-95 commented 7 months ago

@rahulunair I have integrated the nitro server with Intel oneAPI SYCL optimizations in llama.cpp. Stay tuned for the pull request.
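Independent of nitro itself, a quick way to confirm the iGPU or dGPU is actually visible to a SYCL build before wiring it into the server is to enumerate devices through the standard SYCL 2020 API. A minimal sketch (compile with Intel's icpx -fsycl):

```cpp
// Minimal sketch: list SYCL platforms/devices to verify the Intel GPU is
// visible to the oneAPI runtime before attempting model offload.
// Standard SYCL 2020 API; compile with: icpx -fsycl list_devices.cpp
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    for (const auto& platform : sycl::platform::get_platforms()) {
        std::cout << "Platform: "
                  << platform.get_info<sycl::info::platform::name>() << "\n";
        for (const auto& device : platform.get_devices()) {
            std::cout << "  Device: "
                      << device.get_info<sycl::info::device::name>()
                      << (device.is_gpu() ? " [GPU]" : "") << "\n";
        }
    }
    return 0;
}
```

If no GPU device shows up here, the problem is the oneAPI runtime or driver setup rather than the inference server integration.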

0xSage commented 4 months ago

Closing as a duplicate of #677.

dan-homebrew commented 2 months ago

I'm re-opening this, given our discussions with Intel. We should evaluate the following possibilities: