Description
1. Why the change?
This PR adds the first version of an example for our current C++ NPU solution (see https://github.com/analytics-zoo/nano/issues/1716).
2. User API changes
See the example README.md.
3. Summary of the change
- Provide a CMake build script and the related C++ source code.
- Add an initial README for the example (a minimal build-script sketch follows this list).
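
For reference, here is a minimal sketch of what a CMake build script for such a C++ example might look like. The project, target, source, and library names (`npu_llm_example`, `main.cpp`, `npu_llm`) are illustrative assumptions, not the actual contents of this PR; see the example README.md for the real build steps.

```cmake
# Hypothetical minimal build script for a C++ NPU example.
# All names below are assumptions for illustration, not taken from this PR.
cmake_minimum_required(VERSION 3.16)
project(npu_llm_example LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 17)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

# Single-file example executable (assumed entry point).
add_executable(npu_llm_example main.cpp)

# Link against the assumed NPU LLM runtime library.
target_link_libraries(npu_llm_example PRIVATE npu_llm)
```

With a layout like this, the example would build with the usual `cmake -B build` followed by `cmake --build build`.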
4. How to test?
- [ ] Unit test: Please manually trigger the PR Validation here by inputting the PR number (e.g., 1234). And paste your action link here once it has been successfully finished.