Accelerate local LLM inference and fine-tuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., a local PC with iGPU and NPU, or discrete GPUs such as Arc, Flex, and Max); seamlessly integrates with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Currently we provide a number of fine-tuning options, e.g., ReLoRA, Axolotl, and DPO, as shown here, with GaLore and LISA on the way; some of these can outperform LoRA.
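For context, here is a minimal sketch of how a LoRA-style adapter (the common baseline the methods above are compared against) is typically attached with HuggingFace PEFT. This is not our training script; the checkpoint name, rank, and target modules are illustrative assumptions.

```python
# Minimal LoRA setup with HuggingFace PEFT (illustrative sketch, not the project's API).
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # assumed checkpoint
lora_cfg = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=16,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections (LLaMA naming)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)     # wraps the base model with trainable low-rank adapters
model.print_trainable_parameters()         # only the adapter weights are trainable
```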
We are going to investigate and evaluate whether to support RoSA and QRoSA.
It would be brilliant to have robust adaptation (RoSA) fine-tuning implemented, given that it has been reported to outperform LoRA and QLoRA.
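To make the request concrete, here is a rough PyTorch sketch of the core RoSA idea: the frozen pretrained weight is corrected by a trainable low-rank term plus a trainable sparse term. The layer name, rank, density, and the random sparsity mask are illustrative assumptions; in the published method the sparse support is selected rather than random, and this is not an existing ipex-llm API.

```python
# RoSA-style layer sketch: frozen weight + low-rank update + sparse update (illustrative only).
import torch
import torch.nn as nn

class RoSALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, density: float = 0.01):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)           # freeze the pretrained weight
        out_f, in_f = base.weight.shape
        # Low-rank factors (as in LoRA): delta_lowrank = B @ A
        self.A = nn.Parameter(torch.randn(rank, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, rank))
        # Sparse correction: values trained only at a fixed set of positions
        mask = (torch.rand(out_f, in_f) < density).float()
        self.register_buffer("mask", mask)               # fixed sparsity pattern (random here for illustration)
        self.S = nn.Parameter(torch.zeros(out_f, in_f))  # sparse values (stored densely in this sketch)

    def forward(self, x):
        delta = self.B @ self.A + self.S * self.mask     # combined low-rank + sparse update
        return self.base(x) + x @ delta.t()
```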