Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex, and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc.
Apache License 2.0
6.76k stars · 1.27k forks
Only one Arc 770 worked when running "finetune_llama2_7b_arc_2_card.sh" with two Arc 770 cards in a workstation. #9677
Hi @liang1wang, according to your screenshot, the used GPU memory is 11518M on your first card and 11560M on your second card, so both Arc 770 cards were in use.
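For anyone hitting the same question, a quick way to confirm that both cards are visible to PyTorch before launching the script is to query the XPU backend. The snippet below is a minimal sketch, assuming an XPU build of PyTorch with `intel_extension_for_pytorch` installed; device indices and names will vary by machine:

```python
# Minimal sketch (not from the issue): list the XPU devices PyTorch can see.
# Assumes intel_extension_for_pytorch (XPU build) is installed.
import torch
import intel_extension_for_pytorch as ipex  # registers the torch.xpu backend

assert torch.xpu.is_available(), "No XPU device found"
print(f"XPU devices visible: {torch.xpu.device_count()}")  # expect 2 for 2x Arc 770
for i in range(torch.xpu.device_count()):
    print(f"  [{i}] {torch.xpu.get_device_name(i)}")
```

Live per-card memory use (the 11518M/11560M figures above) can also be watched during training with Intel's `xpu-smi` tool.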
Sample: https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora/finetune_llama2_7b_arc_2_card.sh
Env: Intel(R) Xeon(R) w7-3455, 2x Arc 770, Ubuntu 22.04 (kernel 6.2.0), 125 GB memory, oneAPI 23.2.0
Model: Llama-2-7b-hf
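For context on why both cards end up with similar memory use, here is a rough sketch of the distributed setup that a 2-card run performs. The actual training logic lives in the repo's `alpaca_qlora_finetuning.py`, which the `.sh` script launches via `mpirun`; the environment-variable handling below is an assumption based on Intel MPI conventions (`PMI_RANK`/`PMI_SIZE`), not a copy of the example:

```python
# Hypothetical sketch of the 2-card distributed init, assuming Intel MPI
# launches one process per card and oneCCL bindings are installed.
import os
import torch
import torch.distributed as dist
import intel_extension_for_pytorch as ipex   # XPU backend for PyTorch
import oneccl_bindings_for_pytorch           # registers the "ccl" backend

# mpirun sets PMI_RANK/PMI_SIZE; fall back to single-process defaults.
os.environ.setdefault("RANK", os.environ.get("PMI_RANK", "0"))
os.environ.setdefault("WORLD_SIZE", os.environ.get("PMI_SIZE", "1"))
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="ccl")
local_rank = dist.get_rank()
torch.xpu.set_device(local_rank)  # pin this process to one Arc 770
print(f"rank {local_rank}/{dist.get_world_size()} on xpu:{local_rank}")
```

Each rank pins itself to one device, so once finetuning starts, both cards should report comparable GPU memory usage, as the screenshot in the reply above shows.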