Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, etc.) on Intel CPU and GPU (e.g., local PC with iGPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0
Only one Arc 770 worked when running "finetune_llama2_7b_arc_2_card.sh" with 2 Arc 770 cards in a workstation #9677
Hi @liang1wang, according to your screenshot, the used GPU memory is 11518M on your first card and 11560M on your second card. Both Arc 770 cards were working.
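The check described above — both cards reporting roughly 11.5 GB of used memory, indicating both are participating in the finetune — can be sketched as a small helper. This is purely illustrative (not part of BigDL); the function name and the 1 GiB activity threshold are assumptions:

```python
def both_cards_active(mem_used_mib, threshold_mib=1024):
    """Return True if every card reports more than threshold_mib of used GPU memory.

    mem_used_mib: list of per-card used-memory readings in MiB,
    e.g. taken from a GPU monitoring tool's output.
    """
    return all(m > threshold_mib for m in mem_used_mib)

# Readings from the screenshot: 11518 MiB on card 0, 11560 MiB on card 1.
print(both_cards_active([11518, 11560]))  # → True: both cards are busy
```

If only one card were doing work, its idle sibling would typically show only a few hundred MiB in use and the check would return False.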
Sample: https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/QLoRA-FineTuning/alpaca-qlora/finetune_llama2_7b_arc_2_card.sh
Env: Intel(R) Xeon(R) w7-3455, 2× Arc 770, Ubuntu 22.04 (kernel 6.2.0), 125 GB RAM, oneAPI 2023.2.0
Model: Llama-2-7b-hf
![image](https://github.com/intel-analytics/BigDL/assets/103090651/7d732113-cbc2-47cc-bcc5-2980ee5af027)