intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Mixtral, Gemma, Phi, MiniCPM, Qwen-VL, MiniCPM-V, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, vLLM, GraphRAG, DeepSpeed, Axolotl, etc
Apache License 2.0

Issue with QLoRA fine-tuning on Flex GPU #9308

Open tsantra opened 1 year ago

tsantra commented 1 year ago

Hi,

I am trying to use the QLoRA code as provided in the repo on a Sapphire Rapids machine with a Flex GPU.

I was able to run qlora_finetuning.py without any errors.

But export_merged_model.py gives me this error:

[screenshot: error traceback]

The command I used to merge the model: python ./export_merged_model.py --repo-id-or-model-path <path to llama-2-7b-chat-hf> --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged

OS: Ubuntu 22. This is my training info:

[screenshot: training info]
rnwang04 commented 1 year ago

Hi @tsantra, would you mind trying it again after running pip install accelerate==0.23.0?

tsantra commented 1 year ago

@rnwang04 Thank you. It worked after installing accelerate==0.23.0.

I have two questions:

  1. Is QLoRA fine-tuning supported on CPU?
  2. The code at https://github.com/intel-analytics/BigDL/blob/main/python/llm/example/GPU/QLoRA-FineTuning/export_merged_model.py shows device_map={"": cpu}, so which part of the code is running on the Flex GPU?
rnwang04 commented 1 year ago

Hi @tsantra ,

  1. Yes, it's supported on CPU; we will provide an official CPU example later.
  2. After you get the merged model (for example, checkpoint-200-merged), you can use it as a normal Hugging Face Transformers model to run inference on the Flex GPU, as in https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2 (see the sketch below).
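
For reference, here is a minimal sketch of that inference flow. It follows the 4-bit bigdl.llm AutoModel pattern used in the linked llama2 example rather than any exact script from the repo, the paths are placeholders, and API details may differ across bigdl.llm versions:

```python
# Rough sketch: load the merged QLoRA checkpoint with BigDL-LLM's 4-bit
# AutoModel and run generation on the Intel GPU ("xpu"). Paths are placeholders.
import torch
import intel_extension_for_pytorch as ipex  # registers the XPU backend
from transformers import AutoTokenizer
from bigdl.llm.transformers import AutoModelForCausalLM

model_path = "./outputs/checkpoint-200-merged"  # output of export_merged_model.py

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    load_in_4bit=True,        # quantize weights to INT4 at load time
    trust_remote_code=True,
)
model = model.to("xpu")       # move the model to the Flex GPU

prompt = "What is AI?"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("xpu")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
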
rnwang04 commented 1 year ago

Hi @tsantra, the QLoRA CPU example is now available here: https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning

tsantra commented 1 year ago

Hi @rnwang04 , thank you for your reply!

Are you using any metric to check model accuracy after QLoRA fine-tuning? I used my custom dataset for fine-tuning, and the inference results are not good; the model hallucinates a lot. Do you have any BKMs (best known methods) for fine-tuning?

Are you also using any profiler to check GPU memory usage? Do you have any suggestions?

tsantra commented 1 year ago

I had closed this by mistake.

tsantra commented 1 year ago

@rnwang04 GPU fine-tuning suddenly stopped working and gave a segmentation fault.

[screenshot: segmentation fault output]
rnwang04 commented 1 year ago

@rnwang04 GPU fine-tuning suddenly stopped working and gave a segmentation fault.

Hi @tsantra, have you ever run GPU fine-tuning successfully, or do you always hit this error? If you have run GPU fine-tuning successfully before, did you make any changes to your script or environment settings?

rnwang04 commented 1 year ago

Are you using any metric to check model accuracy after QLoRA fine-tuning? I used my custom dataset for fine-tuning, and the inference results are not good; the model hallucinates a lot. Do you have any BKMs (best known methods) for fine-tuning?

Have you checked the loss curve of your fine-tuning run? Does the loss decrease normally during fine-tuning and eventually stabilize at a fixed value? What are the approximate final train loss and eval loss?
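
If you trained with the standard Hugging Face Trainer (as qlora_finetuning.py does), a quick way to inspect the curve offline is to read the trainer_state.json saved in the checkpoint directory; a small sketch, with the checkpoint path as a placeholder:

```python
# Sketch: pull the logged train/eval losses out of a checkpoint's
# trainer_state.json and print the last few values to check convergence.
import json

with open("./outputs/checkpoint-200/trainer_state.json") as f:  # placeholder path
    state = json.load(f)

train_loss = [(e["step"], e["loss"]) for e in state["log_history"] if "loss" in e]
eval_loss = [(e["step"], e["eval_loss"]) for e in state["log_history"] if "eval_loss" in e]

print("last train losses:", train_loss[-5:])
print("last eval losses: ", eval_loss[-5:])
```
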

Are you also using any profiler to check GPU memory usage? Do you have any suggestions?

I just use the "GPU Memory Used" column of sudo xpu-smi stats -d 0 to check GPU memory usage.
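
If you want a rough log of memory over time during a run, one option (assuming xpu-smi is on your PATH and you have the needed privileges, e.g. by running the script with sudo) is to poll that same command from a small Python loop:

```python
# Sketch: poll `xpu-smi stats -d 0` every few seconds and print the
# "GPU Memory Used" line with a timestamp. Stop with Ctrl+C.
import subprocess
import time

while True:
    out = subprocess.run(
        ["xpu-smi", "stats", "-d", "0"],
        capture_output=True, text=True, check=False,
    ).stdout
    for line in out.splitlines():
        if "GPU Memory Used" in line:
            print(time.strftime("%H:%M:%S"), line.strip())
    time.sleep(5)
```
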

shane-huang commented 10 months ago

@rnwang04 GPU fine-tuning suddenly stopped working and gave a segmentation fault.

[screenshot: segmentation fault output]

Are you running it inside VS Code?