tsantra opened this issue 1 year ago
Hi @tsantra, would you mind trying it again after `pip install accelerate==0.23.0`?
@rnwang04 Thank you. It worked after installing accelerate==0.23.0.
I have two questions:
Hi @tsantra,
Once you have the merged model (checkpoint-200-merged), you can use it as a normal huggingface transformers model to do inference on Flex GPU, like https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/GPU/HF-Transformers-AutoModels/Model/llama2
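For reference, a minimal sketch of that kind of inference with the merged checkpoint (assuming the bigdl-llm XPU setup from the linked example; the path and prompt are placeholders):

```python
# Sketch: load the merged checkpoint as a regular HF model with
# BigDL-LLM 4-bit optimization and run generation on an Intel GPU.
import torch
import intel_extension_for_pytorch as ipex  # registers the 'xpu' device
from bigdl.llm.transformers import AutoModelForCausalLM
from transformers import LlamaTokenizer

model_path = "./outputs/checkpoint-200-merged"  # merged model dir

model = AutoModelForCausalLM.from_pretrained(
    model_path, load_in_4bit=True, trust_remote_code=True
).to("xpu")
tokenizer = LlamaTokenizer.from_pretrained(model_path)

input_ids = tokenizer("What is AI?", return_tensors="pt").input_ids.to("xpu")
with torch.inference_mode():
    output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```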
Hi @tsantra, the QLoRA CPU example is updated here: https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning
Hi @rnwang04, thank you for your reply!
Are you using any metric to check model accuracy after QLoRA finetuning? I used my custom dataset for finetuning and my inference results are not good; the model is hallucinating a lot. Do you have any BKM for fine-tuning?
Are you also using any profiler to check GPU memory usage? Do you have any suggestions?
I had closed this by mistake.
@rnwang04 GPU finetuning suddenly stopped working and gave Seg Fault.
> @rnwang04 GPU finetuning suddenly stopped working and gave Seg Fault.
Hi @tsantra, have you ever run GPU finetuning successfully, or do you always meet this error? If you have run GPU finetuning successfully before, did you make any changes to your script or env settings?
> Are you using any metric to check model accuracy after QLoRA finetuning? I used my custom dataset for finetuning and my inference results are not good; the model is hallucinating a lot. Do you have any BKM for fine-tuning?
Have you checked the loss curve of your finetuning? Does the loss decrease normally during the finetune process and ultimately stabilize at a fixed value? What are the approximate train loss and eval loss in the end?
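For example, a quick way to inspect the logged losses (a sketch, assuming `trainer` is the HF `Trainer` instance from the finetuning script and that an eval dataset was configured):

```python
# Sketch: dump the train/eval loss history recorded by the HF Trainer.
history = trainer.state.log_history  # list of dicts logged every logging_steps
for entry in history:
    if "loss" in entry:       # training loss entries
        print(f"step {entry['step']}: train loss {entry['loss']:.4f}")
    if "eval_loss" in entry:  # evaluation loss entries
        print(f"step {entry['step']}: eval loss {entry['eval_loss']:.4f}")
```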
> Are you also using any profiler to check GPU memory usage? Do you have any suggestions?
I just use the "GPU Memory Used" column of `sudo xpu-smi stats -d 0` to check GPU memory usage.
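If you want an in-process number instead, something like this should work (a sketch; the `torch.xpu` memory counters assume intel_extension_for_pytorch is installed):

```python
# Sketch: query current and peak device memory allocated by PyTorch on XPU.
import torch
import intel_extension_for_pytorch as ipex  # provides the torch.xpu.* APIs

print(f"allocated: {torch.xpu.memory_allocated() / 1024**2:.1f} MiB")
print(f"peak:      {torch.xpu.max_memory_allocated() / 1024**2:.1f} MiB")
```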
> @rnwang04 GPU finetuning suddenly stopped working and gave Seg Fault.
Are you running it inside VS Code?
Hi,
I am trying to use the QLoRA code as provided in the repo on a Sapphire Rapids machine with a Flex GPU.
I was able to run qlora_finetuning.py without any error.
But export_merged_model.py is giving me this error:
The command I used to merge the model:
`python ./export_merged_model.py --repo-id-or-model-path <path to llama-2-7b-chat-hf> --adapter_path ./outputs/checkpoint-200 --output_path ./outputs/checkpoint-200-merged`
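For context, a merge script like this typically follows the standard PEFT flow; a rough sketch (not the exact BigDL script) of what the command above does:

```python
# Sketch: merge LoRA adapter weights into the base model and save the result.
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer
from peft import PeftModel

base_path = "<path to llama-2-7b-chat-hf>"
base = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
lora = PeftModel.from_pretrained(base, "./outputs/checkpoint-200")

merged = lora.merge_and_unload()  # fold LoRA deltas into the base weights
merged.save_pretrained("./outputs/checkpoint-200-merged")
LlamaTokenizer.from_pretrained(base_path).save_pretrained(
    "./outputs/checkpoint-200-merged"
)
```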
OS: Ubuntu 22
This is my training info: