intel-analytics / ipex-llm

Accelerate local LLM inference and finetuning (LLaMA, Mistral, ChatGLM, Qwen, Baichuan, Mixtral, Gemma, Phi, MiniCPM, etc.) on Intel XPU (e.g., local PC with iGPU and NPU, discrete GPU such as Arc, Flex and Max); seamlessly integrate with llama.cpp, Ollama, HuggingFace, LangChain, LlamaIndex, GraphRAG, DeepSpeed, vLLM, FastChat, Axolotl, etc.
Apache License 2.0

CPU/QLoRA-FineTuning #9406

Open ernleite opened 10 months ago

ernleite commented 10 months ago

Hello, I am trying to fine-tune a LLaMA 2 model.

(screenshot attached: "Capture d'écran 2023-11-09 082020")

Actually, the finetuning process is taking a very long time, so I had to cancel it: it is only using one core on my machine (Dell R730 with 2 CPUs / 56 logical cores). I tried `accelerate config` but it is not working. Any idea? Thanks!

jason-dai commented 10 months ago

Do we need to source bigdl-llm-init for QLoRA? @qiyuangong @hzjane

hzjane commented 10 months ago

> Do we need to source bigdl-llm-init for QLoRA? @qiyuangong @hzjane

I think it's OK. I'll add it to the README file.

hzjane commented 10 months ago

> Hello, I am trying to fine-tune a LLaMA 2 model.
>
> Actually, the finetuning process is taking a very long time, so I had to cancel it: it is only using one core on my machine (Dell R730 with 2 CPUs / 56 logical cores). I tried `accelerate config` but it is not working. Any idea? Thanks!

Maybe you can try `source bigdl-llm-init`, or just try `taskset -c 0-27` to use more cores.
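As an alternative to launching with `taskset`, a process can pin itself to a set of cores from inside Python. This is a Linux-only sketch (not part of the example scripts), using the standard-library `os.sched_setaffinity`:

```python
import os

# Number of logical cores to use; cap at what the machine actually has,
# since asking for nonexistent cores raises OSError.
n = min(28, os.cpu_count())

# Pin the current process (pid 0 = self) to logical cores 0..n-1.
# Equivalent in spirit to launching with `taskset -c 0-27 python ...`.
os.sched_setaffinity(0, range(n))

# Verify the affinity mask actually took effect.
print(sorted(os.sched_getaffinity(0)))
```

Child processes started after this call inherit the affinity mask, so it should also cover worker processes spawned by the finetuning script.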

ernleite commented 10 months ago

Thanks for your reply. I already did that. It works, but when it starts "converting the current model to sym_int4 format", everything disappears and only one process remains. Is my R730 server compatible?

In fact, the python commands have never worked for me; only llm-convert, llm-cli, etc. work. Very strange. Thanks.

hzjane commented 10 months ago

Please check your conda env against https://github.com/intel-analytics/BigDL/tree/main/python/llm/example/CPU/QLoRA-FineTuning.

ernleite commented 10 months ago

I have followed that configuration from the beginning. Thanks.

glorysdj commented 10 months ago

> Thanks for your reply. I already did that. It works, but when it starts "converting the current model to sym_int4 format", everything disappears and only one process remains. Is my R730 server compatible?
>
> In fact, the python commands have never worked for me; only llm-convert, llm-cli, etc. work. Very strange. Thanks.

Hi @ernleite, what do you mean by "the python command never works"? Have you tried `taskset -c 0-27` to use more cores? Could you please share the commands you used to run this QLoRA fine-tuning? We will try to check and reproduce it.

ernleite commented 10 months ago

@glorysdj I meant that all the commands like `taskset -c 0-X python ./generate.py` or `python ./qlora_finetuning_cpu.py` do not work for me. The only commands that work (using all cores on my machine) are llm-convert and llm-cli.

My configuration: Dell R730 with 2 CPUs, 96 GB RAM, Ubuntu 22.04 LTS.

I would be so happy if this could work.

Here is an unresolved issue I reported a few weeks ago: https://github.com/intel-analytics/BigDL/issues/8936

Thanks!

ernleite commented 10 months ago

This screenshot shows that only one core is being used at any given time (100%).

(screenshot attached)
jason-dai commented 10 months ago

> @glorysdj I meant that all the commands like `taskset -c 0-X python ./generate.py` or `python ./qlora_finetuning_cpu.py` do not work for me. The only commands that work (using all cores on my machine) are llm-convert and llm-cli.
>
> My configuration: Dell R730 with 2 CPUs, 96 GB RAM, Ubuntu 22.04 LTS.
>
> I would be so happy if this could work.
>
> Here is an unresolved issue I reported a few weeks ago: https://github.com/intel-analytics/BigDL/issues/8936
>
> Thanks!

@ernleite - a quick question: are you able to run bigdl-llm using these python commands on your local PC (either Windows or Linux)?

ernleite commented 10 months ago

I have a laptop running on Windows 11. Let me try. I will let you know.

ernleite commented 10 months ago

@jason-dai I tried on my laptop. The CPU version works fine on Windows 11 (even though it took several hours). A good step, then!

(screenshot attached)

I have two GPUs in my laptop, but I was not able to use my Intel Iris Xe with 16 GB; I have an issue with the PyTorch library.

(screenshot attached)

I tried many configurations, but the QLoRA GPU version does not work. Are we sure it works with Python 3.9? The DLL is present but does not seem to load, and I don't know why. I installed the latest Intel GPU drivers and oneAPI too.
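To tell a missing/broken native library apart from a working install with no visible device, a defensive check along these lines can help. This is only a sketch; it assumes the `torch.xpu` namespace that `intel_extension_for_pytorch` registers when it imports successfully:

```python
def xpu_available() -> bool:
    """Return True if PyTorch can see an Intel XPU device.

    Wrapped in try/except so a missing package or a DLL that fails to
    load shows up as a clean False instead of crashing the script.
    """
    try:
        import torch
        import intel_extension_for_pytorch  # noqa: F401  (registers torch.xpu)
        return bool(torch.xpu.is_available())
    except (ImportError, OSError, AttributeError):
        return False

print(xpu_available())
```

Running this in the same conda env as the finetuning script narrows the problem down: an exception on import points at the DLL/driver setup, while a clean `False` points at device visibility.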

So my question is: does the GPU version work on Windows? And what is the Windows equivalent of `source bigdl-llm-init`?

thanks again

Jasonzzt commented 10 months ago

> This screenshot shows that only one core is being used at any given time (100%).

@ernleite Do you have a GPU in your machine? I tried to reproduce the issue and found that, after converting the current model to sym_int4 format, the finetuning program ran on the GPU.

So you could try disabling the GPU when you finetune on CPU, and make sure you use the CPU version of the bigdl-llm package.

Hope this helps.
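One low-tech way to keep a CPU run from silently picking up an accelerator is to hide the devices from the process before any framework import. The variable names below are the common CUDA and oneAPI ones; whether bigdl-llm honors them is an assumption, not something this thread confirms:

```python
import os

# Hide NVIDIA GPUs from CUDA-based stacks.
os.environ["CUDA_VISIBLE_DEVICES"] = ""

# Exclude Level Zero (Intel GPU) devices from oneAPI/SYCL-based stacks.
# The "!" negative-filter syntax is assumed from recent oneAPI releases.
os.environ["ONEAPI_DEVICE_SELECTOR"] = "!level_zero:*"

# Any framework import (torch, bigdl-llm, ...) must come *after*
# these variables are set, or it may have already enumerated devices.
```

Setting the same variables in the shell before launching the script (`export CUDA_VISIBLE_DEVICES=""`) has the same effect and avoids import-order pitfalls.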

jason-dai commented 10 months ago

> So my question is: does the GPU version work on Windows?

Currently it is not supported.

liang1wang commented 10 months ago

On my side, it is blocked at 0% progress (>3 h) on an MTL RVP when running qlora_finetuning_cpu.py.

cmd: `python ./qlora_finetuning_cpu.py --repo-id-or-model-path llama-2-7b-hf --dataset english_quotes`

env: MTL RVP, 8 (E) + 6 (P) cores, 96 GB mem, Ubuntu 22.04. I have run `source bigdl-llm-init -t`.

Could you also help with that? Thanks!

(screenshots attached)

hzjane commented 10 months ago

We fixed this issue (only one core used) last week; it is related to this PR. When the CPU does not support bf16, QLoRA will automatically use only one core. You can run `lscpu | grep bf16` to check whether your CPU supports bf16 and whether that is the cause. Then use the latest qlora_finetuning_cpu.py to run.
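The same bf16 capability check can be done from Python by reading the CPU flags directly. A Linux-only sketch that matches what `lscpu | grep bf16` would find (it looks for flags such as `avx512_bf16` in `/proc/cpuinfo`):

```python
def cpu_supports_bf16(cpuinfo_path: str = "/proc/cpuinfo") -> bool:
    """Return True if any CPU "flags" line mentions bf16 (e.g. avx512_bf16)."""
    try:
        with open(cpuinfo_path) as f:
            return any("bf16" in line for line in f if line.startswith("flags"))
    except OSError:
        return False  # non-Linux system or unreadable /proc

print(cpu_supports_bf16())
```

A script could use this to warn up front that the run will fall back to the slow single-core path, instead of leaving the user to discover it from CPU usage graphs.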

ernleite commented 9 months ago

> We fixed this issue (only one core used) last week; it is related to this PR. When the CPU does not support bf16, QLoRA will automatically use only one core. You can run `lscpu | grep bf16` to check whether your CPU supports bf16 and whether that is the cause. Then use the latest qlora_finetuning_cpu.py to run.

Wow, amazing, thanks!
I can confirm that it works much better now. For the moment it only runs on one CPU (I have two), but maybe that is just a misconfiguration; I am deep-diving into that now.