hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Multi-GPU fine-tuning of chatglm3 fails with out-of-memory error #1672

Closed binsson closed 11 months ago

binsson commented 11 months ago

Reminder

Reproduction

accelerate launch src/train_bash.py \
    --stage sft \
    --model_name_or_path ../chatglm3-6b \
    --do_train \
    --dataset babycare \
    --template default \
    --finetuning_type lora \
    --lora_target query_key_value \
    --output_dir sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 10 \
    --plot_loss \
    --fp16

The multi-GPU fine-tuning arguments are shown above. Every run produces the error below; what could be the cause?

Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 23.65 GiB total capacity; 22.93 GiB already allocated; 180.06 MiB free; 22.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
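As a side note, the allocator hint in the error message can be tried directly through the environment variable it names; the value 128 below is only an illustrative choice, not something from this thread:

    # Fragmentation hint suggested by the error message itself; the split size is an assumption, tune as needed.
    export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
    accelerate launch src/train_bash.py \
        ...  # same arguments as in the command above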

Expected behavior

No response

System Info

No response

Others

No response

hiyouga commented 11 months ago

Does it run on a single GPU?

binsson commented 11 months ago

Does it run on a single GPU?

A single GPU works. With multiple GPUs I also tried a smaller dataset and got the same error.

hiyouga commented 11 months ago

It seems ChatGLM3 has some issues with multi-GPU support. Could you try running it with DeepSpeed?
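For reference, a minimal sketch of such a DeepSpeed run, assuming a ZeRO stage-2 config with CPU optimizer offload; the file name ds_config.json, the GPU count, and all config values below are illustrative assumptions, not taken from this thread:

    # Hypothetical ZeRO stage-2 config with CPU optimizer offload to reduce per-GPU memory.
    cat > ds_config.json <<'EOF'
    {
      "train_batch_size": "auto",
      "train_micro_batch_size_per_gpu": "auto",
      "gradient_accumulation_steps": "auto",
      "fp16": { "enabled": "auto" },
      "zero_optimization": {
        "stage": 2,
        "offload_optimizer": { "device": "cpu" }
      }
    }
    EOF

    # Launch with the deepspeed launcher instead of accelerate;
    # --deepspeed is the standard HF Trainer argument for passing the config file.
    deepspeed --num_gpus 2 src/train_bash.py \
        --deepspeed ds_config.json \
        --stage sft \
        --model_name_or_path ../chatglm3-6b \
        ...  # remaining arguments as in the original command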

binsson commented 11 months ago

It seems ChatGLM3 has some issues with multi-GPU support. Could you try running it with DeepSpeed?

OK, I'll give it a try.

leoterry-ulrica commented 10 months ago

It seems ChatGLM3 has some issues with multi-GPU support. Could you try running it with DeepSpeed?

OK, I'll give it a try.

Has this been resolved?

lrh000 commented 7 months ago

It seems ChatGLM3 has some issues with multi-GPU support. Could you try running it with DeepSpeed?

I've already tried deepspeed 0.10.1 and 0.13.4; neither works, but chatglm2 works fine.