hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Multi-GPU fine-tuning of chatglm3 fails with out-of-memory error #1672

Closed binsson closed 11 months ago

binsson commented 11 months ago

Reminder

Reproduction

accelerate launch src/train_bash.py \
    --stage sft \
    --model_name_or_path ../chatglm3-6b \
    --do_train \
    --dataset babycare \
    --template default \
    --finetuning_type lora \
    --lora_target query_key_value \
    --output_dir sft_checkpoint \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 10 \
    --plot_loss \
    --fp16

The multi-GPU fine-tuning arguments are shown above. Every run produces the error below; what could be the cause?

Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 508.00 MiB (GPU 0; 23.65 GiB total capacity; 22.93 GiB already allocated; 180.06 MiB free; 22.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
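As a side note, the allocator hint in the error message can be tried directly through the environment variable it names; the value 128 below is only an illustrative choice, not something from this thread:

    # Fragmentation hint suggested by the error message itself; the split size is an assumption, tune as needed.
    export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
    accelerate launch src/train_bash.py \
        ...  # same arguments as in the command above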

Expected behavior

No response

System Info

No response

Others

No response

hiyouga commented 11 months ago

Does it run on a single GPU?

binsson commented 11 months ago

Does it run on a single GPU?

A single GPU works. With multiple GPUs I also tried a smaller dataset and got the same error.

hiyouga commented 11 months ago

It seems ChatGLM3 has some issues with multi-GPU support. Could you try running it with DeepSpeed?
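For reference, a minimal sketch of such a DeepSpeed run, assuming a ZeRO stage-2 config with CPU optimizer offload; the file name ds_config.json, the GPU count, and all config values below are illustrative assumptions, not taken from this thread:

    # Hypothetical ZeRO stage-2 config with CPU optimizer offload to reduce per-GPU memory.
    cat > ds_config.json <<'EOF'
    {
      "train_batch_size": "auto",
      "train_micro_batch_size_per_gpu": "auto",
      "gradient_accumulation_steps": "auto",
      "fp16": { "enabled": "auto" },
      "zero_optimization": {
        "stage": 2,
        "offload_optimizer": { "device": "cpu" }
      }
    }
    EOF

    # Launch with the deepspeed launcher instead of accelerate;
    # --deepspeed is the standard HF Trainer argument for passing the config file.
    deepspeed --num_gpus 2 src/train_bash.py \
        --deepspeed ds_config.json \
        --stage sft \
        --model_name_or_path ../chatglm3-6b \
        ...  # remaining arguments as in the original command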

binsson commented 11 months ago

It seems ChatGLM3 has some issues with multi-GPU support. Could you try running it with DeepSpeed?

OK, I'll give it a try.

leoterry-ulrica commented 10 months ago

It seems ChatGLM3 has some issues with multi-GPU support. Could you try running it with DeepSpeed?

OK, I'll give it a try.

Has this been resolved?

lrh000 commented 7 months ago

It seems ChatGLM3 has some issues with multi-GPU support. Could you try running it with DeepSpeed?

I've already tried deepspeed 0.10.1 and 0.13.4; neither works, but chatglm2 works fine.