hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Fine-tuning with baichuan2 raises an error: RuntimeError: shape '[1232, 4096, 32, 128]' is invalid for input of size 5046272 #1719

Closed JxuHenry closed 11 months ago

JxuHenry commented 11 months ago

CUDA_VISIBLE_DEVICES=1 python src/train_bash.py \
    --stage sft \
    --model_name_or_path ./baichuan-7B-base \
    --do_train \
    --dataset MC \
    --template baichuan2 \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir ./output/finetune \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Error:

  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7B-base/modeling_baichuan.py", line 449, in custom_forward
    return module(*inputs, output_attentions, None)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7B-base/modeling_baichuan.py", line 273, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7B-base/modeling_baichuan.py", line 207, in forward
    query_states = proj[0].view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1232, 4096, 32, 128]' is invalid for input of size 5046272
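
For readers trying to interpret the message, the numbers in the RuntimeError can be checked with plain arithmetic. The sketch below uses only values copied from the traceback; the variable names are illustrative and are not taken from modeling_baichuan.py.

```python
import math

# Values copied from the RuntimeError message in the traceback.
# The view() call requests (bsz, q_len, num_heads, head_dim).
requested_shape = (1232, 4096, 32, 128)
input_numel = 5046272  # number of elements actually present in proj[0]

requested_numel = math.prod(requested_shape)

# The tensor holds exactly 1232 * 4096 elements.
print(input_numel == 1232 * 4096)

# The requested shape needs 32 * 128 = 4096 times more elements than exist.
print(requested_numel // input_numel)
```

The tensor has exactly 1232 * 4096 elements, so the values unpacked as bsz and q_len already account for all of its data and leave nothing for the 32 * 128 head dimensions. This may mean the tensor reached the attention layer with a different layout than the code expects, but the thread does not confirm the cause.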

hiyouga commented 11 months ago

Try https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied

JxuHenry commented 11 months ago

Try https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied

OK, thank you. After switching to that model, I now get a GPU out-of-memory error. I am using a 24 GB 3090:

[image]

hiyouga commented 11 months ago

Reduce the batch size.
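
One common way to follow this advice without changing the effective batch size is to raise gradient accumulation as the per-device batch size is lowered. The sketch below is only an illustration: it assumes a single GPU (the original command sets CUDA_VISIBLE_DEVICES=1) and the original values of 4 for both --per_device_train_batch_size and --gradient_accumulation_steps. The helper name accumulation_steps_for is hypothetical, and the thread does not say which setting fits in 24 GB.

```python
def accumulation_steps_for(effective_batch: int, per_device_batch: int, num_devices: int = 1) -> int:
    """Return the gradient accumulation steps that keep the effective batch size fixed.

    Hypothetical helper for illustration; it is not part of LLaMA-Factory.
    """
    per_step = per_device_batch * num_devices
    if effective_batch % per_step != 0:
        raise ValueError("effective batch size must be a multiple of per-device batch * devices")
    return effective_batch // per_step


# Original command: batch size 4, accumulation 4, one GPU.
effective = 4 * 4 * 1

for per_device in (4, 2, 1):
    steps = accumulation_steps_for(effective, per_device)
    print(f"--per_device_train_batch_size {per_device} --gradient_accumulation_steps {steps}")
```

Lowering the per-device batch size reduces activation memory per step, while the larger accumulation count keeps the number of samples per optimizer update the same.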

JxuHenry commented 11 months ago

Reduce the batch size.

OK, thank you.

JxuHenry commented 11 months ago

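Try https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied

Could you tell me what the difference is between the model at the link you gave and the model from the official open-source release?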