Closed: JxuHenry closed this issue 11 months ago.
Try https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied
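In case it helps, a minimal sketch of fetching that checkpoint and pointing the training script at it (the local directory name is just whatever git creates; adjust to taste):

# Git LFS is needed to pull the actual weight files
git lfs install
git clone https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied

# Then swap the model path in the training command, e.g.:
#   --model_name_or_path ./Baichuan2-7B-Base-LLaMAfied --template baichuan2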
Thanks. After switching to that model, I now get a CUDA out-of-memory error. I'm running on a 24 GB RTX 3090:
Reduce the batch size.
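For example (illustrative values only: dropping the per-device batch size from 4 to 1 and raising gradient accumulation from 4 to 16 keeps the effective batch size at 16 while cutting per-step activation memory):

CUDA_VISIBLE_DEVICES=1 python src/train_bash.py \
    --stage sft \
    --model_name_or_path hiyouga/Baichuan2-7B-Base-LLaMAfied \
    --do_train \
    --dataset MC \
    --template baichuan2 \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir ./output/finetune \
    --overwrite_cache \
    --per_device_train_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16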
OK, thank you.
About the model at https://huggingface.co/hiyouga/Baichuan2-7B-Base-LLaMAfied: what is the difference between it and the checkpoint from the original open-source repository?
CUDA_VISIBLE_DEVICES=1 python src/train_bash.py \
    --stage sft \
    --model_name_or_path ./baichuan-7B-base \
    --do_train \
    --dataset MC \
    --template baichuan2 \
    --finetuning_type lora \
    --lora_target W_pack \
    --output_dir ./output/finetune \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16
Error:

File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 230, in forward
    outputs = run_function(*args)
File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7B-base/modeling_baichuan.py", line 449, in custom_forward
    return module(*inputs, output_attentions, None)
File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7B-base/modeling_baichuan.py", line 273, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
File "/root/miniconda3/envs/llama/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/baichuan-7B-base/modeling_baichuan.py", line 207, in forward
    query_states = proj[0].view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1232, 4096, 32, 128]' is invalid for input of size 5046272
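For what it's worth, the arithmetic behind this error: the requested view shape needs 1232 × 4096 × 32 × 128 = 20,669,530,112 elements, but the tensor only holds 5,046,272 = 1232 × 4096, i.e. it is short by exactly a factor of num_heads × head_dim = 32 × 128 = 4096. A mismatch like this between the projection output and the expected head layout usually suggests the loaded weights do not match the model code's shapes, which is consistent with the suggestion above to switch to the LLaMAfied checkpoint.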