Closed · LIUKAI0815 closed this issue 2 weeks ago
This model should still be on the main branch.
@Jintao-Huang After installing from main, how do I select the `eager` attention implementation for Gemma2? Which swift argument corresponds to it? Or can Gemma2 simply not use flash attention? Removing `--use_flash_attn True` makes it work.
Error:

```
It is strongly recommended to train Gemma2 models with the eager attention
implementation instead of flash_attention_2. Use eager with
AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager').
```
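The warning above says Gemma2 should fall back to the `eager` attention backend. A minimal sketch of that selection logic (the helper `pick_attn_implementation` and its mapping are hypothetical illustrations, not swift's actual API):

```python
def pick_attn_implementation(model_type: str, use_flash_attn: bool) -> str:
    """Choose a transformers-style attn_implementation string.

    Gemma2 uses attention-logit soft-capping, which flash_attention_2 did
    not support at the time, hence the eager recommendation in the warning.
    """
    if model_type.startswith("gemma2"):
        # Follow the transformers warning: always use eager for Gemma2.
        return "eager"
    # For other model types, honor the user's flash-attention flag.
    return "flash_attention_2" if use_flash_attn else "sdpa"


print(pick_attn_implementation("gemma2-2b-instruct", True))  # eager
```

This is why simply dropping `--use_flash_attn True` avoids the error: without the flag, transformers falls back to a non-flash implementation.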
```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/model/llm/swift/swift/cli/sft.py", line 5, in <module>
[rank0]:   File "/workspace/model/llm/swift/swift/utils/run_utils.py", line 32, in x_main
[rank0]:     result = llm_x(args, **kwargs)
[rank0]:   File "/workspace/model/llm/swift/swift/llm/sft.py", line 394, in llm_sft
[rank0]:   File "/workspace/model/llm/swift/swift/trainers/mixin.py", line 538, in train
[rank0]:     res = super().train(resume_from_checkpoint, *args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 3307, in training_step
[rank0]:     loss = self.compute_loss(model, inputs)
[rank0]:   File "/workspace/model/llm/swift/swift/trainers/trainers.py", line 179, in compute_loss
[rank0]:     outputs = model(**inputs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1593, in forward
[rank0]:     else self._run_ddp_forward(*inputs, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1411, in _run_ddp_forward
[rank0]:     return self.module(*inputs, **kwargs)  # type: ignore[index]
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/accelerate/utils/operations.py", line 819, in forward
[rank0]:     return model_forward(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/accelerate/utils/operations.py", line 807, in __call__
[rank0]:     return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/peft/peft_model.py", line 1430, in forward
[rank0]:     return self.base_model(
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
[rank0]:     return self.model.forward(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 1073, in forward
[rank0]:     outputs = self.model(
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 902, in forward
[rank0]:     layer_outputs = self._gradient_checkpointing_func(
[rank0]:   File "/workspace/model/llm/swift/swift/llm/utils/model.py", line 6139, in
```
Describe the bug
What the bug is, and how to reproduce, better with screenshots

ms-swift 2.2.5
I have already updated to the latest version, but fine-tuning with `--model_type gemma2-2b-instruct` still fails with this error.