modelscope / ms-swift

Use PEFT or Full-parameter to finetune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

--model_type gemma2-2b-instruct is not supported #1619

Closed. LIUKAI0815 closed this 2 weeks ago.

LIUKAI0815 commented 1 month ago

Describe the bug (what the bug is and how to reproduce it, ideally with screenshots): ms-swift 2.2.5


I have already updated to the latest version, but fine-tuning with --model_type gemma2-2b-instruct still raises an error.

Jintao-Huang commented 1 month ago

This model should still only be on the main branch (not yet in a release).
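A quick way to confirm which build is installed (a minimal sketch; it assumes the distribution name is ms-swift, matching the repo name):

```python
import importlib.metadata

# A released build such as 2.2.5 would predate gemma2-2b-instruct support
# per the comment above; an install from the main branch reports a newer
# (dev) version. Raises PackageNotFoundError if ms-swift is not installed.
print(importlib.metadata.version("ms-swift"))
```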

LIUKAI0815 commented 1 month ago

@Jintao-Huang After installing from main, how do I select the eager attention implementation for Gemma2? Which swift argument corresponds to it? Or is it that Gemma2 simply cannot use flash attention? Removing --use_flash_attn True makes it work.
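For reference, the transformers-level workaround named in the warning below looks like this (a minimal sketch; the checkpoint path is a placeholder taken from the warning text):

```python
from transformers import AutoModelForCausalLM

# Per the warning, Gemma2 should be trained with the eager attention
# implementation rather than flash_attention_2.
model = AutoModelForCausalLM.from_pretrained(
    "<path-to-checkpoint>",  # placeholder from the warning message
    attn_implementation="eager",
)
```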

Error:

It is strongly recommended to train Gemma2 models with the eager attention implementation instead of flash_attention_2. Use eager with `AutoModelForCausalLM.from_pretrained('<path-to-checkpoint>', attn_implementation='eager')`.

```
[rank0]: Traceback (most recent call last):
[rank0]:   File "/workspace/model/llm/swift/swift/cli/sft.py", line 5, in <module>
[rank0]:   File "/workspace/model/llm/swift/swift/utils/run_utils.py", line 32, in x_main
[rank0]:     result = llm_x(args, **kwargs)
[rank0]:   File "/workspace/model/llm/swift/swift/llm/sft.py", line 394, in llm_sft
[rank0]:   File "/workspace/model/llm/swift/swift/trainers/mixin.py", line 538, in train
[rank0]:     res = super().train(resume_from_checkpoint, *args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 1932, in train
[rank0]:     return inner_training_loop(
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 2268, in _inner_training_loop
[rank0]:     tr_loss_step = self.training_step(model, inputs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/trainer.py", line 3307, in training_step
[rank0]:     loss = self.compute_loss(model, inputs)
[rank0]:   File "/workspace/model/llm/swift/swift/trainers/trainers.py", line 179, in compute_loss
[rank0]:     outputs = model(**inputs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1593, in forward
[rank0]:     else self._run_ddp_forward(*inputs, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/parallel/distributed.py", line 1411, in _run_ddp_forward
[rank0]:     return self.module(*inputs, **kwargs)  # type: ignore[index]
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/accelerate/utils/operations.py", line 819, in forward
[rank0]:     return model_forward(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/accelerate/utils/operations.py", line 807, in __call__
[rank0]:     return convert_to_fp32(self.model_forward(*args, **kwargs))
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/amp/autocast_mode.py", line 16, in decorate_autocast
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/peft/peft_model.py", line 1430, in forward
[rank0]:     return self.base_model(
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 179, in forward
[rank0]:     return self.model.forward(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 1073, in forward
[rank0]:     outputs = self.model(
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 902, in forward
[rank0]:     layer_outputs = self._gradient_checkpointing_func(
[rank0]:   File "/workspace/model/llm/swift/swift/llm/utils/model.py", line 6139, in <lambda>
[rank0]:     _old_checkpoint(*args, use_reentrant=use_reentrant, **kwargs))
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/_compile.py", line 24, in inner
[rank0]:     return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/_dynamo/eval_frame.py", line 451, in _fn
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/_dynamo/external_utils.py", line 36, in inner
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/utils/checkpoint.py", line 494, in checkpoint
[rank0]:     ret = function(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 655, in forward
[rank0]:     hidden_states, self_attn_weights, present_key_value = self.self_attn(
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 378, in forward
[rank0]:     attn_output = self._flash_attention_forward(
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/models/gemma2/modeling_gemma2.py", line 438, in _flash_attention_forward
[rank0]:     flash_kwargs = {"softcap": self.config.attn_logit_softcapping} if is_flash_attn_greater_or_equal("2.6.0") else {}
[rank0]:   File "/root/miniconda3/envs/swift/lib/python3.10/site-packages/transformers/utils/import_utils.py", line 819, in is_flash_attn_greater_or_equal
[rank0]:     return version.parse(importlib.metadata.version("flash_attn")) >= version.parse(version)
```