Open · stay-leave opened this issue 1 month ago
Merge script:
```shell
CUDA_VISIBLE_DEVICES=0 swift export \
    --model_type internvl2-26b \
    --model_id_or_path /root/f_data_1/InternVL2-26B \
    --ckpt_dir "/root/visual_model/fine_tune/output/internvl2-26b/v1-20240810-171452/checkpoint-185" \
    --merge_lora true
```
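Numerically, the merge that `--merge_lora true` performs folds the adapter delta into each targeted base weight. A minimal NumPy sketch of that arithmetic (toy dimensions, not swift's or peft's actual code; standard LoRA with `scaling = alpha / r` assumed):

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 8, 8, 2, 4              # toy sizes; real ranks are e.g. 8/16
W = rng.standard_normal((d_out, d_in))          # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01       # LoRA down-projection
B = rng.standard_normal((d_out, r)) * 0.01      # LoRA up-projection
scaling = alpha / r

# Merging folds the low-rank delta into the base matrix,
# so the adapter can be discarded afterwards.
W_merged = W + scaling * (B @ A)

# A forward pass through the merged weight equals base path + adapter path.
x = rng.standard_normal(d_in)
y_two_path = W @ x + scaling * (B @ (A @ x))
y_merged = W_merged @ x
assert np.allclose(y_two_path, y_merged)
```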
The checkpoint is from DPO training with LoRA:
SFT checkpoints merge without problems. DPO checkpoints also merged fine last week; this week, in a freshly set-up environment, this bug appeared. Merging manually with peft myself gives the same error.
Error:
```
Traceback (most recent call last):
  File "/root/visual_model/swift/swift/cli/export.py", line 5, in <module>
    export_main()
  File "/root/visual_model/swift/swift/utils/run_utils.py", line 32, in x_main
    result = llm_x(args, **kwargs)
  File "/root/visual_model/swift/swift/llm/export.py", line 190, in llm_export
    merge_lora(args, device_map=args.merge_device_map)
  File "/root/visual_model/swift/swift/llm/infer.py", line 113, in merge_lora
    model, template = prepare_model_template(args, device_map=device_map, verbose=False)
  File "/root/visual_model/swift/swift/llm/infer.py", line 230, in prepare_model_template
    model = Swift.from_pretrained(model, args.ckpt_dir, inference_mode=True)
  File "/root/visual_model/swift/swift/tuners/base.py", line 878, in from_pretrained
    peft_model = load_peft_model(model, 'default')
  File "/root/visual_model/swift/swift/tuners/base.py", line 864, in load_peft_model
    return PeftModel.from_pretrained(
  File "/root/visual_model/swift/swift/tuners/peft.py", line 367, in from_pretrained
    return module_class.from_pretrained(model, model_id, *args, **kwargs)
  File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/peft_model.py", line 541, in from_pretrained
    model = MODEL_TYPE_TO_PEFT_MODEL_MAPPING[config.task_type](
  File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/peft_model.py", line 1542, in __init__
    super().__init__(model, peft_config, adapter_name, **kwargs)
  File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/peft_model.py", line 155, in __init__
    self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
  File "/root/visual_model/swift/swift/tuners/peft.py", line 315, in init
    self.__init_origin__(model, config, adapter_name)
  File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 139, in __init__
    super().__init__(model, config, adapter_name)
  File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 175, in __init__
    self.inject_adapter(self.model, adapter_name)
  File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 431, in inject_adapter
    self._create_and_replace(peft_config, adapter_name, target, target_name, parent, current_key=key)
  File "/root/visual_model/swift/swift/tuners/peft.py", line 97, in _create_and_replace_hook
    return self._create_and_replace_origin(*args, **kwargs)
  File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 224, in _create_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target, **kwargs)
  File "/root/visual_model/py_env/swift/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 346, in _create_new_module
    raise ValueError(
ValueError: Target module InternLM2ForCausalLM(
  (model): InternLM2Model(
    (tok_embeddings): Embedding(92553, 6144, padding_idx=2)
    (layers): ModuleList(
      (0-47): 48 x InternLM2DecoderLayer(
        (attention): InternLM2Attention(
          (wqkv): Linear(in_features=6144, out_features=8192, bias=False)
          (wo): Linear(in_features=6144, out_features=6144, bias=False)
          (rotary_emb): InternLM2DynamicNTKScalingRotaryEmbedding()
        )
        (feed_forward): InternLM2MLP(
          (w1): Linear(in_features=6144, out_features=16384, bias=False)
          (w3): Linear(in_features=6144, out_features=16384, bias=False)
          (w2): Linear(in_features=16384, out_features=6144, bias=False)
          (act_fn): SiLU()
        )
        (attention_norm): InternLM2RMSNorm()
        (ffn_norm): InternLM2RMSNorm()
      )
    )
    (norm): InternLM2RMSNorm()
  )
  (output): Linear(in_features=6144, out_features=92553, bias=False)
) is not supported. Currently, only the following modules are supported: `torch.nn.Linear`, `torch.nn.Embedding`, `torch.nn.Conv2d`, `transformers.pytorch_utils.Conv1D`.
```
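The ValueError suggests that `target_modules` resolved to the root `InternLM2ForCausalLM` module itself rather than to individual `Linear` layers, which peft cannot wrap. Roughly, peft matches each name from `model.named_modules()` against `target_modules` by suffix and only wraps supported layer types; a simplified sketch of that matching (not peft's actual code, toy module names):

```python
# Toy subset of names as produced by model.named_modules() for this model.
module_names = [
    "model.layers.0.attention.wqkv",
    "model.layers.0.attention.wo",
    "model.layers.0.feed_forward.w1",
    "output",
]
target_modules = ["wqkv", "wo"]  # as one might set in peft's LoraConfig

def matches(name: str, targets: list[str]) -> bool:
    # Simplified version of peft's suffix matching: an exact name match,
    # or the name ends with ".<target>".
    return any(name == t or name.endswith("." + t) for t in targets)

hits = [n for n in module_names if matches(n, target_modules)]
print(hits)  # ['model.layers.0.attention.wqkv', 'model.layers.0.attention.wo']
```

If the match instead produces the top-level model (e.g. an empty or over-broad pattern), peft tries to replace the whole model object and raises exactly this "Target module ... is not supported" error.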
This may be a bug; let me try to reproduce it.
I asked in the group chat and pulled Monday's latest code, and the merge now succeeds. However, after writing the merged weights the process never finishes and just hangs there; in the end it froze both of my A100 GPUs. There may still be a bug.
OK, I'm reproducing it now, please wait.
InternLM 20B feels less capable than the 7B to me.
Haha, next week I'll try the 8B, 26B, and 40B.