hiyouga / LLaMA-Factory

Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

Error when merging fine-tuned weights into the Qwen1.5-1.8B base model #2896

Closed GravitySaika closed 5 months ago

GravitySaika commented 5 months ago

Reproduction

These are the arguments I used for fine-tuning:

CUDA_VISIBLE_DEVICES=1 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path ./Qwen1.5-1.8B \
    --dataset finetune_set \
    --template default \
    --finetuning_type lora \
    --lora_target q_proj,v_proj \
    --output_dir ./Qwen1.5-1.8B/finetune \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 1.0 \
    --plot_loss \
    --fp16

These are the arguments I used for merging:

CUDA_VISIBLE_DEVICES=5 python src/export_model.py \
    --model_name_or_path ./Qwen1.5-1.8B \
    --adapter_name_or_path ./Qwen1.5-1.8B/finetune \
    --template default \
    --finetuning_type lora \
    --export_dir ./Qwen1.5-1.8B/finetune/merge \
    --export_size 10 \
    --export_legacy_format False

The merge failed with:

03/19/2024 23:11:40 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA                                                                                           
Traceback (most recent call last):                                                                                                                                       
  File "/mnt/sm870/yangqimin/LLaMA-Factory/src/export_model.py", line 9, in <module>                                                                                     
    main()                                                                                                                                                               
  File "/mnt/sm870/yangqimin/LLaMA-Factory/src/export_model.py", line 5, in main                                                                                         
    export_model()                                                                                                                                                       
  File "/mnt/sm870/yangqimin/LLaMA-Factory/src/llmtuner/train/tuner.py", line 52, in export_model                                                                        
    model, tokenizer = load_model_and_tokenizer(model_args, finetuning_args)                                                                                             
  File "/mnt/sm870/yangqimin/LLaMA-Factory/src/llmtuner/model/loader.py", line 150, in load_model_and_tokenizer                                                          
    model = load_model(tokenizer, model_args, finetuning_args, is_trainable, add_valuehead)                                                                              
  File "/mnt/sm870/yangqimin/LLaMA-Factory/src/llmtuner/model/loader.py", line 94, in load_model                                                                         
    model = init_adapter(model, model_args, finetuning_args, is_trainable)                                                                                               
  File "/mnt/sm870/yangqimin/LLaMA-Factory/src/llmtuner/model/adapter.py", line 110, in init_adapter                                                                     
    model: "LoraModel" = PeftModel.from_pretrained(model, adapter)                                                                                                       
  File "/home/sun/anaconda3/envs/jc-train/lib/python3.10/site-packages/peft/peft_model.py", line 324, in from_pretrained                                                 
    config = PEFT_TYPE_TO_CONFIG_MAPPING[                                                                                                                                
  File "/home/sun/anaconda3/envs/jc-train/lib/python3.10/site-packages/peft/config.py", line 151, in from_pretrained                                                     
    return cls.from_peft_type(**kwargs)                                                                                                                                  
  File "/home/sun/anaconda3/envs/jc-train/lib/python3.10/site-packages/peft/config.py", line 118, in from_peft_type                                                      
    return config_cls(**kwargs)                                                                                                                                          
TypeError: LoraConfig.__init__() got an unexpected keyword argument 'layer_replication' 
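
The error likely means that the adapter was saved by a newer peft release, which writes a layer_replication field into adapter_config.json, while the environment running export_model.py has an older peft whose LoraConfig does not accept that argument. A minimal illustration of the mismatch, assuming an older peft (the layer range below is hypothetical):

# On a peft version that predates layer_replication, this raises the same
# TypeError as in the traceback above; on a newer peft it succeeds.
from peft import LoraConfig

config = LoraConfig(layer_replication=[[0, 24]])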

By removing the 'layer_replication' entry from the adapter_config.json file in the fine-tuning output directory, I was able to merge the weights successfully and use the result. I'm not sure whether this is a bug, or whether this workaround will cause other problems.
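
A minimal sketch of that workaround in Python, assuming the adapter config sits in the output_dir used above (back up the file first):

import json

# Path to the adapter config written during fine-tuning (adjust as needed).
config_path = "./Qwen1.5-1.8B/finetune/adapter_config.json"

with open(config_path, "r", encoding="utf-8") as f:
    config = json.load(f)

# Drop the key that the older peft's LoraConfig does not recognize.
config.pop("layer_replication", None)

with open(config_path, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2)

The cleaner fix is to use the same peft version (or upgrade peft) in the export environment, so that the saved config and the loading code agree.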

Expected behavior

I wanted to fine-tune the Qwen1.5 model on my own dataset, then merge the weights and use the result.


hiyouga commented 5 months ago

I suspect the two commands were run with different peft environments.
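
One way to verify this is to print the peft version in each environment before running the two commands:

# Run once in the training environment and once in the export environment;
# a mismatch here can explain the TypeError above.
import peft
print(peft.__version__)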

GravitySaika commented 5 months ago

> I suspect the two commands were run with different peft environments.

It was indeed a different peft environment. Thanks!

kkk935208447 commented 5 months ago

> I suspect the two commands were run with different peft environments.

Thanks!

PlanetesDDH commented 5 months ago

666 (nice!)

Yezhibin701227 commented 3 months ago

Regarding CUDA_VISIBLE_DEVICES=5 python src/export_model.py: I can't find the export_model.py script. Where is it?

aihaidong commented 1 week ago

I ran into the same problem, and this was indeed the cause. Thanks!