modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Cannot merge weights after fine-tuning qwen1.5-7b-chat-awq #780

Closed. catundchat closed this issue 5 months ago

catundchat commented 5 months ago

Describe the bug: I finished fine-tuning the qwen1.5-7b-chat-awq model, but merging the weights fails with an error.

PS D:\github\swift> $env:CUDA_VISIBLE_DEVICES="0"
>> swift export `
>>     --ckpt_dir "D:\github\swift\output\qwen1half-7b-chat-awq\v6-20240419-173504\checkpoint-984" `
>>     --merge_lora $true
run sh: `python C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\swift\cli\export.py --ckpt_dir D:\github\swift\output\qwen1half-7b-chat-awq\v6-20240419-173504\checkpoint-984 --merge_lora True`

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
binary_path: C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll
CUDA SETUP: Loading binary C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\bitsandbytes\cuda_setup\libbitsandbytes_cuda116.dll...
2024-04-23 17:06:33,957 - modelscope - INFO - PyTorch version 2.0.1+cu118 Found.
2024-04-23 17:06:33,958 - modelscope - INFO - Loading ast index from C:\Users\Administrator\.cache\modelscope\ast_indexer
2024-04-23 17:06:34,042 - modelscope - INFO - Loading done! Current index file version is 1.13.3, with md5 988b0f143b5c2125e9fee53a8d5835ac and a total number of 972 components indexed
[INFO:swift] Start time of running main: 2024-04-23 17:06:34.209387
[INFO:swift] ckpt_dir: D:\github\swift\output\qwen1half-7b-chat-awq\v6-20240419-173504\checkpoint-984
[INFO:swift] Setting model_info['revision']: master
[INFO:swift] Setting self.eval_human: True
[INFO:swift] Setting overwrite_generation_config: True
[INFO:swift] Setting args.dataset: ['ms-bench-mini']
[INFO:swift] args: ExportArguments(model_type='qwen1half-7b-chat-awq', model_id_or_path='D:\\poem_model\\Qwen1.5-7B-Chat-AWQ', model_revision='master', sft_type='lora', template_type='qwen', infer_backend='pt', ckpt_dir='D:\\github\\swift\\output\\qwen1half-7b-chat-awq\\v6-20240419-173504\\checkpoint-984', load_args_from_ckpt_dir=True, load_dataset_config=False, eval_human=True, seed=42, dtype='AUTO', dataset=['ms-bench-mini'], dataset_seed=42, dataset_test_ratio=0.01, 
val_dataset_sample=10, save_result=True, system='You are a helpful assistant.', max_length=None, truncation_strategy='delete', check_dataset_strategy='none', 
custom_train_dataset_path=[], custom_val_dataset_path=[], quantization_bit=0, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, stop_words=None, use_flash_attn=None, ignore_args_error=False, stream=True, merge_lora=True, merge_device_map='auto', save_safetensors=True, overwrite_generation_config=True, verbose=None, gpu_memory_utilization=0.9, tensor_parallel_size=1, max_model_len=None, vllm_enable_lora=False, vllm_max_lora_rank=16, vllm_lora_modules=[], show_dataset_sample=10, safe_serialization=None, model_cache_dir=None, merge_lora_and_save=None, to_peft_format=False, quant_bits=0, quant_method='awq', quant_n_samples=256, quant_seqlen=2048, quant_device_map='cpu', push_to_hub=False, hub_model_id=None, hub_token=None, hub_private_repo=False, commit_message='update files')
[INFO:swift] replace_if_exists: False
[INFO:swift] merged_lora_path: `D:\github\swift\output\qwen1half-7b-chat-awq\v6-20240419-173504\checkpoint-984-merged`
[INFO:swift] Setting args.sft_type: 'full'
[INFO:swift] Setting args.ckpt_dir: D:\github\swift\output\qwen1half-7b-chat-awq\v6-20240419-173504\checkpoint-984-merged
[INFO:swift] device_count: 1
[INFO:swift] Loading the model using model_dir: D:\poem_model\Qwen1.5-7B-Chat-AWQ
[INFO:swift] Setting torch_dtype: torch.float16
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:02<00:00,  1.02s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] model_config: Qwen2Config {
  "_name_or_path": "D:\\poem_model\\Qwen1.5-7B-Chat-AWQ",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "quantization_config": {
    "backend": "autoawq",
    "bits": 4,
    "do_fuse": false,
    "fuse_max_seq_len": null,
    "group_size": 128,
    "modules_to_fuse": null,
    "modules_to_not_convert": null,
    "quant_method": "awq",
    "version": "gemm",
    "zero_point": true
  },
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": 32768,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.38.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 151936
}

Traceback (most recent call last):
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\swift\cli\export.py", line 5, in <module>
    export_main()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\swift\utils\run_utils.py", line 31, in x_main
    result = llm_x(args, **kwargs)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\swift\llm\export.py", line 167, in llm_export
    merge_lora(args, device_map=args.merge_device_map)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\swift\llm\infer.py", line 133, in merge_lora
    Swift.merge_and_unload(model)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\swift\tuners\base.py", line 809, in merge_and_unload
    model.merge_and_unload()
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\tuners\lora\model.py", line 713, in merge_and_unload
    return self._unload_and_optionally_merge(
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\tuners\lora\model.py", line 367, in _unload_and_optionally_merge
    target.merge(safe_merge=safe_merge, adapter_names=adapter_names)
  File "C:\Users\Administrator\AppData\Local\Programs\Python\Python310\lib\site-packages\peft\tuners\tuners_utils.py", line 419, in merge
    raise NotImplementedError
NotImplementedError

Your hardware and system info: CUDA 12.4, Windows 10, RTX A5000, torch 2.0.1+cu118

Jintao-Huang commented 5 months ago

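Yes. A LoRA trained on top of a quantized model (bnb, gptq, awq) cannot be merged into the base weights, and this is especially true for gptq and awq.

For bnb, merging the LoRA causes a large loss in precision. gptq and awq quantization require a calibration dataset.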
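To avoid hitting the NotImplementedError only after the model has loaded, one option is to check the base model's config before running the merge. The following is a minimal sketch, not part of swift or peft: `quant_method_of` and `can_merge_lora` are hypothetical helper names, and the set of method strings is an assumption based on the `quantization_config` printed in the log above ("awq") plus the names commonly written for the other two methods in the reply.

```python
import json
from typing import Optional

# Assumption: these are the strings a config may carry in
# quantization_config["quant_method"] for the methods named in the reply.
UNMERGEABLE_QUANT_METHODS = {"awq", "gptq", "bitsandbytes", "bnb"}


def quant_method_of(config: dict) -> Optional[str]:
    """Return the quant_method recorded in a model config, or None if absent."""
    quant_cfg = config.get("quantization_config") or {}
    return quant_cfg.get("quant_method")


def can_merge_lora(config: dict) -> bool:
    """True when the base model does not look quantized, so a merge is worth trying."""
    method = quant_method_of(config)
    return method is None or method.lower() not in UNMERGEABLE_QUANT_METHODS


# Trimmed-down configs: the first mirrors the fields shown in the log above,
# the second stands in for a non-quantized base model.
awq_config = json.loads(
    '{"model_type": "qwen2", "quantization_config": {"bits": 4, "quant_method": "awq"}}'
)
plain_config = json.loads('{"model_type": "qwen2"}')

print(can_merge_lora(awq_config))    # quantized base: skip the merge
print(can_merge_lora(plain_config))  # unquantized base: merge can be attempted
```

In practice the dictionary would come from the base model's config.json (here, the one under D:\poem_model\Qwen1.5-7B-Chat-AWQ). The check only inspects metadata, so it says nothing about whether a merge on an unquantized model will succeed.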