modelscope / ms-swift

Use PEFT or Full-parameter to finetune 350+ LLMs or 90+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

Trained a LoRA for Qwen2.5-Coder-7B-Instruct, but Qwen2 appears in app-ui #2125

Open neverbiasu opened 6 days ago

neverbiasu commented 6 days ago

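I trained a LoRA for Qwen2.5-Coder-7B-Instruct with ms-swift and merged it with ms-swift as well. However, the app-ui log reports the model as Qwen2 (`Qwen2Config`, `Qwen2ForCausalLM`) rather than Qwen2.5. The commands and the terminal output are below.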

Train, merge and app-ui command

swift sft \
    --model_type qwen2_5-coder-7b-instruct \
    --dataset data/pokemon_train.json \
    --num_train_epochs 3 \
    --sft_type lora \
    --output_dir output \
    --eval_steps 100 \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout_p 0.05 \
    --gradient_checkpointing true \
    --batch_size 8 \
    --weight_decay 0.01 \
    --learning_rate 1e-4 \
    --gradient_accumulation_steps 2 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn false \
    --save_only_model true \
    --lazy_tokenize true

swift export \
    --model_type qwen2_5-coder-7b-instruct \
    --ckpt_dir './output/qwen2_5-coder-7b-instruct/v2-20240923-153459/checkpoint-90' \
    --merge_lora true

swift app-ui \
    --model_type qwen2_5-coder-7b-instruct \
    --ckpt_dir './output/qwen2_5-coder-7b-instruct/v2-20240923-153459/checkpoint-90-merged'

Output of the app-ui command

run sh: `/home/faych/miniconda3/envs/ms-swift/bin/python /home/faych/.vscode-server/data/ms-swift/swift/cli/app_ui.py --model_type qwen2_5-coder-7b-instruct --ckpt_dir ./output/qwen2_5-coder-7b-instruct/v2-20240923-153459/checkpoint-90-merged`
[INFO:swift] Successfully registered `/home/faych/.vscode-server/data/ms-swift/swift/llm/data/dataset_info.json`
[INFO:swift] No vLLM installed, if you are using vLLM, you will get `ImportError: cannot import name 'get_vllm_engine' from 'swift.llm'`
[INFO:swift] No LMDeploy installed, if you are using LMDeploy, you will get `ImportError: cannot import name 'prepare_lmdeploy_engine_template' from 'swift.llm'`
[INFO:swift] Start time of running main: 2024-09-25 21:18:36.113221
[INFO:swift] ckpt_dir: /home/faych/.vscode-server/data/genie/output/qwen2_5-coder-7b-instruct/v2-20240923-153459/checkpoint-90-merged
[INFO:swift] Setting model_info['revision']: master
[INFO:swift] Setting args.eval_human: True
[INFO:swift] args: AppUIArguments(model_type='qwen2_5-coder-7b-instruct', model_id_or_path='qwen/Qwen2.5-Coder-7B-Instruct', model_revision='master', sft_type='full', template_type='qwen2_5', infer_backend='pt', ckpt_dir='/home/faych/.vscode-server/data/genie/output/qwen2_5-coder-7b-instruct/v2-20240923-153459/checkpoint-90-merged', result_dir=None, load_args_from_ckpt_dir=True, load_dataset_config=False, eval_human=True, seed=42, dtype='bf16', model_kwargs={}, dataset=[], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.01, show_dataset_sample=-1, save_result=True, system='You are Qwen, created by Alibaba Cloud. You are a helpful assistant.', tools_prompt='react_en', max_length=None, truncation_strategy='delete', check_dataset_strategy='none', model_name=[None, None], model_author=[None, None], quant_method=None, quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=2048, do_sample=None, temperature=None, top_k=None, top_p=None, repetition_penalty=None, num_beams=1, stop_words=[], rope_scaling=None, use_flash_attn=None, ignore_args_error=False, stream=True, merge_lora=False, merge_device_map='cpu', save_safetensors=True, overwrite_generation_config=False, verbose=None, local_repo_path=None, custom_register_path=None, custom_dataset_info=None, device_map_config=None, device_max_memory=[], hub_token=None, gpu_memory_utilization=0.9, tensor_parallel_size=1, max_num_seqs=256, max_model_len=None, disable_custom_all_reduce=True, enforce_eager=False, limit_mm_per_prompt=None, vllm_enable_lora=False, vllm_max_lora_rank=16, lora_modules=[], max_logprobs=20, tp=1, cache_max_entry_count=0.8, quant_policy=0, vision_batch_size=1, self_cognition_sample=0, train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, model_cache_dir=None, merge_lora_and_save=None, custom_train_dataset_path=[], custom_val_dataset_path=[], vllm_lora_modules=None, device_map_config_path=None, host='127.0.0.1', port=7860, share=False, server_name=None, server_port=None)
[INFO:swift] Global seed set to 42
[INFO:swift] device_count: 1
[INFO:swift] Loading the model using model_dir: /home/faych/.vscode-server/data/genie/output/qwen2_5-coder-7b-instruct/v2-20240923-153459/checkpoint-90-merged
[INFO:swift] model_kwargs: {'device_map': 'cuda:0'}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████| 4/4 [02:38<00:00, 39.65s/it]
[INFO:swift] model.max_model_len: 32768
[INFO:swift] model_config: Qwen2Config
{
  "_name_or_path": "/home/faych/.vscode-server/data/genie/output/qwen2_5-coder-7b-instruct/v2-20240923-153459/checkpoint-90-merged",
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151643,
  "hidden_act": "silu",
  "hidden_size": 3584,
  "initializer_range": 0.02,
  "intermediate_size": 18944,
  "max_position_embeddings": 32768,
  "max_window_layers": 28,
  "model_type": "qwen2",
  "num_attention_heads": 28,
  "num_hidden_layers": 28,
  "num_key_value_heads": 4,
  "rms_norm_eps": 1e-06,
  "rope_theta": 1000000.0,
  "sliding_window": null,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.44.2",
  "use_cache": true,
  "use_sliding_window": false,
  "vocab_size": 152064
}
[INFO:swift] model.generation_config: GenerationConfig {
  "bos_token_id": 151643,
  "do_sample": true,
  "eos_token_id": 151645,
  "max_new_tokens": 2048,
  "pad_token_id": 151643,
  "repetition_penalty": 1.1,
  "temperature": 0.7,
  "top_k": 20,
  "top_p": 0.8
}
[INFO:swift] [model.embed_tokens.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.self_attn.q_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.self_attn.q_proj.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.self_attn.k_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.self_attn.k_proj.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.self_attn.v_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.self_attn.v_proj.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.self_attn.o_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.mlp.gate_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.mlp.up_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.mlp.down_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.input_layernorm.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.0.post_attention_layernorm.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.1.self_attn.q_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.1.self_attn.q_proj.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.1.self_attn.k_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.1.self_attn.k_proj.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.1.self_attn.v_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.1.self_attn.v_proj.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.layers.1.self_attn.o_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] ...
[INFO:swift] Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(152064, 3584)
    (layers): ModuleList(
      (0-27): 28 x Qwen2DecoderLayer(
        (self_attn): Qwen2SdpaAttention(
          (q_proj): Linear(in_features=3584, out_features=3584, bias=True)
          (k_proj): Linear(in_features=3584, out_features=512, bias=True)
          (v_proj): Linear(in_features=3584, out_features=512, bias=True)
          (o_proj): Linear(in_features=3584, out_features=3584, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (up_proj): Linear(in_features=3584, out_features=18944, bias=False)
          (down_proj): Linear(in_features=18944, out_features=3584, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((3584,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((3584,), eps=1e-06)
  )
  (lm_head): Linear(in_features=3584, out_features=152064, bias=False)
)
[INFO:swift] Qwen2ForCausalLM: 7615.6165M Params (0.0000M Trainable [0.0000%]), 234.8828M Buffers.
[INFO:swift] system: You are Qwen, created by Alibaba Cloud. You are a helpful assistant.
Running on local URL: http://127.0.0.1:7860

Jintao-Huang commented 5 days ago

The model structure of qwen2 and qwen2.5 is the same.
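
To illustrate the point: the `Qwen2` names in the log come from the architecture fields of the checkpoint's config, which name the model class rather than the release. Below is a minimal sketch that reads those fields from a config excerpt copied from the log above. `describe_architecture` is a hypothetical helper written for this example and is not part of ms-swift.

```python
import json

# Excerpt of the config printed in the app-ui log above (values copied verbatim).
CONFIG_JSON = """
{
  "architectures": ["Qwen2ForCausalLM"],
  "model_type": "qwen2",
  "hidden_size": 3584,
  "num_hidden_layers": 28,
  "num_attention_heads": 28,
  "num_key_value_heads": 4,
  "vocab_size": 152064
}
"""


def describe_architecture(config_text: str) -> str:
    """Summarize which architecture a checkpoint config declares (hypothetical helper)."""
    config = json.loads(config_text)
    architectures = ", ".join(config.get("architectures", []))
    return f"model_type={config['model_type']} architectures={architectures}"


summary = describe_architecture(CONFIG_JSON)
print(summary)
```

For a checkpoint on disk, the same helper could be fed the contents of the `config.json` file in the merged checkpoint directory, assuming that file exists there.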

neverbiasu commented 5 days ago

Thanks for explaining.