modelscope / ms-swift

Use PEFT or Full-parameter to finetune 300+ LLMs or 80+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V-2.6, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

An error occurred when trying to fine-tune yi-34b-200k with damo-agent dataset AssertionError: not support `system` #209

Closed. xiong0827 closed this issue 8 months ago.

xiong0827 commented 9 months ago

2023-12-09 11:24:04,075 - modelscope - INFO - PyTorch version 2.1.1 Found.
2023-12-09 11:24:04,076 - modelscope - INFO - Loading ast index from /home/public/modelscope/ast_indexer
2023-12-09 11:24:04,267 - modelscope - INFO - Loading done! Current index file version is 1.9.5, with md5 0adf9929420bfb043e99d81cfd617d40 and a total number of 945 components indexed
----- True False -----
[INFO:swift] output_dir: /home/public/project/swift/output/yi-34b-200k/v0-20231209-112406
[INFO:swift] Setting template_type: default-generation
[INFO:swift] Setting hub_model_id: yi-34b-200k-lora
[INFO:swift] args: SftArguments(model_type='yi-34b-200k', model_id_or_path='01ai/Yi-34B-200K', model_revision='master', model_cache_dir=None, sft_type='lora', tuner_backend='swift', template_type='default-generation', output_dir='/home/public/project/swift/output/yi-34b-200k/v0-20231209-112406', add_output_dir_suffix=True, ddp_backend='nccl', seed=42, resume_from_checkpoint=None, dtype='bf16', dataset=['damo-agent-mini-zh'], dataset_seed=42, dataset_test_ratio=0.01, train_dataset_sample=-1, val_dataset_sample=None, system=None, max_length=2048, truncation_strategy='delete', check_dataset_strategy='none', custom_train_dataset_path=[], custom_val_dataset_path=[], self_cognition_sample=0, model_name=None, model_author=None, quantization_bit=0, bnb_4bit_comp_dtype='bf16', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, lora_target_modules=['q_proj', 'k_proj', 'v_proj'], lora_rank=8, lora_alpha=32, lora_dropout_p=0.05, neftune_alpha=0.0, gradient_checkpointing=True, deepspeed_config_path=None, batch_size=1, eval_batch_size=1, num_train_epochs=1, max_steps=-1, optim='adamw_torch', learning_rate=0.0001, weight_decay=0.01, gradient_accumulation_steps=16, max_grad_norm=0.5, predict_with_generate=False, lr_scheduler_type='cosine', warmup_ratio=0.05, eval_steps=100, save_steps=100, only_save_model=False, save_total_limit=2, logging_steps=5, dataloader_num_workers=1, push_to_hub=False, hub_model_id='yi-34b-200k-lora', hub_private_repo=True, push_hub_strategy='push_best', hub_token=None, test_oom_error=False, use_flash_attn=None, ignore_args_error=False, logging_dir='/home/public/project/swift/output/yi-34b-200k/v0-20231209-112406/runs', report_to=['all'], check_model_is_latest=True, save_on_each_node=True, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.05)
device_count: 4
rank: -1, local_rank: -1, world_size: 1, local_world_size: 1
[INFO:swift] Global seed set to 42
Loading checkpoint shards: 100%|██████████████████████████████████████████████| 7/7 [01:39<00:00, 14.15s/it]
[INFO:swift] model_config: LlamaConfig {
  "_name_or_path": "/home/public/modelscope/01ai/Yi-34B-200K",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 7168,
  "initializer_range": 0.02,
  "intermediate_size": 20480,
  "max_position_embeddings": 200000,
  "model_type": "llama",
  "num_attention_heads": 56,
  "num_hidden_layers": 60,
  "num_key_value_heads": 8,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 5000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.35.2",
  "use_cache": true,
  "vocab_size": 64000
}

[INFO:swift] lora_config: LoRAConfig(swift_type='LORA', peft_type=None, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=None, inference_mode=False, r=8, target_modules=['q_proj', 'k_proj', 'v_proj'], lora_alpha=32, lora_dropout=0.05, fan_in_fan_out=False, bias='none', modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, use_qa_lora=False, use_merged_linear=False, enable_lora=None)
[INFO:swift] [model.model.embed_tokens.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.q_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.q_proj.lora_A.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.q_proj.lora_B.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.k_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.k_proj.lora_A.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.k_proj.lora_B.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.v_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.v_proj.lora_A.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.v_proj.lora_B.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
[INFO:swift] [model.model.layers.0.self_attn.o_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.0.mlp.gate_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.0.mlp.up_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.0.mlp.down_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.0.input_layernorm.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.0.post_attention_layernorm.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.1.self_attn.q_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] [model.model.layers.1.self_attn.q_proj.lora_A.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
[INFO:swift] [model.model.layers.1.self_attn.q_proj.lora_B.default.weight]: requires_grad=True, dtype=torch.float32, device=cuda:0
[INFO:swift] [model.model.layers.1.self_attn.k_proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0
[INFO:swift] ...
[INFO:swift] SwiftModel: 34403.6628M Params (14.7456M Trainable [0.0429%]), 3072.0038M Buffers.
[INFO:swift] SwiftModel(
  (model): LlamaForCausalLM(
    (model): LlamaModel(
      (embed_tokens): Embedding(64000, 7168, padding_idx=0)
      (layers): ModuleList(
        (0-59): 60 x LlamaDecoderLayer(
          (self_attn): LlamaAttention(
            (q_proj): Linear(
              in_features=7168, out_features=7168, bias=False
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=7168, out_features=8, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=8, out_features=7168, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (k_proj): Linear(
              in_features=7168, out_features=1024, bias=False
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=7168, out_features=8, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=8, out_features=1024, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (v_proj): Linear(
              in_features=7168, out_features=1024, bias=False
              (lora_dropout): ModuleDict(
                (default): Dropout(p=0.05, inplace=False)
              )
              (lora_A): ModuleDict(
                (default): Linear(in_features=7168, out_features=8, bias=False)
              )
              (lora_B): ModuleDict(
                (default): Linear(in_features=8, out_features=1024, bias=False)
              )
              (lora_embedding_A): ParameterDict()
              (lora_embedding_B): ParameterDict()
            )
            (o_proj): Linear(in_features=7168, out_features=7168, bias=False)
            (rotary_emb): LlamaRotaryEmbedding()
          )
          (mlp): LlamaMLP(
            (gate_proj): Linear(in_features=7168, out_features=20480, bias=False)
            (up_proj): Linear(in_features=7168, out_features=20480, bias=False)
            (down_proj): Linear(in_features=20480, out_features=7168, bias=False)
            (act_fn): SiLUActivation()
          )
          (input_layernorm): LlamaRMSNorm()
          (post_attention_layernorm): LlamaRMSNorm()
        )
      )
      (norm): LlamaRMSNorm()
    )
    (lm_head): Linear(in_features=7168, out_features=64000, bias=False)
  )
)
[WARNING:modelscope] Reusing dataset ms_agent-bench (/root/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files)
[INFO:modelscope] Generating dataset ms_agent-bench (/root/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files)
[INFO:modelscope] Reusing cached meta-data file: /root/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files/2187d5baeecdec0dfeda52179b3452d0
Downloading data files: 0it [00:00, ?it/s]
Extracting data files: 0it [00:00, ?it/s]
100%|█████████████████████████████████████████████████████████████| 598185/598185 [01:05<00:00, 9199.39it/s]
[WARNING:modelscope] Reusing dataset ms_agent-bench (/root/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files)
[INFO:modelscope] Generating dataset ms_agent-bench (/root/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files)
[INFO:modelscope] Reusing cached meta-data file: /root/.cache/modelscope/hub/datasets/damo/MSAgent-Bench/master/data_files/161ef6906d1608a25c277263119c0beb
Downloading data files: 0it [00:00, ?it/s]
Extracting data files: 0it [00:00, ?it/s]
100%|███████████████████████████████████████████████████████████████████| 360/360 [00:00<00:00, 3345.06it/s]
[INFO:swift] train_dataset: Dataset({
    features: ['system', 'history', 'query', 'response'],
    num_rows: 39964
})
[INFO:swift] val_dataset: Dataset({
    features: ['system', 'history', 'query', 'response'],
    num_rows: 152
})
[INFO:swift] system: None
  0%|          | 0/39964 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/public/project/swift/tarin.py", line 28, in <module>
    result = sft_main(sft_args,)
  File "/home/public/project/swift/swift/utils/run_utils.py", line 27, in x_main
    return llm_x(args, **kwargs)
  File "/home/public/project/swift/swift/llm/sft.py", line 160, in llm_sft
    train_dataset = dataset_map(train_dataset, template.encode)
  File "/home/public/project/swift/swift/llm/utils/utils.py", line 188, in dataset_map
    d = map_func(d)
  File "/home/public/project/swift/swift/llm/utils/template.py", line 316, in encode
    assert self.prefix_has_system is not None, 'not support system'
AssertionError: not support system

Jintao-Huang commented 9 months ago

Please specify the template explicitly, since this 200k model is a base model.

Jintao-Huang commented 9 months ago

If you are doing full-parameter fine-tuning, you can use --template yi; if you are fine-tuning with LoRA, --template default is recommended, because there are a few special tokens the base model may not have seen. You can try both and compare.
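For reference, a minimal sketch of pinning the template through the Python API that the traceback shows (SftArguments / sft_main). The field names are taken from the SftArguments repr in the log above; the import path and the exact template names are assumptions and may differ between swift versions:

# Hypothetical sketch, not a verbatim fix from the maintainers.
# Assumes SftArguments/sft_main are importable from swift.llm in this swift version.
from swift.llm import SftArguments, sft_main

sft_args = SftArguments(
    model_type='yi-34b-200k',
    model_id_or_path='01ai/Yi-34B-200K',
    sft_type='lora',                  # LoRA tuning -> 'default' template is recommended
    template_type='default',          # use 'yi' instead when doing full-parameter fine-tuning
    dataset=['damo-agent-mini-zh'],
    lora_target_modules=['q_proj', 'k_proj', 'v_proj'],
    max_length=2048,
)
result = sft_main(sft_args)

Setting template_type explicitly avoids the auto-selected 'default-generation' template, which has no system-prompt slot and therefore trips the "not support system" assertion on the damo-agent data.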

xiong0827 commented 9 months ago

Please specify the template explicitly, since this 200k model is a base model.

Got it, thank you very much!