modelscope / ms-swift

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
https://swift.readthedocs.io/zh-cn/latest/Instruction/index.html
Apache License 2.0

infer does not run through all the data #1006

Closed: AlexJJJChen closed this 4 months ago

AlexJJJChen commented 6 months ago

CUDA_VISIBLE_DEVICES=0 \
NPROC_PER_NODE=1 \
nproc_per_node=1 \
swift infer \
    --ckpt_dir "output_llava/llava1d6-mistral-7b-instruct/v32-20240524-165418/checkpoint-2003" \
    --custom_val_dataset_path finetune_dataset/test.json \
    --repetition_penalty 1. \
    --merge_lora false \
    --show_dataset_sample "-1"

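My test.json has 50,000 samples, but when I run this command it only shows about 1,000. Version 2.0.5 cannot finish inference over all the data; 2.0.4 can. Please fix this bug.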
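Before comparing versions, it is worth confirming the record count in `test.json` independently of swift. The sketch below is a hypothetical helper, not part of swift; it assumes the file is either a single JSON array of records or JSON Lines (one object per line).

```python
import json
from pathlib import Path


def count_samples(path: str) -> int:
    """Count records in a custom dataset file (hypothetical helper).

    Assumes either a JSON array of records or JSON Lines.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # More than one top-level JSON value: treat the file as JSON Lines
        # and count the non-blank lines.
        return sum(1 for line in text.splitlines() if line.strip())
    if isinstance(data, list):
        return len(data)
    # A single top-level object counts as one record.
    return 1
```

For example, `count_samples("finetune_dataset/test.json")` should return roughly 50,000 if the file is intact; if it does, the missing rows are being lost while swift loads the data rather than in the file itself.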

AlexJJJChen commented 6 months ago

Version 2.0.5 cannot finish inference over all the data; 2.0.4 can. Please fix this bug.

Jintao-Huang commented 6 months ago

Could you post some logs?

Jintao-Huang commented 6 months ago

Can you confirm that you are on version 2.0.5?

AlexJJJChen commented 6 months ago


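Yes, the problem is with 2.0.5; I had to reinstall 2.0.4 to run the full dataset. When I run infer, my dataset has 50k samples, but the terminal shows only 1.5k samples in the val dataset, and the progress bar also only goes up to 1.5k.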

Jintao-Huang commented 6 months ago

2.0.5 shouldn't have this problem. Could you double-check that you're not on 2.1.0.dev?

AlexJJJChen commented 6 months ago


🤔 But pip show reports 2.0.5. In any case it is the latest release: I only updated the package in the last few days, and the bug appeared right after the update.
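`pip show` reports the environment that the `pip` on your PATH belongs to, which is not necessarily the interpreter the `swift` command runs under. A minimal sketch for checking from inside a specific interpreter, assuming the distribution name is `ms-swift` (the name it is installed under from PyPI):

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "ms-swift") -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Not installed in the environment of the running interpreter.
        return None
```

Running `print(installed_version())` with the same Python that backs the `swift` CLI shows which release actually executed the inference run.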

Jintao-Huang commented 6 months ago

During inference, swift prints "args: xxx". Could you share that output?

Jintao-Huang commented 6 months ago

I can't reproduce this on my side.

AlexJJJChen commented 6 months ago


[screenshot]

My dataset has 50k rows, but this shows only 1k.
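One possibility worth ruling out, though nothing in this thread confirms it, is that some records are skipped because an image path does not resolve or a query is empty. The hypothetical helper below (not part of swift) tallies records by a rough validity check; it assumes each record has a `query` field and an optional `images` field holding a path or a list of paths, matching the `['query', 'response', 'images']` features listed in the swift log.

```python
import os
from collections import Counter
from typing import Iterable, Mapping


def audit_records(records: Iterable[Mapping]) -> Counter:
    """Tally records by a rough validity check (hypothetical helper)."""
    tally: Counter = Counter()
    for rec in records:
        images = rec.get("images") or []
        if isinstance(images, str):
            # Allow a single path as well as a list of paths.
            images = [images]
        if not rec.get("query"):
            tally["missing_query"] += 1
        elif any(not os.path.exists(p) for p in images):
            tally["missing_image_file"] += 1
        else:
            tally["ok"] += 1
    return tally
```

For example, `audit_records(json.load(open("finetune_dataset/test.json", encoding="utf-8")))` (after `import json`) returns counts per category; an `ok` count near 50k would point away from the data and back toward the version change.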

AlexJJJChen commented 6 months ago

[INFO:swift] model.max_model_len: 8192
[INFO:swift] model_config: MiniCPMVConfig { "_name_or_path": "/root/.cache/modelscope/hub/OpenBMB/MiniCPM-Llama3-V-2_5", "architectures": [ "MiniCPMV" ], "attention_bias": false, "attention_dropout": 0.0, "auto_map": { "AutoConfig": "configuration_minicpm.MiniCPMVConfig", "AutoModel": "modeling_minicpmv.MiniCPMV", "AutoModelForCausalLM": "modeling_minicpmv.MiniCPMV" }, "batch_vision_input": true, "bos_token_id": 128000, "drop_vision_last_layer": false, "eos_token_id": 128001, "hidden_act": "silu", "hidden_size": 4096, "image_size": 448, "initializer_range": 0.02, "intermediate_size": 14336, "max_position_embeddings": 8192, "mlp_bias": false, "mm_use_im_start_end": true, "model_type": "minicpmv", "num_attention_heads": 32, "num_hidden_layers": 32, "num_key_value_heads": 8, "patch_size": 14, "pretraining_tp": 1, "query_num": 96, "rms_norm_eps": 1e-05, "rope_scaling": null, "rope_theta": 500000.0, "slice_config": { "max_slice_nums": 9, "model_type": "minicpmv" }, "slice_mode": true, "tie_word_embeddings": false, "torch_dtype": "float16", "transformers_version": "4.41.1", "use_cache": true, "vision_config": { "hidden_size": 1152, "image_size": 980, "intermediate_size": 4304, "model_type": "idefics2", "num_attention_heads": 16, "num_hidden_layers": 27, "patch_size": 14 }, "vocab_size": 128256 }

[INFO:swift] generation_config: GenerationConfig { "do_sample": true, "eos_token_id": 128001, "max_new_tokens": 2048, "pad_token_id": 128002, "temperature": 0.3, "top_k": 20, "top_p": 0.7 }

[INFO:swift] [llm.model.embed_tokens.weight]: requires_grad=True, dtype=torch.float16, device=cuda:0
[INFO:swift] [llm.model.layers.0.self_attn.q_proj.weight]: requires_grad=True, dtype=torch.float16, device=cuda:0
[INFO:swift] [llm.model.layers.0.self_attn.k_proj.weight]: requires_grad=True, dtype=torch.float16, device=cuda:0
[INFO:swift] [llm.model.layers.0.self_attn.v_proj.weight]: requires_grad=True, dtype=torch.float16, device=cuda:0
[INFO:swift] [llm.model.layers.0.self_attn.o_proj.weight]: requires_grad=True, dtype=torch.float16, device=cuda:0
[INFO:swift] [llm.model.layers.0.mlp.gate_proj.weight]: requires_grad=True, dtype=torch.float16, device=cuda:0
[INFO:swift] [llm.model.layers.0.mlp.up_proj.weight]: requires_grad=True, dtype=torch.float16, device=cuda:0
[INFO:swift] [llm.model.layers.0.mlp.down_proj.weight]: requires_grad=True, dtype=torch.float16, device=cuda:0
[INFO:swift] [llm.model.layers.0.input_layernorm.weight]: requires_grad=True, dtype=torch.float16, device=cuda:0
[INFO:swift] ...
[INFO:swift] MiniCPMV( (llm): LlamaForCausalLM( (model): LlamaModel( (embed_tokens): Embedding(128256, 4096) (layers): ModuleList( (0-31): 32 x LlamaDecoderLayer( (self_attn): LlamaSdpaAttention( (q_proj): Linear(in_features=4096, out_features=4096, bias=False) (k_proj): Linear(in_features=4096, out_features=1024, bias=False) (v_proj): Linear(in_features=4096, out_features=1024, bias=False) (o_proj): Linear(in_features=4096, out_features=4096, bias=False) (rotary_emb): LlamaRotaryEmbedding() ) (mlp): LlamaMLP( (gate_proj): Linear(in_features=4096, out_features=14336, bias=False) (up_proj): Linear(in_features=4096, out_features=14336, bias=False) (down_proj): Linear(in_features=14336, out_features=4096, bias=False) (act_fn): SiLU() ) (input_layernorm): LlamaRMSNorm() (post_attention_layernorm): LlamaRMSNorm() ) ) (norm): LlamaRMSNorm() ) (lm_head): Linear(in_features=4096, out_features=128256, bias=False) ) (vpm): Idefics2VisionTransformer( (embeddings): Idefics2VisionEmbeddings( (patch_embedding): Conv2d(3, 1152, kernel_size=(14, 14), stride=(14, 14), padding=valid) (position_embedding): Embedding(4900, 1152) ) (encoder): Idefics2Encoder( (layers): ModuleList( (0-26): 27 x Idefics2EncoderLayer( (self_attn): Idefics2VisionAttention( (k_proj): Linear(in_features=1152, out_features=1152, bias=True) (v_proj): Linear(in_features=1152, out_features=1152, bias=True) (q_proj): Linear(in_features=1152, out_features=1152, bias=True) (out_proj): Linear(in_features=1152, out_features=1152, bias=True) ) (layer_norm1): LayerNorm((1152,), eps=1e-06, elementwise_affine=True) (mlp): Idefics2VisionMLP( (activation_fn): PytorchGELUTanh() (fc1): Linear(in_features=1152, out_features=4304, bias=True) (fc2): Linear(in_features=4304, out_features=1152, bias=True) ) (layer_norm2): LayerNorm((1152,), eps=1e-06, elementwise_affine=True) ) ) ) (post_layernorm): LayerNorm((1152,), eps=1e-06, elementwise_affine=True) ) (resampler): Resampler( (kv_proj): Linear(in_features=1152, out_features=4096, bias=False) (attn): MultiheadAttention( (out_proj): NonDynamicallyQuantizableLinear(in_features=4096, out_features=4096, bias=True) ) (ln_q): LayerNorm((4096,), eps=1e-06, elementwise_affine=True) (ln_kv): LayerNorm((4096,), eps=1e-06, elementwise_affine=True) (ln_post): LayerNorm((4096,), eps=1e-06, elementwise_affine=True) ) )
[INFO:swift] MiniCPMV: 8537.0923M Params (8537.0923M Trainable [100.0000%]), 20.0724M Buffers.
[INFO:swift] system: None
[INFO:swift] val_dataset: Dataset({ features: ['query', 'response', 'images'], num_rows: 1016
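The relevant part of this log is the last line: `val_dataset` reports `num_rows: 1016`, far below the ~50k records in `test.json`. When comparing runs across versions (for example 2.0.4 against 2.0.5), a small parser can pull that number out of a saved log. A minimal sketch, assuming the log contains a `val_dataset: Dataset({ ... num_rows: N` line in the form shown above; the function name is hypothetical.

```python
import re
from typing import Optional

# Matches the val_dataset summary line printed by swift in the log above.
_VAL_ROWS = re.compile(r"val_dataset: Dataset\(\{.*?num_rows:\s*(\d+)", re.DOTALL)


def logged_val_rows(log_text: str) -> Optional[int]:
    """Return num_rows from the val_dataset log line, or None if absent."""
    match = _VAL_ROWS.search(log_text)
    return int(match.group(1)) if match else None
```

Comparing this value with the record count of `test.json` for each installed version makes it easy to see which release drops rows.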

Jintao-Huang commented 6 months ago

Could I see the args? They are printed in the command-line output.

Jintao-Huang commented 6 months ago

How about discussing this in the group chat? Replies there are quicker.

maokangkun commented 5 months ago

Version 2.1.0 also has this problem.

Jintao-Huang commented 5 months ago


Do you have a screenshot? This shouldn't happen.