Closed: yanyanyufei1 closed this issue 3 months ago.
Try upgrading ms-swift.
The ms-swift version is 2.3.0.dev0; I also tried 2.2.5 and a few others, and switched between several transformers versions, but the same problem remains.
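When several versions have been tried, it helps to confirm which ones the failing process actually imports; here is a minimal check using only the standard library (nothing below is specific to ms-swift):

# Print the distributions resolved in the active environment,
# to rule out a stale install being picked up.
from importlib.metadata import version

for pkg in ('ms-swift', 'transformers', 'torch'):
    print(pkg, version(pkg))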
Try re-pulling the internvl-1b repo code; that might solve it.
There is no problem on my side with ms-swift==2.2.5.
Has it been solved?
Hold on, I have something else at hand; I can look at it this afternoon.
I re-downloaded internvl2-1b as follows, but it still fails:
from modelscope import snapshot_download
model_dir = snapshot_download('OpenGVLab/InternVL2-1B')
In the same environment, internvl2-2b works fine.
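A quick sanity check on the re-download is to print the snapshot directory that modelscope returns and list its contents; a minimal sketch using the same repo id as above:

import os
from modelscope import snapshot_download

# snapshot_download returns the local cache directory of the downloaded repo;
# this is the path that later gets passed to --model_id_or_path.
model_dir = snapshot_download('OpenGVLab/InternVL2-1B')
print(model_dir)
print(sorted(os.listdir(model_dir)))  # should include config.json and the weight files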
I encountered the same problem. Have you solved it?
What is your transformers version?
transformers 4.37.2, and exactly the same error.
I tested transformers==4.44.* here, and it works fine.
I can reproduce the same situation.
Thanks, it works.
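For anyone landing here later: the resolution in this thread is simply moving transformers from 4.37.2 up to the 4.44 series, for example with pip install -U "transformers>=4.44,<4.45", and then re-running swift infer --model_type internvl2-1b.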
Describe the bug
Running swift infer with internvl2-1b fails at generation time with ValueError: `bos_token_id` has to be defined when no `input_ids` are provided (full traceback below). The same environment works with internvl2-2b.
Your hardware and system info
GPU: Tesla V100
OS: Ubuntu 18.04.1 LTS
CUDA Version: 11.7
torch Version: 2.3.0+cu118
Additional context
(internvl2) q00813667@g5500-v100-14:~/code/internvl-1b$ CUDA_VISIBLE_DEVICES=6 swift infer --model_type internvl2-1b --model_id_or_path /home/qyf/code/internvl-1b/weight/
run sh:
python /home/qyf/code/internvl-1b/swift/swift/cli/infer.py --model_type internvl2-1b --model_id_or_path /home/qyf/code/internvl-1b/weight/
[INFO:swift] Successfully registered `/home/qyf/code/internvl-1b/swift/swift/llm/data/dataset_info.json`
[INFO:swift] Start time of running main: 2024-08-09 09:50:26.600485 [INFO:swift] ckpt_dir: None [INFO:swift] Due to `ckpt_dir` being `None`, `load_args_from_ckpt_dir` is set to `False`
. [INFO:swift] Setting template_type: internvl2 [INFO:swift] Setting self.eval_human: True [INFO:swift] Setting overwrite_generation_config: False [INFO:swift] args: InferArguments(model_type='internvl2-1b', model_id_or_path='/home/qyf/code/internvl-1b/weight', model_revision='master', sft_type='full', template_type='internvl2', infer_backend='pt', ckpt_dir=None, result_dir=None, load_args_from_ckpt_dir=False, load_dataset_config=False, eval_human=True, seed=42, dtype='AUTO', dataset=[], val_dataset=[], dataset_seed=42, dataset_test_ratio=0.01, show_dataset_sample=10, save_result=True, system=None, tools_prompt='react_en', max_length=None, truncation_strategy='delete', check_dataset_strategy='none', model_name=[None, None], model_author=[None, None], quant_method=None, quantization_bit=0, hqq_axis=0, hqq_dynamic_config_path=None, bnb_4bit_comp_dtype='AUTO', bnb_4bit_quant_type='nf4', bnb_4bit_use_double_quant=True, bnb_4bit_quant_storage=None, max_new_tokens=2048, do_sample=True, temperature=0.3, top_k=20, top_p=0.7, repetition_penalty=1.0, num_beams=1, stop_words=[], rope_scaling=None, use_flash_attn=None, ignore_args_error=False, stream=True, merge_lora=False, merge_device_map='cpu', save_safetensors=True, overwrite_generation_config=False, verbose=None, local_repo_path=None, custom_register_path=None, custom_dataset_info=None, device_map_config_path=None, device_max_memory=[], hub_token=None, gpu_memory_utilization=0.9, tensor_parallel_size=1, max_num_seqs=256, max_model_len=None, disable_custom_all_reduce=True, enforce_eager=False, vllm_enable_lora=False, vllm_max_lora_rank=16, lora_modules=[], image_input_shape=None, image_feature_size=None, tp=1, cache_max_entry_count=0.8, quant_policy=0, vision_batch_size=1, self_cognition_sample=0, train_dataset_sample=-1, val_dataset_sample=None, safe_serialization=None, model_cache_dir=None, merge_lora_and_save=None, custom_train_dataset_path=[], custom_val_dataset_path=[], vllm_lora_modules=None) [INFO:swift] Global seed set to 42 [INFO:swift] device_count: 1 [INFO:swift] Loading the model using model_dir: /home/qyf/code/internvl-1b/weight [INFO:swift] Setting torch_dtype: torch.bfloat16 Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained. [INFO:swift] model_kwargs: {'device_map': 'cuda:0'} FlashAttention is not installed. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. 
Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. Warning: Flash Attention is not available, use_flash_attn is set to False. [INFO:swift] model.max_model_len: 32768 [INFO:swift] model_config: InternVLChatConfig { "_commit_hash": null, "_name_or_path": "/home/qyf/code/internvl-1b/weight", "architectures": [ "InternVLChatModel" ], "auto_map": { "AutoConfig": "configuration_internvl_chat.InternVLChatConfig", "AutoModel": "modeling_internvl_chat.InternVLChatModel", "AutoModelForCausalLM": "modeling_internvl_chat.InternVLChatModel" }, "downsample_ratio": 0.5, "dynamic_image_size": true, "force_image_size": 448, "hidden_size": 896, "llm_config": { "_name_or_path": "Qwen/Qwen2-0.5B-Instruct", "add_cross_attention": false, "architectures": [ "Qwen2ForCausalLM" ], "attention_dropout": 0.0, "attn_implementation": "eager", "bad_words_ids": null, "begin_suppress_tokens": null, "bos_token_id": 151643, "chunk_size_feed_forward": 0, "cross_attention_hidden_size": null, "decoder_start_token_id": null, "diversity_penalty": 0.0, "do_sample": false, "early_stopping": false, "encoder_no_repeat_ngram_size": 0, "eos_token_id": 151645, "exponential_decay_length_penalty": null, "finetuning_task": null, "forced_bos_token_id": null, "forced_eos_token_id": null, "hidden_act": "silu", "hidden_size": 896, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "initializer_range": 0.02, "intermediate_size": 4864, "is_decoder": false, "is_encoder_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "length_penalty": 1.0, "max_length": 20, "max_position_embeddings": 32768, "max_window_layers": 24, "min_length": 0, "model_type": "qwen2", "no_repeat_ngram_size": 0, "num_attention_heads": 14, "num_beam_groups": 1, "num_beams": 1, "num_hidden_layers": 24, "num_key_value_heads": 2, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_scores": false, "pad_token_id": null, "prefix": null, "problem_type": null, "pruned_heads": {}, "remove_invalid_values": false, "repetition_penalty": 1.0, "return_dict": true, "return_dict_in_generate": false, "rms_norm_eps": 1e-06, "rope_theta": 1000000.0, "sep_token_id": null, "sliding_window": 32768, "suppress_tokens": null, "task_specific_params": null, "temperature": 1.0, "tf_legacy_loss": false, "tie_encoder_decoder": false, "tie_word_embeddings": true, "tokenizer_class": null, "top_k": 50, "top_p": 1.0, "torch_dtype": "bfloat16", "torchscript": false, "transformers_version": "4.37.2", "typical_p": 1.0, "use_bfloat16": true, "use_cache": true, "use_sliding_window": false, "vocab_size": 151655 }, "max_dynamic_patch": 12, "max_position_embeddings": 32768, "min_dynamic_patch": 1, "model_type": "internvl_chat", "ps_version": "v2", "select_layer": -1, "template": "Hermes-2", "torch_dtype": "bfloat16", "transformers_version": null, "use_backbone_lora": 0, "use_llm_lora": 0, "use_thumbnail": true, "vision_config": { "_name_or_path": "", "add_cross_attention": false, "architectures": [ "InternVisionModel" ], 
"attention_dropout": 0.0, "bad_words_ids": null, "begin_suppress_tokens": null, "bos_token_id": null, "chunk_size_feed_forward": 0, "cross_attention_hidden_size": null, "decoder_start_token_id": null, "diversity_penalty": 0.0, "do_sample": false, "drop_path_rate": 0.0, "dropout": 0.0, "early_stopping": false, "encoder_no_repeat_ngram_size": 0, "eos_token_id": null, "exponential_decay_length_penalty": null, "finetuning_task": null, "forced_bos_token_id": null, "forced_eos_token_id": null, "hidden_act": "gelu", "hidden_size": 1024, "id2label": { "0": "LABEL_0", "1": "LABEL_1" }, "image_size": 448, "initializer_factor": 1.0, "initializer_range": 0.02, "intermediate_size": 4096, "is_decoder": false, "is_encoder_decoder": false, "label2id": { "LABEL_0": 0, "LABEL_1": 1 }, "layer_norm_eps": 1e-06, "length_penalty": 1.0, "max_length": 20, "min_length": 0, "model_type": "intern_vit_6b", "no_repeat_ngram_size": 0, "norm_type": "layer_norm", "num_attention_heads": 16, "num_beam_groups": 1, "num_beams": 1, "num_channels": 3, "num_hidden_layers": 24, "num_return_sequences": 1, "output_attentions": false, "output_hidden_states": false, "output_scores": false, "pad_token_id": null, "patch_size": 14, "prefix": null, "problem_type": null, "pruned_heads": {}, "qk_normalization": false, "qkv_bias": true, "remove_invalid_values": false, "repetition_penalty": 1.0, "return_dict": true, "return_dict_in_generate": false, "sep_token_id": null, "suppress_tokens": null, "task_specific_params": null, "temperature": 1.0, "tf_legacy_loss": false, "tie_encoder_decoder": false, "tie_word_embeddings": true, "tokenizer_class": null, "top_k": 50, "top_p": 1.0, "torch_dtype": "bfloat16", "torchscript": false, "transformers_version": "4.37.2", "typical_p": 1.0, "use_bfloat16": true, "use_flash_attn": true } }[INFO:swift] generation_config: GenerationConfig { "do_sample": true, "eos_token_id": 151645, "max_new_tokens": 2048, "pad_token_id": 151643, "temperature": 0.3, "top_k": 20, "top_p": 0.7 }
[INFO:swift] [vision_model.embeddings.class_embedding]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.embeddings.position_embedding]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.embeddings.patch_embedding.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.embeddings.patch_embedding.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.ls1]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.ls2]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.attn.qkv.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.attn.qkv.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.attn.proj.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.attn.proj.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.mlp.fc1.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.mlp.fc1.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.mlp.fc2.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.mlp.fc2.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.norm1.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.norm1.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.norm2.weight]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.0.norm2.bias]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.1.ls1]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] [vision_model.encoder.layers.1.ls2]: requires_grad=False, dtype=torch.bfloat16, device=cuda:0 [INFO:swift] ... 
[INFO:swift] InternVLChatModel( (vision_model): InternVisionModel( (embeddings): InternVisionEmbeddings( (patch_embedding): Conv2d(3, 1024, kernel_size=(14, 14), stride=(14, 14)) ) (encoder): InternVisionEncoder( (layers): ModuleList( (0-23): 24 x InternVisionEncoderLayer( (attn): InternAttention( (qkv): Linear(in_features=1024, out_features=3072, bias=True) (attn_drop): Dropout(p=0.0, inplace=False) (proj_drop): Dropout(p=0.0, inplace=False) (proj): Linear(in_features=1024, out_features=1024, bias=True) ) (mlp): InternMLP( (act): GELUActivation() (fc1): Linear(in_features=1024, out_features=4096, bias=True) (fc2): Linear(in_features=4096, out_features=1024, bias=True) ) (norm1): LayerNorm((1024,), eps=1e-06, elementwise_affine=True) (norm2): LayerNorm((1024,), eps=1e-06, elementwise_affine=True) (drop_path1): Identity() (drop_path2): Identity() ) ) ) ) (language_model): Qwen2ForCausalLM( (model): Qwen2Model( (embed_tokens): Embedding(151655, 896) (layers): ModuleList( (0-23): 24 x Qwen2DecoderLayer( (self_attn): Qwen2SdpaAttention( (q_proj): Linear(in_features=896, out_features=896, bias=True) (k_proj): Linear(in_features=896, out_features=128, bias=True) (v_proj): Linear(in_features=896, out_features=128, bias=True) (o_proj): Linear(in_features=896, out_features=896, bias=False) (rotary_emb): Qwen2RotaryEmbedding() ) (mlp): Qwen2MLP( (gate_proj): Linear(in_features=896, out_features=4864, bias=False) (up_proj): Linear(in_features=896, out_features=4864, bias=False) (down_proj): Linear(in_features=4864, out_features=896, bias=False) (act_fn): SiLU() ) (input_layernorm): Qwen2RMSNorm() (post_attention_layernorm): Qwen2RMSNorm() ) ) (norm): Qwen2RMSNorm() ) (lm_head): Linear(in_features=896, out_features=151655, bias=False) ) (mlp1): Sequential( (0): LayerNorm((4096,), eps=1e-05, elementwise_affine=True) (1): Linear(in_features=4096, out_features=896, bias=True) (2): GELU(approximate='none') (3): Linear(in_features=896, out_features=896, bias=True) ) ) [INFO:swift] InternVLChatModel: 938.1590M Params (0.0000M Trainable [0.0000%]), 100.6641M Buffers. [INFO:swift] system: 你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫InternVL, 是一个有用无害的人工智能助手。 [INFO:swift] Inputdescribe the image
Input an image path or URL <<< http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/cat.png
Exception in thread Thread-1 (generate):
Traceback (most recent call last):
File "/home/qyf/software/miniconda3/envs/internvl2/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/home/qyf/software/miniconda3/envs/internvl2/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/home/qyf/code/internvl-1b/swift/swift/llm/utils/model.py", line 4182, in _new_generate
return generate(*args, **kwargs)
File "/home/qyf/software/miniconda3/envs/internvl2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/qyf/.cache/huggingface/modules/transformers_modules/weight/modeling_internvl_chat.py", line 334, in generate
outputs = self.language_model.generate(
File "/home/qyf/software/miniconda3/envs/internvl2/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/qyf/software/miniconda3/envs/internvl2/lib/python3.10/site-packages/transformers/generation/utils.py", line 1334, in generate
inputs_tensor, model_input_name, model_kwargs = self._prepare_model_inputs(
File "/home/qyf/software/miniconda3/envs/internvl2/lib/python3.10/site-packages/transformers/generation/utils.py", line 402, in _prepare_model_inputs
model_kwargs["input_ids"] = self._maybe_initialize_input_ids_for_generation(
File "/home/qyf/software/miniconda3/envs/internvl2/lib/python3.10/site-packages/transformers/generation/utils.py", line 435, in _maybe_initialize_input_ids_for_generation
raise ValueError("
exit
orquit
to exit the conversation. [INFO:swift] Inputmulti-line
to switch to multi-line input mode. [INFO:swift] Inputreset-system
to reset the system and clear the history. [INFO:swift] Inputclear
to clear the history. [INFO:swift] Please enter the conversation content first, followed by the path to the multimedia file. <<<bos_token_id
has to be defined when noinput_ids
are provided.") ValueError:bos_token_id
has to be defined when noinput_ids
are provided.
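If upgrading is not immediately possible, the traceback points at the likely mechanism: InternVL's custom generate() forwards only inputs_embeds (no input_ids) to the Qwen2 language model, and the transformers 4.37.2 code path then tries to build dummy input_ids from bos_token_id and raises because the generation config shown above does not define one; the 4.44 releases no longer insist on bos_token_id in this situation, which matches the fix reported in the comments. A small diagnostic sketch, not a fix (the path is the one from the log, and it assumes the weight directory contains a generation_config.json):

# Show which bos_token_id (if any) the local InternVL2-1B weights declare.
# The generation_config printed in the log above has no bos_token_id, which is
# exactly what transformers 4.37.2 trips on when only inputs_embeds are passed.
from transformers import AutoTokenizer, GenerationConfig

model_dir = '/home/qyf/code/internvl-1b/weight'  # path taken from the log above
gen_cfg = GenerationConfig.from_pretrained(model_dir)
tok = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
print('generation_config.bos_token_id:', gen_cfg.bos_token_id)
print('tokenizer.bos_token_id:', tok.bos_token_id)

If pinning to 4.37.2 is unavoidable, adding a bos_token_id to that generation config (the llm_config in the log uses 151643) might sidestep the check, but that is untested here; the confirmed resolution in this thread is the transformers upgrade.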