open-compass / VLMEvalKit

Open-source evaluation toolkit of large vision-language models (LVLMs), support 160+ VLMs, 50+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0
1.32k stars 188 forks source link

评测InternVL2-1B报错: got multiple values for keyword argument 'return_dict' #591

Open qingchen177 opened 2 days ago

qingchen177 commented 2 days ago

尝试评测InternVL2-1B时报错

这是评测运行的命令

python run.py --data MathVision_MINI --model InternVL2-1B 换成 2B的its work,可能是因为1B的基座是qwen2导致的问题 python run.py --data MathVision_MINI --model InternVL2-2B

这是报错信息:

[2024-11-11 15:00:51] ERROR - run.py: main - 284: Model InternVL2-1B x Dataset MathVision_MINI combination failed: Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151655, 896)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2FlashAttention2(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((896,), eps=1e-06)
    (rotary_emb): Qwen2RotaryEmbedding()
  )
  (lm_head): Linear(in_features=896, out_features=151655, bias=False)
) got multiple values for keyword argument 'return_dict', skipping this combination.
Traceback (most recent call last):
  File "/home/li/work/projects/githubProjects/VLMEvalKit/run.py", line 184, in main
    model = infer_data_job(
  File "/home/li/work/projects/githubProjects/VLMEvalKit/vlmeval/inference.py", line 164, in infer_data_job
    model = infer_data(
  File "/home/li/work/projects/githubProjects/VLMEvalKit/vlmeval/inference.py", line 129, in infer_data
    response = model.generate(message=struct, dataset=dataset_name)
  File "/home/li/work/projects/githubProjects/VLMEvalKit/vlmeval/vlm/base.py", line 115, in generate
    return self.generate_inner(message, dataset)
  File "/home/li/work/projects/githubProjects/VLMEvalKit/vlmeval/vlm/internvl_chat.py", line 459, in generate_inner
    return self.generate_v2(message, dataset)
  File "/home/li/work/projects/githubProjects/VLMEvalKit/vlmeval/vlm/internvl_chat.py", line 430, in generate_v2
    response = self.model.chat(
  File "/home/li/.cache/huggingface/modules/transformers_modules/InternVL2-1B/modeling_internvl_chat.py", line 289, in chat
    generation_output = self.generate(
  File "/home/li/anaconda3/envs/eval/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/li/.cache/huggingface/modules/transformers_modules/InternVL2-1B/modeling_internvl_chat.py", line 339, in generate
    outputs = self.language_model.generate(
  File "/home/li/anaconda3/envs/eval/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/li/anaconda3/envs/eval/lib/python3.10/site-packages/transformers/generation/utils.py", line 2215, in generate
    result = self._sample(
  File "/home/li/anaconda3/envs/eval/lib/python3.10/site-packages/transformers/generation/utils.py", line 3206, in _sample
    outputs = self(**model_inputs, return_dict=True)
TypeError: Qwen2ForCausalLM(
  (model): Qwen2Model(
    (embed_tokens): Embedding(151655, 896)
    (layers): ModuleList(
      (0-23): 24 x Qwen2DecoderLayer(
        (self_attn): Qwen2FlashAttention2(
          (q_proj): Linear(in_features=896, out_features=896, bias=True)
          (k_proj): Linear(in_features=896, out_features=128, bias=True)
          (v_proj): Linear(in_features=896, out_features=128, bias=True)
          (o_proj): Linear(in_features=896, out_features=896, bias=False)
          (rotary_emb): Qwen2RotaryEmbedding()
        )
        (mlp): Qwen2MLP(
          (gate_proj): Linear(in_features=896, out_features=4864, bias=False)
          (up_proj): Linear(in_features=896, out_features=4864, bias=False)
          (down_proj): Linear(in_features=4864, out_features=896, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
        (post_attention_layernorm): Qwen2RMSNorm((896,), eps=1e-06)
      )
    )
    (norm): Qwen2RMSNorm((896,), eps=1e-06)
    (rotary_emb): Qwen2RotaryEmbedding()
  )
  (lm_head): Linear(in_features=896, out_features=151655, bias=False)
) got multiple values for keyword argument 'return_dict'

可能与Qwen2ForCausalLM这个有关?

如何复现

新建一个demo.py ,运行如下代码即可复现

from vlmeval.config import supported_VLM
model = supported_VLM['InternVL2-1B']()
# 前向单张图片
ret = model.generate(['assets/apple.jpg', 'What is in this image?'])
print(ret)  # 这张图片上有一个带叶子的红苹果
# 前向多张图片
ret = model.generate(['assets/apple.jpg', 'assets/apple.jpg', 'How many apples are there in the provided images? '])
print(ret)  # 提供的图片中有两个苹果
qingchen177 commented 2 days ago

顺便想问下,评测结果的含义,是如何评测?或者依据是什么?(不太懂)就比如下面这张图各个含义是如何? 这几个字段的意义:"prefetch","hit","prefetch_rate","acc" image

qingchen177 commented 2 days ago

image 我在.env文件中配置如下:

OPENAI_API_KEY=sk-123456
OPENAI_API_BASE=http://127.0.0.1:8000/v1/chat/completions
LOCAL_LLM=qwen2_5-7b-instruct

然后观察到表格中res的回复会出现如下的回复: You are Qwen, created by Alibaba Cloud. You are a helpful assistant. 这是模型本身的原因还是要调整提示词

czczup commented 2 days ago

估计是因为你用的transformers版本太新了