open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting ~100 VLMs and 30+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

Error when evaluating MiniCPM-Llama3-V-2_5 on MMMU_TEST #250

Open · ruifengma opened this issue 1 month ago

ruifengma commented 1 month ago
CUDA_VISIBLE_DEVICES=2 python run.py --data MMMU_TEST --model MiniCPM-Llama3-V-2_5 --verbose
Did not detect the .env file at /data/mm/VLMEvalKit/.env, failed to load.
Did not detect the .env file at /data/mm/VLMEvalKit/.env, failed to load.
load from /home/models/MiniCPM-Llama3-V-2_5
Loading checkpoint shards: 100%|██████████| 7/7 [00:03<00:00,  2.16it/s]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
  0%|          | 0/8260 [00:00<?, ?it/s]A
  0%|          | 1/8260 [00:01<3:00:57,  1.31s/it]E. 120 uM/min
  0%|          | 2/8260 [00:03<3:59:59,  1.74s/it]A. Ilioinguinal
  0%|          | 3/8260 [00:04<3:09:04,  1.37s/it]C
  0%|          | 4/8260 [00:04<2:23:11,  1.04s/it]C
  0%|          | 5/8260 [00:05<2:21:08,  1.03s/it]B
  0%|          | 6/8260 [00:06<2:06:44,  1.09it/s]A
  0%|          | 7/8260 [00:07<1:57:55,  1.17it/s]D
  0%|          | 8/8260 [00:07<1:47:53,  1.27it/s]D
  0%|          | 9/8260 [00:08<1:24:40,  1.62it/s]Token indices sequence length is longer than the specified maximum sequence length for this model (2544 > 2048). Running this sequence through the model will result in indexing errors
  0%|          | 9/8260 [00:08<2:06:54,  1.08it/s]
Traceback (most recent call last):
  File "/data/mm/VLMEvalKit/run.py", line 195, in <module>
    main()
  File "/data/mm/VLMEvalKit/run.py", line 109, in main
    model = infer_data_job(
            ^^^^^^^^^^^^^^^
  File "/data/mm/VLMEvalKit/vlmeval/inference.py", line 164, in infer_data_job
    model = infer_data(
            ^^^^^^^^^^^
  File "/data/mm/VLMEvalKit/vlmeval/inference.py", line 130, in infer_data
    response = model.generate(message=struct, dataset=dataset_name)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/mm/VLMEvalKit/vlmeval/vlm/base.py", line 140, in generate
    return self.generate_inner(message, dataset)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/mm/VLMEvalKit/vlmeval/vlm/minicpm_v.py", line 206, in generate_inner
    res = self.model.chat(
          ^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py", line 416, in chat
    res, vision_hidden_states = self.generate(
                                ^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py", line 305, in generate
    model_inputs = self._process_list(tokenizer, input_id_list, max_inp_length)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py", line 200, in _process_list
    self._convert_to_tensors(tokenizer, input_ids, max_inp_length)
  File "/root/.cache/huggingface/modules/transformers_modules/MiniCPM-Llama3-V-2_5/modeling_minicpmv.py", line 180, in _convert_to_tensors
    image_bound = torch.hstack(
                  ^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 21 but got size 20 for tensor number 1 in the list.
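For context: `torch.hstack` concatenates 2-D tensors along dimension 1 and requires them to agree in every other dimension. A minimal sketch (hypothetical tensors, not the repo's actual data) that reproduces the same error, on the assumption that the two `image_bound` columns end up with mismatched row counts after a >2048-token prompt is truncated through one of a pair of image-marker tokens:

import torch

# Hypothetical image-bound position tensors: 21 start markers survive
# tokenization/truncation, but only 20 matching end markers do.
starts = torch.randint(0, 2048, (21, 1))
ends = torch.randint(0, 2048, (20, 1))

# Raises: RuntimeError: Sizes of tensors must match except in dimension 1.
# Expected size 21 but got size 20 for tensor number 1 in the list.
image_bound = torch.hstack([starts, ends])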
junming-yang commented 1 month ago

The problem appears to be caused by the MiniCPM-Llama3-V-2_5 model code itself: the error is raised inside the model's own modeling_minicpmv.py (loaded via trust_remote_code), not inside VLMEvalKit.
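Until this is fixed upstream, one hedged workaround (a sketch, not an official fix; the 2048-token limit is taken from the warning in the log above, and `tokenizer` is assumed to be the MiniCPM-Llama3-V-2_5 tokenizer) is to detect over-length samples before they reach model.chat():

# Sketch of a pre-filter for over-length MMMU_TEST prompts, assuming the
# model's effective window is 2048 tokens (per the warning in the log).
MAX_INP_LENGTH = 2048  # assumed; mirrors the model's max_inp_length

def fits_window(tokenizer, prompt: str, max_len: int = MAX_INP_LENGTH) -> bool:
    """Return True if the tokenized prompt fits within max_len tokens."""
    ids = tokenizer(prompt, add_special_tokens=True)["input_ids"]
    return len(ids) <= max_len

Skipping (rather than truncating) over-length samples avoids cutting through a paired image start/end marker, which is what leaves the image_bound tensors with mismatched sizes.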