open-compass / VLMEvalKit

Open-source evaluation toolkit for large vision-language models (LVLMs), supporting 160+ VLMs and 50+ benchmarks
https://huggingface.co/spaces/opencompass/open_vlm_leaderboard
Apache License 2.0

DocVQA evaluation does not work #478

Closed. M3Dade closed this issue 1 month ago.

M3Dade commented 1 month ago

When I run

CUDA_VISIBLE_DEVICES=1 python run.py --data DocVQA_TEST --model Qwen2-VL-2B-Instruct

I get the following output:

100%|█████████████████████████████████████████████▉| 5164/5168 [2:13:48<00:06,  1.53s/it][{'role': 'user', 'content': [{'type': 'image', 'image': '/vlmeval/images/DocVQA_TEST/16380.jpg', 'min_pixels': 1003520, 'max_pixels': 12845056}, {'type': 'text', 'text': 'What is the extension number of Jo Spach ?\nPlease try to answer the question with short words or phrases if possible.'}]}]
7240.
100%|█████████████████████████████████████████████▉| 5165/5168 [2:13:50<00:04,  1.48s/it][{'role': 'user', 'content': [{'type': 'image', 'image': '/vlmeval/images/DocVQA_TEST/57341.jpg', 'min_pixels': 1003520, 'max_pixels': 12845056}, {'type': 'text', 'text': 'What is the net worth in 2012 (Rs. Cr.)?\nPlease try to answer the question with short words or phrases if possible.'}]}]
2849
100%|█████████████████████████████████████████████▉| 5166/5168 [2:13:51<00:02,  1.48s/it][{'role': 'user', 'content': [{'type': 'image', 'image': '/vlmeval/images/DocVQA_TEST/61872.jpg', 'min_pixels': 1003520, 'max_pixels': 12845056}, {'type': 'text', 'text': 'What is the cost of supplies for the 3rd year?\nPlease try to answer the question with short words or phrases if possible.'}]}]
400
100%|█████████████████████████████████████████████▉| 5167/5168 [2:13:53<00:01,  1.48s/it][{'role': 'user', 'content': [{'type': 'image', 'image': '/vlmeval/images/DocVQA_TEST/57343.jpg', 'min_pixels': 1003520, 'max_pixels': 12845056}, {'type': 'text', 'text': 'What is the dividend payout in 1996?\nPlease try to answer the question with short words or phrases if possible.'}]}]
61
100%|██████████████████████████████████████████████| 5168/5168 [2:13:54<00:00,  1.55s/it]
Traceback (most recent call last):
  File "/VLMEvalKit/run.py", line 226, in <module>
    main()
  File "/VLMEvalKit/run.py", line 208, in main
    eval_results = dataset.evaluate(result_file, **judge_kwargs)
  File "/VLMEvalKit/vlmeval/dataset/image_vqa.py", line 48, in evaluate
    assert 'answer' in data and 'prediction' in data
AssertionError

I pulled the latest repository code recently.

kennymckormick commented 1 month ago

Hi @M3Dade,

We only support inference on DocVQA_TEST, not evaluation, because the test split has no ground truth. You can evaluate on DocVQA_VAL instead.
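The traceback points at the check in `vlmeval/dataset/image_vqa.py`, `assert 'answer' in data and 'prediction' in data`: scoring needs a ground-truth `answer` column next to the model's `prediction` column, and the DocVQA_TEST result file has no answers to compare against. The sketch below only mirrors that condition. `has_ground_truth` is a hypothetical helper, and the column lists are assumed layouts for illustration, not the exact columns VLMEvalKit writes.

```python
def has_ground_truth(columns):
    """Hypothetical helper mirroring the assertion in image_vqa.py:
    evaluation requires both a ground-truth 'answer' column and a
    model 'prediction' column."""
    cols = set(columns)
    return 'answer' in cols and 'prediction' in cols


# Assumed column layouts, for illustration only.
test_split_columns = ['index', 'question', 'image', 'prediction']
val_split_columns = ['index', 'question', 'image', 'answer', 'prediction']

print(has_ground_truth(test_split_columns))  # False: evaluate() would raise AssertionError
print(has_ground_truth(val_split_columns))   # True: the result file can be scored
```

Running the same command with `--data DocVQA_VAL` instead of `--data DocVQA_TEST` produces a split that passes this check.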