Closed HugoLaurencon closed 1 month ago
Thanks, we will re-evaluate and update the results of Idefics2 on MathVista.
BTW, a community contributor is also trying to add support for Idefics3. Do you have time to take a look (at things like evalset-specific prompts)?
Actually, I haven't tested Idefics2 itself, only Idefics2-large, which we are going to release soon (maybe this week).
I think there's not much to change. The only thing is that the PR in Transformers hasn't been merged yet. Apart from that, the custom prompts for MMMU and MathVista remain valid, and the one for MMStar looks good too.
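To make the idea of evalset-specific prompts concrete, here is a minimal sketch of how prompts could be dispatched per dataset. The dataset names come from this thread; the function name, the dictionary, and the placeholder prompt strings are all hypothetical and are not VLMEvalKit's actual interface or the actual prompts.

```python
# Hypothetical sketch: pick a custom instruction based on the eval set.
# Placeholder strings stand in for the real dataset-specific prompts.
DATASET_PROMPTS = {
    "MMMU": "<MMMU-specific instruction>",
    "MathVista_MINI": "<MathVista-specific instruction>",
    "MMStar": "<MMStar-specific instruction>",
}

def prompt_for(dataset: str, question: str) -> str:
    """Prefix the question with the dataset-specific instruction, if any."""
    prefix = DATASET_PROMPTS.get(dataset, "")
    return f"{prefix}\n{question}" if prefix else question
```

A model wrapper would then call something like `prompt_for("MathVista_MINI", question)` before generation, falling back to the bare question for datasets without a custom prompt.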
There are some (hopefully small) discrepancies between generating with our internal repo and Transformers integration. If the scores differ too much from what we have reported, don't hesitate to ping me so I can have a look!
@HugoLaurencon
Unfortunately, I found that this modification does not work for Idefics2-8B.
Its original score on MathVista was 52.2; after the update, it drops to 51.4. You can double-check by running
torchrun --nproc-per-node=$GPU run.py --model idefics2_8b --data MathVista_MINI
with VLMEvalKit on your side.
Okay thanks for the evaluation!
Maybe it's because the integration of Idefics2 was recently broken with newer versions of Transformers. Could you tell me your version?
I will try to investigate a bit more
The results were obtained with transformers=4.44.0 and torch=2.0.1+cu118.
Thanks, I'll have a look when I find time! Also, if you still have the per-category MMMU evaluation scores for Idefics2 in your cache, could you copy-paste the whole VLMEvalKit output here so I can compare it with what I get using slightly different prompts? If it's not in your cache, no worries; there's no need to recompute it, I'll do it!
Hi, @HugoLaurencon We have created a huggingface dataset named OpenVLMRecords: https://huggingface.co/datasets/VLMEval/OpenVLMRecords. You can find the records in this repo.
Very nice feature!
A very small change to the prompting for MathVista. This can change performance a bit (by up to 1 point).
Idefics2 was fine-tuned with a specific prompt for MCQs. In this PR, I add a sentence that was always present during fine-tuning whenever the model was expected to answer an MCQ with a letter.
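The change described above can be sketched as follows: the fine-tuning instruction sentence is appended to the formatted MCQ before generation. The function name, the option formatting, and the `MCQ_HINT` string are illustrative placeholders, not the actual sentence or code from the PR.

```python
# Hypothetical sketch of the MCQ prompting change: append the instruction
# sentence Idefics2 saw during fine-tuning to the formatted question.
MCQ_HINT = "<instruction sentence seen during MCQ fine-tuning>"

def build_mcq_prompt(question: str, options: dict) -> str:
    """Format a question and lettered options, ending with the MCQ hint."""
    lines = [question]
    for letter, text in sorted(options.items()):
        lines.append(f"{letter}. {text}")
    lines.append(MCQ_HINT)
    return "\n".join(lines)
```

For example, `build_mcq_prompt("What is 2+2?", {"A": "3", "B": "4"})` produces the question, the two lettered options, and the hint sentence as the final line.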
Feel free to directly merge if you think this modification makes sense.