About OK-VQA dataset

yaohui120 commented 4 months ago

The paper ‘Can We Edit Multimodal Large Language Models’ said that the accuracies of the base models (blip2, minigpt4) on OK-VQA are all 100. I'm a little confused: do the pretrained models really have such strong capabilities? I tested them again, and the accuracies are below 5%. The main code is:

acc, pred_ids = compute_multimodal_edit_quality(pretrained_model, batch_data)

Is there a problem with my testing code?

tbozhong commented 4 months ago

Apologies for any misunderstanding; I believe there may have been a misinterpretation. We did not state that:

"The paper ‘Can We Edit Multimodal Large Language Models’ said that the accuracies of the base models (blip2, minigpt4) on OK-VQA are all 100."

Could it be that you are referring to the locality of the base models, which is indeed 100? Locality is measured as the rate at which the post-edit model’s predictions remain unchanged compared with the pre-edit model’s predictions; it is not accuracy against the ground-truth answers. For further details, see the discussion of locality in issue #88.
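To illustrate the distinction, here is a minimal sketch in plain Python. It is not EasyEdit's implementation: the function names `accuracy` and `locality` and the toy predictions are hypothetical, and the scores are fractions rather than percentages. It only shows why a model can score low on accuracy while its locality is still 100%.

```python
from typing import Sequence


def accuracy(preds: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of predictions that match the ground-truth answers."""
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")
    if not preds:
        return 0.0
    return sum(p == y for p, y in zip(preds, labels)) / len(preds)


def locality(pre_preds: Sequence[str], post_preds: Sequence[str]) -> float:
    """Fraction of post-edit predictions identical to the pre-edit ones.

    Ground-truth labels are never consulted here.
    """
    if len(pre_preds) != len(post_preds):
        raise ValueError("pre_preds and post_preds must have the same length")
    if not pre_preds:
        return 0.0
    return sum(a == b for a, b in zip(pre_preds, post_preds)) / len(pre_preds)


# Hypothetical toy data, for illustration only.
labels = ["paris", "blue", "seven", "dog"]
pre_edit_preds = ["london", "blue", "three", "cat"]

# An unedited (base) model: its "post-edit" predictions equal its own outputs.
unchanged_preds = list(pre_edit_preds)
# An edit that altered one unrelated prediction.
drifted_preds = ["london", "red", "three", "cat"]

base_accuracy = accuracy(pre_edit_preds, labels)
base_locality = locality(pre_edit_preds, unchanged_preds)
drifted_locality = locality(pre_edit_preds, drifted_preds)

print(base_accuracy, base_locality, drifted_locality)
```

In this toy case the base model answers only one of four questions correctly, yet its locality is 1.0 because its predictions are compared with themselves, not with the labels.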

yaohui120 commented 4 months ago

Thank you, this clears up my confusion.

zjunlp / EasyEdit

About OK-VQA dataset #243