Closed: yaohui120 closed this issue 4 months ago
My apologies for any misunderstanding. I believe there has been a misinterpretation: we did not state the following.
"The paper ‘Can We Edit Multimodal Large Language Models’ said that the accuracies of base model(blip2, minigpt4) on OK-VQA are all 100."
Could it be that you are referring to the locality of the base models, which is indeed 100? Locality is not task accuracy.
It is the rate at which the post-edit model's predictions remain unchanged compared with the pre-edit model's predictions. For a base model with no edit applied, the two sets of predictions are the same, so the value is trivially 100 and says nothing about how often the answers are correct on OK-VQA. For further details, please see the discussion of 'locality' in issue #88.
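To make the distinction concrete, here is a minimal sketch that assumes predictions are compared as plain strings. The function names `locality` and `accuracy` and the toy data are hypothetical, chosen for illustration only; this is not the repository's evaluation code, which may compare outputs differently (for example at the token level).

```python
from typing import Sequence


def locality(pre_preds: Sequence[str], post_preds: Sequence[str]) -> float:
    """Percentage of samples whose prediction is unchanged after editing.

    Simplified, string-level illustration (an assumption, not the
    repository's implementation).
    """
    if len(pre_preds) != len(post_preds):
        raise ValueError("prediction lists must have the same length")
    if not pre_preds:
        raise ValueError("prediction lists must not be empty")
    unchanged = sum(a == b for a, b in zip(pre_preds, post_preds))
    return 100.0 * unchanged / len(pre_preds)


def accuracy(preds: Sequence[str], answers: Sequence[str]) -> float:
    """Percentage of samples whose prediction equals the reference answer."""
    if len(preds) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not preds:
        raise ValueError("predictions must not be empty")
    correct = sum(p == a for p, a in zip(preds, answers))
    return 100.0 * correct / len(preds)


# Hypothetical toy data: four questions with reference answers.
answers = ["cat", "red", "two", "paris"]
base_preds = ["dog", "red", "four", "rome"]     # pre-edit model outputs
edited_preds = ["dog", "blue", "four", "rome"]  # post-edit model outputs

# An unedited base model is compared with itself, so locality is 100
# even though its accuracy on the task is low.
base_locality = locality(base_preds, base_preds)
base_accuracy = accuracy(base_preds, answers)
edited_locality = locality(base_preds, edited_preds)

print(f"base locality:   {base_locality}")
print(f"base accuracy:   {base_accuracy}")
print(f"edited locality: {edited_locality}")
```

In this toy case the base model scores 100 on locality while being mostly wrong on the task, which is the situation described above: a locality of 100 in the base-model row and a low measured OK-VQA accuracy do not contradict each other.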
Thank you, this solves my confusion.
The paper ‘Can We Edit Multimodal Large Language Models’ says that the accuracies of the base models (BLIP-2, MiniGPT-4) on OK-VQA are all 100, and I'm a little confused. Do the pretrained models really have such strong capabilities? I tested them again, but the accuracies are below 5%. The main code is
Is there a problem with my testing code?