Open sasaadi opened 1 year ago
I also have this confusion.
I'm also curious about this.
Yes. During evaluation, for both question types, we collected a list of candidate labels from the test set. We then used a string similarity function to compare each generated answer against that label list; the label with the highest similarity score is taken as the model's prediction and used to compute accuracy.
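In case it helps later readers, here is a minimal sketch of the matching step described above. The actual `find_most_similar_index()` in test.py may use a different similarity metric and normalization; `difflib.SequenceMatcher` is used here only as an illustrative stand-in, and the label list and examples are made up.

```python
from difflib import SequenceMatcher

def find_most_similar_index(candidates, generated):
    """Index of the candidate answer most similar to the generated text."""
    scores = [
        SequenceMatcher(None, generated.lower(), c.lower()).ratio()
        for c in candidates
    ]
    return max(range(len(candidates)), key=lambda i: scores[i])

# Toy usage: map each generated answer onto the label list, then score accuracy.
labels = ["yes", "no", "pneumonia", "left lung"]              # collected from the test set
examples = [("Yes.", "yes"), ("the left lung", "left lung")]  # (generated, gold) pairs
correct = sum(
    labels[find_most_similar_index(labels, gen)] == gold for gen, gold in examples
)
print(f"accuracy = {correct / len(examples):.2f}")
```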
Hi, thank you for providing the code for fine-tuning the model. To reproduce the results in the paper, I would like to know how you computed accuracy on closed- and open-ended questions in VQA-RAD and Slake.
Can you confirm that, for closed-ended questions, you collect the set of all answers from the "CLOSE"-type questions in both the test and train sets of each dataset and then call find_most_similar_index() as in the test.py script?
And that, for open-ended questions, you collect the set of all answers from the "OPEN"-type questions in both the test and train sets and call find_most_similar_index() in the same way? (A rough sketch of this candidate-set construction is included below.)
Thank you
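For concreteness, here is a rough sketch of the candidate-set construction the question above describes, assuming JSON annotation files with an "answer_type" field of "CLOSE"/"OPEN" and an "answer" field, as in the public VQA-RAD and Slake releases. The exact field names and file paths used by test.py may differ.

```python
import json

def load_answer_set(paths, answer_type):
    """Union of answers for one question type ("CLOSE" or "OPEN") across splits."""
    answers = set()
    for path in paths:
        with open(path) as f:
            for item in json.load(f):
                # Field names are assumptions based on the public annotation
                # formats; adjust them to match the files used by test.py.
                if item.get("answer_type", "").upper() == answer_type:
                    answers.add(str(item["answer"]))
    return sorted(answers)

# Hypothetical file names: build one candidate list per question type
# from both the train and test splits of a dataset.
close_answers = load_answer_set(["train.json", "test.json"], "CLOSE")
open_answers = load_answer_set(["train.json", "test.json"], "OPEN")
```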