Hi, I'm trying to reproduce the VQA results in Table 17 (VQAv2, SQA, POPE, MMB). However, my reproduced numbers are much lower, e.g. 43% on ScienceQA compared to the reported 64.9%. I'm using the original LLaVA evaluation scripts, and I changed the forward/generate functions in the PSALM class to only generate text (mostly following LLaVA):

Do you think this is the correct implementation, or could you share your evaluation scripts for these tasks? Thanks a lot!