Open CupidJay opened 1 year ago
I also finetuned the model and carefully implemented all the details from the paper, but only reached 76.80. I had to reduce the image size because of computational cost, but I would still expect a better result even at 224 px.
Could you please upload the finetuned T5+ViTG model somewhere?
Thank you for all your valuable contributions to the field.
I tried to reproduce the finetuning results of BLIP2 FlanT5xl on VQAv2, but my results are far from those in the paper: my best accuracy is 76.58%, while the paper reports 81.55%. I want to figure out what is wrong with my code.
I modified the forward code according to this and I also added the Instruct implementation. My yaml configuration is as follows
I really appreciate your great work. Could you help me find where the problem is?
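When a reproduced score is several points off, it can help to first rule out the evaluation step. Below is a minimal sketch of the standard VQA accuracy rule, where an answer counts as fully correct if at least 3 annotators gave it, averaged over every leave-one-out subset of the human answers. The function name `vqa_accuracy` is hypothetical, and the normalization here (lowercasing and stripping whitespace) is a simplifying assumption: the official evaluation script also normalizes punctuation, articles, and number words, so scores from this sketch may differ slightly from official ones.

```python
def vqa_accuracy(prediction, human_answers):
    """Sketch of per-question VQA accuracy (hypothetical helper).

    For each human answer left out in turn, count how many of the
    remaining answers match the prediction, score min(1, matches / 3),
    and average over all leave-one-out subsets.
    """
    # Simplified normalization (assumption): the official script does more.
    pred = prediction.strip().lower()
    answers = [a.strip().lower() for a in human_answers]
    if not answers:
        return 0.0

    scores = []
    for i in range(len(answers)):
        # All annotator answers except the i-th one.
        others = answers[:i] + answers[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3.0))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # 2 of 10 annotators said "yes": partial credit.
    print(vqa_accuracy("yes", ["yes"] * 2 + ["no"] * 8))
```

Comparing this per-question score against what the training framework reports on a handful of validation samples can show whether the gap comes from the model or from answer post-processing.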