erobic closed this issue 6 years ago
I see now that the hard-coded scores make sense.
Should the final accuracy then be score/upper_bound? (I am running it on a different dataset, and the upper bound there is much lower than 100.)
@erobic the answer labels are so dispersed that, for many questions, even the most frequent answer appears <= 2 times. You can try some of the text preprocessing tricks in compute_softscore.py.
Right, it was some rough idea I wanted to try, but am discarding it for now. Thanks for the help though!
The README says that accuracy was calculated with the VQA metric. Is the code for that calculation in the repo? I am unsure how to combine the upper-bound score with the actual score to arrive at this metric.
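For what it's worth, here is a minimal sketch of how the VQA soft-accuracy metric and the upper bound relate. This is not code from this repo: the function names (`soft_scores`, `evaluate`) are hypothetical, and it uses the simplified soft score min(1, count/3) over the 10 human answers rather than the official averaging over annotator subsets.

```python
# Hedged sketch of the VQA soft-accuracy metric (illustrative, not this repo's code).
# Each question has 10 human answers; an answer's soft score is min(1, #annotators / 3).
from collections import Counter

def soft_scores(human_answers):
    """Map each answer string to its soft score for one question."""
    counts = Counter(human_answers)
    return {a: min(1.0, c / 3.0) for a, c in counts.items()}

def evaluate(predictions, all_human_answers):
    """predictions: one predicted answer string per question.
    all_human_answers: one list of 10 human answers per question.
    Returns (accuracy, upper_bound), both averaged over questions."""
    per_q = [soft_scores(h) for h in all_human_answers]
    score = sum(s.get(p, 0.0) for p, s in zip(predictions, per_q))
    upper = sum(max(s.values()) for s in per_q)  # best achievable per question
    n = len(predictions)
    return score / n, upper / n
```

Under this reading, accuracy and the upper bound are both averages over questions, so the upper bound is a cap on achievable accuracy (it is below 100% whenever some questions have no answer in the candidate vocabulary, or no answer given by 3+ annotators), not a denominator you divide by. For example, if 4 of 10 annotators say "cat" and 6 say "dog", predicting "cat" scores min(1, 4/3) = 1.0.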