Closed danielnapierski closed 1 year ago
We can now run val or train and get results ourselves (val outputs):
SPICE evaluation took: 12.42 s
SPICE: 0.122
computing METEOR score...
METEOR: 0.170
Bleu_1: 0.553
Bleu_2: 0.365
Bleu_3: 0.228
Bleu_4: 0.136
ROUGE_L: 0.355
CIDEr: 0.341
SPICE: 0.122
METEOR: 0.170
Scoring on the test set can only be done through the leaderboard. We can currently run the test images and generate output, but we must caption exactly 4000 test images to appear on the leaderboard.
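Since the leaderboard needs exactly 4000 captions, a quick sanity check before uploading might look like the sketch below. It assumes the results file is a JSON list of entries with `image_id` and `caption` keys; that format and the function names are assumptions for illustration, not taken from the leaderboard documentation.

```python
import json

EXPECTED_TEST_IMAGES = 4000  # count the leaderboard requires, as stated above


def check_submission(entries, expected=EXPECTED_TEST_IMAGES):
    """Return a list of problems found in a list of caption entries.

    Assumed (hypothetical) entry format: {"image_id": ..., "caption": "..."}.
    """
    problems = []
    ids = [e.get("image_id") for e in entries]
    # Each test image should be captioned once.
    if len(set(ids)) != len(ids):
        problems.append("duplicate image_id values")
    # The total must match the required number of test images.
    if len(entries) != expected:
        problems.append(f"expected {expected} captions, found {len(entries)}")
    # Missing or blank captions are reported as a count.
    empty = [e for e in entries if not str(e.get("caption") or "").strip()]
    if empty:
        problems.append(f"{len(empty)} empty captions")
    return problems


def check_submission_file(path, expected=EXPECTED_TEST_IMAGES):
    """Load a JSON results file and run the same checks on it."""
    with open(path, encoding="utf-8") as f:
        return check_submission(json.load(f), expected)
```

An empty list back from `check_submission_file("results.json")` (hypothetical filename) would mean the file passes these basic checks; it does not guarantee the leaderboard will accept it.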
I ran captioning on 500 VizWiz images from the validation set.
I have a PR to submit. I'll run on images from the test set today.