Closed JennDong closed 4 months ago
I tested the shared checkpoints without any further tuning and got the following results. I am confused to see that pretrained-base performs better than finetuned-base on Slake. Also, when I set different seeds during inference, the results do not change.
| Exact Match Score | Finetuned-Base-Slake | Finetuned-Base-VQARAD | Pretrained-Base-Slake | Pretrained-Base-VQARAD |
| -- | -- | -- | -- | -- |
| Overall | 0.8576814326107446 | 0.9301601423487544 | 0.8850141376060321 | 0.29359430604982206 |
| Open Questions | 0.8232558139534883 | 0.8914646996838778 | 0.8651162790697674 | 0.037934668071654375 |
| Closed Questions | 0.9110576923076923 | 0.9591049382716049 | 0.9158653846153846 | 0.48033924441017734 |
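
For reference, here is a minimal sketch of how exact-match scores like these could be computed, split by question type. It is an assumption about the metric, not the repository's actual evaluation script: the function name `exact_match_scores`, the record fields `prediction`, `answer`, and `answer_type`, and the lowercase/strip normalization are all hypothetical.

```python
from collections import defaultdict


def exact_match_scores(records):
    """Return exact-match accuracy overall and per question type.

    Each record is a dict with hypothetical keys:
      'prediction'  - model output string
      'answer'      - ground-truth string
      'answer_type' - 'OPEN' or 'CLOSED'
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for r in records:
        # Assumed normalization: ignore case and surrounding whitespace.
        correct = r["prediction"].strip().lower() == r["answer"].strip().lower()
        # Count every record once overall and once under its question type.
        for key in ("overall", r["answer_type"].lower()):
            totals[key] += 1
            hits[key] += int(correct)
    return {key: hits[key] / totals[key] for key in totals}


# Toy data for illustration only, not taken from Slake or VQA-RAD.
sample = [
    {"prediction": "Yes", "answer": "yes", "answer_type": "CLOSED"},
    {"prediction": "lung", "answer": "liver", "answer_type": "OPEN"},
    {"prediction": "no", "answer": "yes", "answer_type": "CLOSED"},
    {"prediction": "chest x-ray", "answer": "Chest X-ray", "answer_type": "OPEN"},
]
scores = exact_match_scores(sample)
print(scores)
```

If the real evaluation normalizes answers differently (for example, stripping punctuation), the numbers would shift accordingly, so the normalization step is worth checking when comparing checkpoints.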