I downloaded the processed VCR data and got a Q-->A performance of about 72.217 with the shared checkpoint, which is slightly lower than what the paper reports. I also tried my own processed data and got similar performance with both the fine-tuned checkpoint and the pretrained checkpoint after fine-tuning it on my processed data.