Closed · @Swetha5 closed this issue 2 years ago
Firstly, I would like to thank you for providing the source code along with the documentation and weights - it was really helpful.

I tried reproducing the results reported in the paper for the TVC dataset (HERO w/o pre-training), using the weights from the "pretrain-tv-init.bin" checkpoint for the RoBERTa weight initialization of the 6-layer Cross-Modal Transformer, as sketched at the end of this comment. The scores are shown below.

I am getting a ~4% better CIDEr score and a ~1% better BLEU score, with everything else kept the same. Is there any reason why I get better scores than those reported in the paper, @linjieli222? The difference is too large to ignore, and any insights would be helpful.

Thanks.
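For reference, the initialization step looks roughly like the minimal sketch below, assuming the checkpoint is a plain PyTorch state dict. `HeroModel` is a hypothetical stand-in for the actual cross-modal transformer class in the HERO repo, and the layer dimensions are illustrative rather than taken from the released config.

```python
import torch
from torch import nn

# Hypothetical stand-in for HERO's 6-layer Cross-Modal Transformer;
# the real class and its dimensions live in the HERO repo.
class HeroModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=768, nhead=12, batch_first=True),
            num_layers=6,
        )

model = HeroModel()

# "pretrain-tv-init.bin" is assumed to be a plain PyTorch state dict
# holding the RoBERTa-initialized weights.
state_dict = torch.load("pretrain-tv-init.bin", map_location="cpu")

# strict=False reports, rather than fails on, any keys that do not
# line up between the checkpoint and the model.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```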
Hi @Swetha5, thanks for your interest in our HERO project. All released configs are based on our best finetuning config after pre-training, which may have led to better performance than the config we used for the finetuning experiments w/o pre-training in Table 4 at the time of paper writing. Since this project is now 2 years old, it is nearly impossible for me to track down the original config for that experiment. However, I can confirm that your reproduced results match what we reported in our later work VALUE, L5 of Table 4.

Thanks for confirming the results, @linjieli222. Closing the issue as it resolves the query.