linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
https://arxiv.org/abs/2005.00200
MIT License

Reproducing results on TVC w/o pre-training: getting scores higher than those reported in Table 4 of the paper #46

Closed: Swetha5 closed this issue 2 years ago

Swetha5 commented 2 years ago

Firstly, I would like to thank you for providing the source code along with documentation and weights - it was really helpful.

I tried reproducing the results reported in the paper for the TVC dataset (HERO w/o pre-training), using the checkpoint "pretrain-tv-init.bin" for the RoBERTa weight initialization of the 6-layer Cross-Modal Transformer, as documented. The scores are shown below:

| Metric | Reported (Table 4) | Reproduced |
|--------|--------------------|------------|
| CIDEr  | 43.62              | 47.52      |
| BLEU@4 | 10.75              | 11.26      |
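For reference, the initialization step amounts to loading that checkpoint into the model before finetuning. Below is a minimal PyTorch sketch, not the project's actual loader; the model constructor is a placeholder and the checkpoint is assumed to be a flat state dict:

```python
import torch
from torch import nn

def build_model() -> nn.Module:
    # Placeholder: in the real run this would be HERO's 6-layer Cross-Modal
    # Transformer built from the finetuning config; a dummy module stands in.
    return nn.Linear(768, 768)

# Load the released initialization checkpoint on CPU (assumed to be a flat
# PyTorch state dict, as the .bin extension suggests).
state_dict = torch.load("pretrain-tv-init.bin", map_location="cpu")

model = build_model()
# strict=False tolerates task-specific heads that only exist at finetuning
# time and are therefore absent from the pre-training checkpoint.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
```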

I am getting a CIDEr score about 4 points higher and a BLEU@4 score about 0.5 points higher than reported, with everything else kept the same. Is there any reason why I would get better scores than those reported in the paper, @linjieli222? The difference is too large to ignore, so any insights would be helpful.

Thanks.

linjieli222 commented 2 years ago

Hi @Swetha5, thanks for your interest in our HERO project. All released configs are based on our best finetuning configuration after pre-training, which may have led to better performance than the configuration we used for the finetuning experiments w/o pre-training in Table 4 at the time of paper writing. Since this project is now 2 years old, it is nearly impossible for me to track down the original config for that experiment. However, I can confirm that your reproduced results match what we reported in our later work, VALUE (L5 of Table 4).
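If it helps anyone narrow down gaps like this, one quick way to spot differences between two finetuning configs is to diff their JSON dumps. A generic sketch follows; the file names are placeholders, not files shipped with the repo:

```python
import json

def diff_configs(path_a: str, path_b: str) -> None:
    """Print keys whose values differ between two flat JSON config files."""
    with open(path_a) as fa, open(path_b) as fb:
        cfg_a, cfg_b = json.load(fa), json.load(fb)
    for key in sorted(set(cfg_a) | set(cfg_b)):
        if cfg_a.get(key) != cfg_b.get(key):
            print(f"{key}: {cfg_a.get(key)!r} vs {cfg_b.get(key)!r}")

# Placeholder file names: the released best config vs. a locally saved one.
diff_configs("released_tvc_config.json", "my_tvc_config.json")
```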

Swetha5 commented 2 years ago

Thanks for confirming the results, @linjieli222. Closing the issue, as this resolves the query.