linjieli222 / HERO

Research code for EMNLP 2020 paper "HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training"
https://arxiv.org/abs/2005.00200
MIT License

Reproducing VCMR results for didemo video only #40

Closed hdy007007 closed 2 years ago

hdy007007 commented 2 years ago

Hi @linjieli222, I have reproduced didemo_video_only, but the results I got are much better than those in the paper. The reproduced results are as follows:

```
03/02/2022 21:22:48 - INFO - main - validation finished in 3 seconds
03/02/2022 21:22:51 - INFO - main - start running full VCMR evaluation on didemo_video_only test split...
03/02/2022 21:22:54 - INFO - main - metrics_no_nms_VCMR
{'0.5-r1': 3.590012556504269,
 '0.5-r10': 16.97737066800603,
 '0.5-r100': 44.225404319437466,
 '0.5-r5': 11.299871923656454,
 '0.7-r1': 2.9875866398794573,
 '0.7-r10': 14.11497237569061,
 '0.7-r100': 37.99495228528378,
 '0.7-r5': 9.289997488699145}
```
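For reference, the metric keys above follow the usual VCMR convention: `'0.5-r1'` is Recall@1 at temporal IoU 0.5, and so on. A minimal sketch of how such a metric could be computed (hypothetical helper names, not the repo's actual evaluation code):

```python
# Hypothetical sketch of VCMR "IoU-RK" metrics such as '0.5-r1':
# a query counts as a hit if one of its top-k ranked predictions
# retrieves the correct video AND its moment overlaps the ground
# truth with temporal IoU at or above the threshold.

def temporal_iou(pred, gt):
    """IoU between two (start, end) spans in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_k(predictions, ground_truth, k, iou_thd):
    """Percentage of queries with a hit among their top-k predictions.

    predictions: {query_id: ranked list of (video_id, start, end)}
    ground_truth: {query_id: (video_id, start, end)}
    """
    hits = 0
    for qid, gt in ground_truth.items():
        for vid, s, e in predictions.get(qid, [])[:k]:
            if vid == gt[0] and temporal_iou((s, e), gt[1:]) >= iou_thd:
                hits += 1
                break  # one hit per query is enough
    return 100.0 * hits / len(ground_truth)
```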

And here are the results reported in the paper:

[screenshot: VCMR results table from the paper]

I haven't been able to figure out the reason for this discrepancy.

linjieli222 commented 2 years ago

Hi there,

Thanks for your interest in our HERO project. The released checkpoint is a stronger pre-trained model, trained with more pre-training steps than the one reported in the paper.