jssprz / visual_syntactic_embedding_video_captioning

Source code of the paper titled *Improving Video Captioning with Temporal Composition of a Visual-Syntactic Embedding*

Result gap #6

Open RyanLiut opened 3 years ago

RyanLiut commented 3 years ago

Hi,

Thanks for your code. After following the "test" code exactly, I found a gap between my results and the reported ones (in the paper and in the repo). Here are my results:

| Dataset | Bleu_1 | Bleu_2 | Bleu_3 | Bleu_4 (paper) | METEOR | ROUGE_L | CIDEr |
|---------|--------|--------|--------|----------------|--------|---------|-------|
| MSVD    | 0.906  | 0.814  | 0.725  | 0.627 (0.644)  | 0.397  | 0.783   | 1.089 |
| MSR-VTT | 0.831  | 0.702  | 0.566  | 0.443 (0.464)  | 0.288  | 0.625   | 0.501 |
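
(For reference, these are the standard coco-caption metrics. Below is a minimal sketch of how such scores are typically computed with `pycocoevalcap`; that tooling is an assumption on my part and not necessarily the exact test code of this repo.)

```python
# Minimal sketch: standard coco-caption metrics (Bleu_1-4, METEOR, ROUGE_L, CIDEr)
# computed with pycocoevalcap. Captions are assumed to be pre-tokenized strings.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge
from pycocoevalcap.cider.cider import Cider

def score_captions(gts, res):
    """gts/res: dict mapping video_id -> list of caption strings (same keys)."""
    results = {}
    for scorer, names in [(Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
                          (Meteor(), ["METEOR"]),
                          (Rouge(), ["ROUGE_L"]),
                          (Cider(), ["CIDEr"])]:
        score, _ = scorer.compute_score(gts, res)
        scores = score if isinstance(score, list) else [score]  # Bleu returns a list
        results.update(dict(zip(names, scores)))
    return results

# Toy example: one video, two references vs. one generated caption.
gts = {"vid1": ["a man is playing a guitar", "a person plays the guitar"]}
res = {"vid1": ["a man plays a guitar"]}
print(score_captions(gts, res))
```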

Is there something wrong?

Thank you.

jssprz commented 3 years ago

Hi,

We have not released the final checkpoints of our models (epoch 100). As the README says, the shared pre-trained models are from epoch 41 (for MSVD) and epoch 12 (for MSR-VTT), so scores somewhat below the paper's numbers are expected.
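
If you want to double-check which epoch a downloaded checkpoint actually corresponds to before comparing against the paper's numbers, something like the following works (the file path and key names here are assumptions, adjust them to the files you downloaded):

```python
# Sketch: inspect a downloaded PyTorch checkpoint to see which epoch it stores.
# Path and key names are hypothetical; many training loops save the epoch
# alongside the weights, but the exact key varies per project.
import torch

ckpt = torch.load("pretrained/chckpt.pt", map_location="cpu")  # path is an assumption
if isinstance(ckpt, dict):
    print("epoch:", ckpt.get("epoch", "not stored in this checkpoint"))
    print("top-level keys:", list(ckpt.keys()))
```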

Thanks

RyanLiut commented 3 years ago

I see, thank you.