jssprz / attentive_specialized_network_video_captioning

Source code of the paper titled *Attentive Visual Semantic Specialized Network for Video Captioning*
MIT License

Results from pretrained models don't match paper #7

Open DavidMChan opened 2 years ago

DavidMChan commented 2 years ago

Thanks for your work on this project!

I followed the instructions in the README to get your code running, but I wasn't able to reproduce the results reported in the paper:

MSVD:

| Metric  | Obtained | Expected (paper) |
|---------|----------|------------------|
| Bleu_1  | 0.858    | n/a              |
| Bleu_2  | 0.756    | n/a              |
| Bleu_3  | 0.665    | n/a              |
| Bleu_4  | 0.573    | 62.3             |
| METEOR  | 0.385    | 39.2             |
| ROUGE_L | 0.749    | 78.3             |
| CIDEr   | 0.992    | 107.7            |

MSR-VTT:

| Metric  | Obtained | Expected (paper) |
|---------|----------|------------------|
| Bleu_1  | 0.812    | n/a              |
| Bleu_2  | 0.679    | n/a              |
| Bleu_3  | 0.547    | n/a              |
| Bleu_4  | 0.428    | 45.5             |
| METEOR  | 0.288    | 31.4             |
| ROUGE_L | 0.617    | 64.3             |
| CIDEr   | 0.469    | 50.6             |

(Note: the evaluation script prints scores on a 0-1 scale while the paper reports percentages, so e.g. MSVD Bleu_4 0.573 corresponds to 57.3, still well below the paper's 62.3.)
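For context, here is a minimal sketch of how scores in this `Bleu_1 ... CIDEr` format are typically computed with the pycocoevalcap toolkit; the video id and captions below are made up for illustration, and I'm not certain this repo's evaluation script uses exactly this code path:

```python
# Hypothetical sketch, not taken from this repo: computing captioning metrics
# with the pycocoevalcap toolkit (pip install pycocoevalcap), which reports
# results in the same "Bleu_1 ... CIDEr" format shown above.
from pycocoevalcap.bleu.bleu import Bleu
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.meteor.meteor import Meteor
from pycocoevalcap.rouge.rouge import Rouge

# Both dicts map a video id to a list of pre-tokenized captions;
# the ids and captions here are invented for the example.
gts = {"video0": ["a man is playing a guitar", "a man plays guitar"]}
res = {"video0": ["a man plays the guitar"]}  # one generated caption per id

scorers = [
    (Bleu(4), ["Bleu_1", "Bleu_2", "Bleu_3", "Bleu_4"]),
    (Meteor(), "METEOR"),  # needs a Java runtime on PATH
    (Rouge(), "ROUGE_L"),
    (Cider(), "CIDEr"),
]
for scorer, names in scorers:
    score, _ = scorer.compute_score(gts, res)
    if isinstance(names, list):  # Bleu returns one value per n-gram order
        for name, value in zip(names, score):
            print(f"{name}: {value:.3f}")
    else:
        print(f"{names}: {score:.3f}")
```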

I noticed that these are epoch-15 checkpoints, but in the paper the models were trained for ~70 epochs. Would you be willing to release the final models, or the training code so that a new model can be trained?
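For reference, a rough sketch of how one can check which epoch a released checkpoint was saved at; the filename and the `epoch` key are guesses rather than this repo's actual format:

```python
# Hypothetical sketch: inspecting a downloaded checkpoint to see which epoch
# it was saved at. Adjust the filename and key names to the actual file.
import torch

ckpt = torch.load("captioning_chkpt.pt", map_location="cpu")
if isinstance(ckpt, dict):
    print("stored keys:", list(ckpt.keys()))
    print("epoch:", ckpt.get("epoch", "not recorded"))
```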

jssprz commented 2 years ago

Thank you! We have decided to share the training code after November 16.

Gautam-git1050 commented 2 years ago

Can you provide your training code?