mynlp / cst_captioning

PyTorch Implementation of Consensus-based Sequence Training for Video Captioning

Can't reproduce CIDEr 54.2 #10

Open xiadingZ opened 6 years ago

xiadingZ commented 6 years ago

I trained a WXE model and got a CIDEr score of about 50, then trained CST_MS_Greedy with your options, but the CIDEr score does not improve with reinforcement learning. The model you provided also can't be loaded for testing. Can you give a hint about how to use your model, or how to reproduce the CIDEr score of 54.2?

plsang commented 6 years ago

Thanks for reporting this issue! I noticed that the default category feature files were not correct: they should be the one-hot embedding files rather than the GloVe embedding files.
I have updated these files on the shared drive. If you resync them, you should get the correct files to test the provided model.
I recently reran my method 10 times and got an average CIDEr score of 54.3. I hope you can reproduce my results.
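
To make the distinction concrete, here is a minimal sketch of what a one-hot category feature looks like, as opposed to a dense GloVe vector. The function name, the zero-based category ids, and the default of 20 categories are assumptions for illustration only; they are not taken from the repository's feature-extraction code.

```python
def one_hot(category_id, num_categories=20):
    """Return a one-hot vector (list of floats) for a zero-based category id.

    num_categories=20 is an assumed default, not a value read from the repo.
    """
    if not 0 <= category_id < num_categories:
        raise ValueError("category_id out of range: %r" % (category_id,))
    vec = [0.0] * num_categories
    vec[category_id] = 1.0  # exactly one active dimension per video
    return vec


# Hypothetical mapping from video id to category id.
video_categories = {"video0": 3, "video1": 0}
category_feats = {vid: one_hot(cat) for vid, cat in video_categories.items()}
```

Each video's feature then has exactly one non-zero entry, so its dimensionality equals the number of categories rather than the GloVe embedding size. A model trained on one representation would not be expected to load or test correctly with the other.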

xiadingZ commented 6 years ago

Will running

make train GID=0 EXP_NAME=WXE FEATS="resnet c3d mfcc category" USE_RL=1 USE_CST=1 USE_MIXER=0 SCB_CAPTIONS=0 LOGLEVEL=DEBUG MAX_EPOCHS=50

and then

make train GID=0 EXP_NAME=CST_MS_SCB FEATS="resnet c3d mfcc category" USE_RL=1 USE_CST=1 USE_MIXER=1 MIXER_FROM=1 SCB_BASELINE=1 SCB_CAPTIONS=20 USE_EOS=1 LOGLEVEL=DEBUG MAX_EPOCHS=200 START_FROM=output/model/WXE

reproduce a CIDEr score of 54?

I have submitted a pull request that upgrades the code to PyTorch 0.4, and I will try to migrate it to Python 3 in the future. Could you review it?

xiadingZ commented 6 years ago

Also, can you give a hint about how to extract these features?

plsang commented 6 years ago

Your commands look good to me, except that I used SCB_BASELINE=2 in the last run. The result can be a bit different in your environment. I know someone reported that they got CIDEr 53.7. I expect you can also get a similar number.

Btw, thanks for sending the PR. Unfortunately, I am not able to test it now. But please keep working on this, and let me know if you can reproduce my results with this PR.

I have answered your question about feature extraction in #7.