ruotianluo / self-critical.pytorch

Unofficial pytorch implementation for Self-critical Sequence Training for Image Captioning. and others.
MIT License
997 stars 279 forks

CIDEr score decreases when the model is tested on the MSCOCO test server #133

Open zr-icu opened 5 years ago

zr-icu commented 5 years ago

First, thanks for your code. I trained the top-down model, and its scores on the Karpathy split are similar to those in the paper. However, when I evaluate the model on the online MSCOCO test server, the scores drop a lot. Is there any way to solve this problem?

Thanks again!!

ruotianluo commented 5 years ago

A decrease of 0.1 in CIDEr is common. I think the top-down model on the server uses an ensemble.

zr-icu commented 5 years ago

Thanks for your quick reply!! Could you give me an example of what the weights in eval_ensemble.py represent?

ruotianluo commented 5 years ago

Those are the weights of the different models. Usually equal weights work very well.
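A minimal sketch of what per-model weights could mean in an ensemble, assuming the models' next-word probability distributions are combined by a weighted average. This thread does not show how eval_ensemble.py actually combines models, so the function `ensemble_probs` and the toy distributions below are hypothetical illustrations, not the repository's code.

```python
import math


def ensemble_probs(per_model_probs, weights):
    """Weighted average of next-word distributions from several models.

    per_model_probs: one probability list per model, all over the same vocabulary.
    weights: one non-negative weight per model (hypothetical interpretation).
    """
    if len(per_model_probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Normalize so the result is still a probability distribution.
    norm = [w / total for w in weights]
    vocab_size = len(per_model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(norm, per_model_probs))
        for i in range(vocab_size)
    ]


# Toy example: two models, three-word vocabulary (made-up numbers).
model_a = [0.7, 0.2, 0.1]
model_b = [0.5, 0.4, 0.1]

even = ensemble_probs([model_a, model_b], [0.5, 0.5])    # equal weights
skewed = ensemble_probs([model_a, model_b], [0.8, 0.2])  # trust model_a more
print(even, skewed)
```

Under this assumption, equal weights simply average the models, while unequal weights pull the combined distribution toward the more heavily weighted model.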

zr-icu commented 5 years ago

> Those are the weights of the different models. Usually equal weights work very well.

Thanks for your help. I trained 4 more models with the same parameters, but the ensemble's performance still drops a lot. The weights are set to 0.2 for each of the 5 models.

On the Karpathy split, BLEU@4 and CIDEr are 36.5 and 123.1. On the test server, BLEU@4 is 34.3 (c5) and 63.8 (c40), and CIDEr is 111.9 (c5) and 115.1 (c40), which is still some way below the original paper. Is there any way to improve performance on the online test set?
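For reference, the gap between the two evaluations can be worked out from the numbers quoted above. This is only arithmetic on the reported scores; the final line assumes that a CIDEr of 123.1 corresponds to 1.231 on a unit scale, which is a guess about how the "0.1" figure earlier in the thread was meant.

```python
# Scores as reported in this comment (x100 scale).
karpathy = {"BLEU@4": 36.5, "CIDEr": 123.1}
server_c5 = {"BLEU@4": 34.3, "CIDEr": 111.9}

# Drop from the Karpathy split to the c5 server results, per metric.
gaps = {name: round(karpathy[name] - server_c5[name], 1) for name in karpathy}
print(gaps)

# Assumption: dividing by 100 gives the unit scale used for "a decrease of 0.1".
cider_gap_unit = round(gaps["CIDEr"] / 100, 3)
print(cider_gap_unit)
```

Under that assumption, the observed CIDEr drop is roughly the size described earlier in the thread as common.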

Thanks again!!