ruotianluo / self-critical.pytorch

Unofficial PyTorch implementation of Self-critical Sequence Training for Image Captioning, and other methods.
MIT License

visual genome dataset #115

Open arjung128 opened 5 years ago

arjung128 commented 5 years ago

While this repo supports MSCOCO and Flickr30k, if we were to replace all of the data files (cocobu_att, cocobu_fc, cocotalk.json, cocotalk_label.h5, captions_val2014.json) with their Visual Genome equivalents, would the code (specifically the transformer model) still work?
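For reference, here is the kind of sanity check I mean, to confirm the replacement files follow the layout the dataloader expects (a minimal sketch; `vgtalk.json` and `vgtalk_label.h5` are hypothetical stand-ins for the `cocotalk.*` files):

```python
# Sketch: sanity-check that the Visual Genome files match the layout
# produced by the COCO preprocessing scripts.
# NOTE: vgtalk.json / vgtalk_label.h5 are hypothetical stand-ins for
# cocotalk.json / cocotalk_label.h5.
import json
import h5py

info = json.load(open('data/vgtalk.json'))
print('vocab size:', len(info['ix_to_word']))
print('num images:', len(info['images']))

with h5py.File('data/vgtalk_label.h5', 'r') as f:
    labels = f['labels']                      # (num_captions, seq_length)
    print('encoded captions:', labels.shape)
    # Paragraph captions run much longer than COCO captions, so the
    # seq_length chosen at preprocessing time matters a lot here.
    print('seq_length:', labels.shape[1])
    assert 'label_start_ix' in f and 'label_end_ix' in f
```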

I got the equivalent data files for Visual Genome from this repo, which is heavily based on yours. When I trained with the same hyper-parameters as for MSCOCO, the scores oscillated slightly around:

Bleu_1: 0.056, Bleu_2: 0.034, Bleu_3: 0.021, Bleu_4: 0.012, METEOR: 0.078, ROUGE_L: 0.224, CIDEr: 0.047

for as long as I trained it.

Do you think any hyper-parameter or architectural changes may be needed so that the transformer model works well on the Visual Genome dataset?
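For concreteness, these are the kinds of settings I would expect to matter for paragraph-length outputs (option names follow this repo's opts.py; every value here is a guess on my part, not a recommendation):

```python
# Hypothetical hyper-parameter overrides for paragraph-length captions.
# Option names follow this repo's opts.py; all values are guesses.
vg_overrides = {
    'caption_model': 'transformer',
    'seq_per_img': 1,     # Visual Genome has one paragraph per image, not 5 captions
    'max_length': 175,    # paragraphs run far longer than COCO's short captions
    'batch_size': 10,
    'learning_rate': 5e-4,
}
```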

Thank you once again both for making your work public and for being so active and prompt in answering these questions! I really appreciate it!

ruotianluo commented 5 years ago

How about following the hyper-parameters in that code? I think the changes made in that repo compared to mine are not that extensive, just dozens of commits, so they should be easy to track.

arjung128 commented 5 years ago

Thanks for your response.

I used the hyper-parameters in that code (which were very similar to yours) and observed the same results.

Do you have any idea why this may be happening, or suggestions for anything I could try?

Thanks once again for your continuous support. Much appreciated.

ruotianluo commented 5 years ago

Did you use trigram blocking?

arjung128 commented 5 years ago

No, I did not.

The 'Training for Diversity in Image Paragraph Captioning' code doesn't use trigram blocking for the initial cross-entropy training; it only uses it for the self-critical sequence training portion. But that code achieves very good BLEU/METEOR/ROUGE_L (though not CIDEr) scores during the initial cross-entropy stage without trigram blocking...

In case I didn't mention this earlier, the results I posted previously were from the stage before self-critical sequence training.

I can try adding trigram blocking to the cross-entropy and self-critical phases separately.
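For anyone following along, trigram blocking at decode time generally works like this (a minimal sketch of the technique, not the exact implementation in either repo): at each step, any token that would complete a trigram already present in the generated sequence has its log-probability suppressed.

```python
import torch

def block_repeated_trigrams(seqs, logprobs, t):
    """Suppress any next token that would repeat an already-generated trigram.

    seqs:     (batch, t) LongTensor of tokens produced so far
    logprobs: (batch, vocab) log-probabilities for the next token
    t:        number of tokens generated so far
    """
    if t < 3:  # no complete trigram exists yet, nothing to block
        return logprobs
    for i in range(seqs.size(0)):
        prev = seqs[i, :t].tolist()
        cur_bigram = tuple(prev[-2:])                        # last two tokens
        seen = {tuple(prev[j:j + 3]) for j in range(t - 2)}  # trigrams so far
        for a, b, c in seen:
            if (a, b) == cur_bigram:
                logprobs[i, c] = float('-inf')  # forbid completing the repeat
    return logprobs
```

Enabling it only at evaluation then just means calling this inside the sampling loop when the flag is set, without touching the training objective.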

ruotianluo commented 5 years ago

What about turning it on during evaluation?

arjung128 commented 5 years ago

Did you mean turning trigram blocking on only during evaluation, and not during training?

I set block_trigrams to 1 during evaluation only (not during training) and got pretty much the same results as last time:

Bleu_1: 0.056, Bleu_2: 0.033, Bleu_3: 0.019, Bleu_4: 0.011, METEOR: 0.075, ROUGE_L: 0.207, CIDEr: 0.036

Is this what you meant?
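(For later readers: concretely, the eval-only toggle amounts to something like the loop below. It reuses `block_repeated_trigrams` from the sketch in my earlier comment; `eval_kwargs` and the random stand-in logprobs are assumptions for illustration, not the repo's exact API.)

```python
import torch

# Toy run of an eval-only toggle. Assumed names throughout; the random
# logprobs stand in for real model output, and block_repeated_trigrams
# is the helper sketched earlier in this thread.
torch.manual_seed(0)
batch, vocab, max_length = 2, 50, 10
eval_kwargs = {'block_trigrams': 1}             # mirrors the CLI flag
seqs = torch.zeros(batch, 0, dtype=torch.long)  # nothing generated yet

for t in range(max_length):
    logprobs = torch.randn(batch, vocab).log_softmax(dim=1)
    if eval_kwargs.get('block_trigrams', 0):    # only active when the flag is set
        logprobs = block_repeated_trigrams(seqs, logprobs, t)
    next_word = logprobs.argmax(dim=1)          # greedy decoding step
    seqs = torch.cat([seqs, next_word.unsqueeze(1)], dim=1)
```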

homelifes commented 5 years ago

Hi @arjung128, have you made it work? I am curious to know the results for image paragraph captioning using a transformer, since transformers are suited to long sequences such as paragraphs.