yufengm / Adaptive

Pytorch Implementation of Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning

Training on MSCOCO and testing on Flickr #7

Open saitarslanboun opened 6 years ago

saitarslanboun commented 6 years ago

Hi,

I am training your model on the MSCOCO dataset and validating on the Flickr validation data.

After the 2nd epoch, I started getting results like:

"A yellow train", "A small bus", "An aeroplane." etc.

The rest of each sentence is missing.

Is this caused by the data, or by something else?

Thank you,
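
For context, captions truncated this way usually mean the decoder assigns too much probability to the end-of-sentence token at early epochs, so generation stops after a few words. A minimal greedy-decoding sketch (illustrative only; `step_fn` and the token IDs are placeholders, not this repo's API) shows where that cutoff happens:

```python
import torch

def greedy_decode(step_fn, start_token, end_token, max_len=20):
    """step_fn(tokens) -> log-probabilities over the vocabulary for the next word."""
    tokens = torch.tensor([start_token])
    for _ in range(max_len):
        next_token = step_fn(tokens).argmax().item()
        tokens = torch.cat([tokens, torch.tensor([next_token])])
        # An undertrained model often predicts <end> too early,
        # which yields very short captions like "A yellow train".
        if next_token == end_token:
            break
    return tokens
```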

yufengm commented 6 years ago

Did you validate or test on Flickr? I haven't looked at results from early epochs. Typically the model doesn't converge until at least epoch 20.

saitarslanboun commented 6 years ago

I see. Actually my model is huge (I have integrated your model into it), so I can use a batch size of at most 5. To compensate, I have decreased the learning rate to 0.0001.
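
For concreteness, a setup like the one described might look as follows (a minimal sketch; the model and dataset here are dummy placeholders standing in for the combined captioning model and MSCOCO):

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholders for the large combined model and the training data.
model = nn.Linear(10, 10)
dataset = TensorDataset(torch.randn(100, 10))

# Small batch size to fit the large model in memory; the learning rate is
# lowered to 1e-4 to compensate (values taken from this comment). Scaling
# the learning rate down with the batch size is a common heuristic.
loader = DataLoader(dataset, batch_size=5, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
```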

Did you get results close to those reported in the paper?

yufengm commented 6 years ago

I didn't implement beam search, so there is still a margin of about 2 points relative to the paper.
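
For reference, beam search keeps the k highest-scoring partial captions at each decoding step instead of only the single best one, which is why it typically recovers a point or two of caption quality over greedy sampling. A minimal sketch (not this repo's code; `step_fn` and the token IDs are hypothetical placeholders) might look like:

```python
import torch

def beam_search(step_fn, start_token, end_token, beam_size=3, max_len=20):
    """Minimal beam search for caption decoding.

    step_fn(tokens) -> log-probabilities over the vocabulary for the
    next token, given the partial caption `tokens` (a 1-D LongTensor).
    """
    # Each beam is (tokens, cumulative log-probability).
    beams = [(torch.tensor([start_token]), 0.0)]
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            log_probs = step_fn(tokens)              # (vocab_size,)
            top_lp, top_ix = log_probs.topk(beam_size)
            for lp, ix in zip(top_lp.tolist(), top_ix.tolist()):
                extended = torch.cat([tokens, torch.tensor([ix])])
                candidates.append((extended, score + lp))
        # Keep only the best `beam_size` partial captions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1].item() == end_token:
                finished.append((tokens, score))     # caption is complete
            else:
                beams.append((tokens, score))
        if not beams:
            break
    finished.extend(beams)
    # Return the highest-scoring caption (a length penalty is often
    # added here so short captions aren't unfairly favored).
    return max(finished, key=lambda c: c[1])[0]
```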

fawazsammani commented 6 years ago

@saitarslanboun I've implemented this paper on the Flickr30k dataset, and the trained model is provided. You can find the implementation here.