yunjey / show-attend-and-tell

TensorFlow Implementation of "Show, Attend and Tell"
MIT License

Beam search #41

Open wjb123 opened 7 years ago

wjb123 commented 7 years ago

Hi, I am reading your excellent code, but I find no beam search during caption generation, unlike the original source code at https://github.com/kelvinxu/arctic-captions. Is there any reason?

MenSanYan commented 7 years ago

@wjb123 Yes, I noticed that too. But is this the only difference between this code and the original? Have you found other differences? After running this code, I can't achieve results as good as those in the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention".

rubenvereecken commented 7 years ago

@MenSanYan A difference in score? The lack of beam search would definitely lead to worse results, I believe.

Does either of you perhaps know why there's the difference I noted in #40?

lvyongqiang4644 commented 6 years ago

@rubenvereecken @MenSanYan Do you have beam search code for this implementation? Thanks!

rubenvereecken commented 6 years ago

I'm afraid I never actually needed the beam search code, as I was not working on ntm. I'm sure there is a TensorFlow implementation out there somewhere.
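For readers looking for a starting point, here is a minimal, framework-agnostic sketch of beam search decoding in plain Python. It is not taken from this repository or from arctic-captions. `step_fn`, `toy_step`, and the token ids are hypothetical names used only for illustration; a real captioning model would supply the next-token log-probabilities from its LSTM and attention state.

```python
import math
from typing import Callable, Dict, List, Tuple

Hypothesis = Tuple[List[int], float]  # (token ids, total log-probability)


def beam_search(
    step_fn: Callable[[List[int]], Dict[int, float]],
    start_token: int,
    end_token: int,
    beam_size: int = 3,
    max_len: int = 20,
) -> List[Hypothesis]:
    """Return hypotheses sorted by total log-probability, best first.

    step_fn(prefix) is a hypothetical callback that returns
    {token: log_prob} for the next token given the prefix so far.
    """
    beams: List[Hypothesis] = [([start_token], 0.0)]
    finished: List[Hypothesis] = []

    for _ in range(max_len):
        # Expand every live hypothesis by every possible next token.
        candidates: List[Hypothesis] = []
        for tokens, score in beams:
            for tok, logp in step_fn(tokens).items():
                candidates.append((tokens + [tok], score + logp))

        # Keep only the beam_size highest-scoring expansions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            if tokens[-1] == end_token:
                finished.append((tokens, score))
            else:
                beams.append((tokens, score))
        if not beams:
            break

    # If nothing reached end_token within max_len, return the live beams.
    if not finished:
        finished = beams
    finished.sort(key=lambda c: c[1], reverse=True)
    return finished


def toy_step(prefix: List[int]) -> Dict[int, float]:
    """Toy next-token distribution (0 = start, 1 = end, 2 and 3 = words)."""
    table = {
        (0,): {2: 0.6, 3: 0.4},
        (0, 2): {2: 0.5, 3: 0.5},
        (0, 3): {2: 0.9, 3: 0.1},
    }
    probs = table.get(tuple(prefix), {1: 1.0})  # otherwise always end
    return {tok: math.log(p) for tok, p in probs.items()}


greedy_best = beam_search(toy_step, 0, 1, beam_size=1)[0]
beam_best = beam_search(toy_step, 0, 1, beam_size=2)[0]
print("greedy:", greedy_best[0], math.exp(greedy_best[1]))
print("beam-2:", beam_best[0], math.exp(beam_best[1]))
```

With `beam_size=1` the search reduces to greedy decoding. In the toy distribution, the greedy choice commits to the locally most likely first word, while a beam of 2 keeps the alternative alive and finds a sequence with higher overall probability.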

nishant-puri commented 6 years ago

I have read that beam search gives a boost in BLEU-4 of around 10%. evaluate_model.ipynb shows a BLEU of 21.1, whereas the paper reports 24.3, so that might be the reason for the difference.
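As a rough sanity check of that estimate, the arithmetic can be sketched as follows. This assumes the "around 10%" is a relative improvement applied to the 21.1 score; the variable names are illustrative only.

```python
# Scores quoted in this thread.
score_without_beam = 21.1   # reported in evaluate_model.ipynb
score_in_paper = 24.3       # reported in the paper

# Assumption: the ~10% boost from beam search is relative, not absolute.
assumed_relative_boost = 0.10

projected = score_without_beam * (1 + assumed_relative_boost)
gap = score_in_paper - score_without_beam
relative_gap = gap / score_without_beam

print(f"projected with beam search: {projected:.2f}")
print(f"gap to paper: {gap:.1f} points ({relative_gap:.1%} relative)")
```

Under that assumption, beam search would account for most of the gap to the paper, but not all of it.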

  1. Were you able to train this model and get a BLEU-4 score of 21.1? I am implementing the paper in PyTorch and was unable to reach a good BLEU score.

  2. I found this implementation and am mystified by the magical T/L in the loss (as you also asked in https://github.com/yunjey/show-attend-and-tell/issues/40).

  3. The other difference I noticed was that this implementation uses the conv5_3 layer of VGG19. The paper says "In our experiments we use the 14×14×512 feature map of the fourth convolutional layer before max pooling.", which would correspond to some other layer.

lvyongqiang4644 commented 6 years ago

This is my best score:

| B-1 | B-2 | B-3 | B-4 | METEOR |
|------|------|------|------|--------|
| 67.2 | 46.3 | 31.9 | 22.4 | 22.0 |

jamiechoi1995 commented 6 years ago

@nishant-puri I am also confused by it. The feature map of conv5_3 is actually 14×14×512: the original image size is 224, and after 4 max-pooling layers the spatial size becomes 224/2/2/2/2 = 14, so that should be correct. Also, the author used conv5_4 features (see https://github.com/kelvinxu/arctic-captions/issues/1).
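The size calculation above can be checked with a short sketch. It assumes the standard VGG layout, where the 3×3 convolutions are padded so they preserve spatial size and only the 2×2, stride-2 max-pooling layers halve it; the function name is illustrative.

```python
def spatial_size_after_pools(input_size: int, num_pools: int, stride: int = 2) -> int:
    """Spatial side length after num_pools stride-2 pooling layers.

    Assumes the convolutions between pools preserve spatial size
    (3x3 kernels with padding 1, as in standard VGG).
    """
    size = input_size
    for _ in range(num_pools):
        size //= stride
    return size


# The conv5 block sits after 4 pooling layers and before the 5th.
side = spatial_size_after_pools(224, num_pools=4)
channels = 512  # channel count of the VGG conv5 block
print(f"conv5 feature map: {side}x{side}x{channels}")
```

The same function with `num_pools=5` gives the 7×7 map that follows the final pooling layer.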