yunjey / pytorch-tutorial

PyTorch Tutorial for Deep Learning Researchers
MIT License

[image captioning] model picture #177

Open rubencart opened 5 years ago

rubencart commented 5 years ago

Hi,

In your picture here, the output of the LSTM at the first timestep (when the input is the image feature vector) is "\<start>", which is then fed back into the LSTM at the second timestep. However, I don't think you actually train your LSTM to output the "\<start>" token when it is given the image features as input, right?

So a more accurate picture would be something like this: image. This is also closer to the figure on page 4 of the Show & Tell paper by Vinyals et al. (link).
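To make the proposed alignment concrete, here is a minimal sketch of which token each LSTM input would be trained to predict under the convention in the Show & Tell figure, as I read it. It is only an illustration: the helper name `align_steps` and the `<image>` placeholder are hypothetical, and nothing here is taken from or checked against the repository's training code.

```python
def align_steps(caption_words, image_token="<image>", start="<start>", end="<end>"):
    """Pair each LSTM input with the token it would be trained to predict.

    Assumed convention (Show & Tell style): the image features are fed at
    the first step with no loss on that step's output, then "<start>" is fed
    and the model predicts the first word. A target of None means "no loss".
    """
    tokens = [start] + list(caption_words) + [end]
    inputs = [image_token] + tokens[:-1]   # image first, then every token except <end>
    targets = [None] + tokens[1:]          # nothing for the image step, then the next token
    return list(zip(inputs, targets))


steps = align_steps(["a", "dog", "runs"])
for t, (inp, tgt) in enumerate(steps):
    print(t, inp, "->", tgt)
```

Under this convention "\<start>" only ever appears as an input, never as a target, which is the difference from the current picture.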

Unless I'm mistaken of course :). Cheers!