yunjey / pytorch-tutorial

PyTorch Tutorial for Deep Learning Researchers
MIT License
29.63k stars · 8.01k forks

[image captioning] issues in training phase #172

Open Lzhushuai opened 5 years ago

Lzhushuai commented 5 years ago

Hi, this is nice work and very helpful for beginners. I ran into an issue when writing my own image-captioning code based on yours.

In your training example, you say that for the image description "Giraffes standing next to each other", the source sequence is ['start', 'Giraffes', 'standing', 'next', 'to', 'each', 'other'] and the target sequence is ['Giraffes', 'standing', 'next', 'to', 'each', 'other', 'end']: when we feed the word 'start' to the decoder, it is expected to output 'Giraffes', and at the next time step, when we feed 'Giraffes', it should output 'standing'.

What confuses me is that in dataloader.py you pad the caption as 'start Giraffes standing next to each other end', and in train.py you feed that padded caption to the decoder to get the outputs while also using the same padded caption as the ground truth for the cross-entropy loss. That looks strange, because it seems you would feed 'start' to the decoder to generate 'start', then at the next time step feed 'Giraffes' to generate 'Giraffes', and so on.

In my model, the loss goes to 0: it simply reads each word from the input sequence and emits it as the generated word. My understanding is that the i-th word of the input sequence should be the (i-1)-th word of the output sequence, but I'm not sure whether there is some trick elsewhere in your code that shifts the input and target sequences. I would be very thankful for any reply.
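For reference, here is a minimal sketch of the explicit shift-by-one I had in mind (the token ids and variable names are made up for illustration, not taken from the repo): the decoder input drops the last token of the padded caption, and the target drops the first, so position i of the input lines up with position i of the target.

```python
import torch

# Hypothetical encoding of the padded caption
# "<start> Giraffes standing next to each other <end>"
# (vocabulary indices invented for this example: 1=<start>, 2=<end>).
caption = torch.tensor([1, 4, 5, 6, 7, 8, 9, 2])

# Explicit shift-by-one for teacher forcing:
# input[i] is the word fed to the decoder at step i,
# target[i] is the word it should predict at step i.
decoder_input = caption[:-1]  # <start> Giraffes ... other
target = caption[1:]          # Giraffes standing ... <end>

# Every input position has exactly one target word.
assert decoder_input.shape == target.shape
```

Without this shift (i.e. using the same padded caption as both input and target), the loss can collapse toward 0 because the model only has to copy its input to the output.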

AI678 commented 3 years ago

I have the same question.