jazzsaxmafia / show_attend_and_tell.tensorflow


Bug reports and problems #10

Open Liu0329 opened 8 years ago

Liu0329 commented 8 years ago

@jazzsaxmafia Have you trained a good model from your code?

There might be a bug: in the build_model and build_generator functions in model_tensorflow.py, h = o * tf.nn.tanh(new_c) should be replaced by h = o * tf.nn.tanh(c).
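For reference, here is a minimal sketch of one LSTM step with that fix applied, assuming the repo's naming where new_c holds the candidate cell value and c is the cell state carried between timesteps (the helper function and its signature are illustrative, not the repo's code):

```python
import tensorflow as tf

def lstm_step(i, f, o, new_c, c):
    # i, f, o, new_c: the four chunks of the LSTM pre-activation
    # c: cell state carried over from the previous timestep
    i = tf.nn.sigmoid(i)
    f = tf.nn.sigmoid(f)
    o = tf.nn.sigmoid(o)
    new_c = tf.nn.tanh(new_c)  # candidate cell value, not the updated cell state

    c = f * c + i * new_c      # updated cell state
    h = o * tf.nn.tanh(c)      # fix: gate the updated cell state, not the candidate
    return h, c
```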

Another problem is about context_encode. Is it the same as in the original code? Moreover, I think the data should be shuffled at each epoch; the code seems to shuffle it only once. A sketch of per-epoch shuffling follows below.
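On the shuffling point, a minimal sketch of what per-epoch shuffling could look like (captions, feats, batch_size, and n_epochs are illustrative names, assumed here to be NumPy arrays and ints, not the repo's actual variables):

```python
import numpy as np

for epoch in range(n_epochs):
    # Reshuffle the caption/feature pairs at the start of every epoch,
    # rather than shuffling once before the whole training loop.
    index = np.random.permutation(len(captions))
    captions_shuffled = captions[index]
    feats_shuffled = feats[index]

    for start in range(0, len(captions_shuffled), batch_size):
        batch_captions = captions_shuffled[start:start + batch_size]
        batch_feats = feats_shuffled[start:start + batch_size]
        # ... run one training step on this batch ...
```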

jazzsaxmafia commented 8 years ago

Yes, you are totally right. I should have corrected my code, but I have been too busy lately. I will work on it as soon as possible.

Thank you for pointing the bugs out. -Taeksoo

RutgersHan commented 8 years ago

@Liu0329 @jazzsaxmafia Agree with Liu0329. Is there another bug in the next line, logits = tf.matmul(h, self.decode_lstm_W) + self.decode_lstm_b? It computes the logits from h alone.

According to Eq. (7) in the original paper, shouldn't the logits be computed as W^T concat(weighted_context, word_emb, h)?
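For what it's worth, a hedged sketch of that deep-output computation, building the logits from the hidden state, the attention-weighted context, and the previous word embedding instead of h alone; the function and weight names here are hypothetical, and the concatenation form follows the comment above rather than the exact additive form of Eq. (7):

```python
import tensorflow as tf

def deep_output_logits(h, weighted_context, word_emb, decode_W, decode_b):
    # h:                [batch, dim_hidden]   current LSTM hidden state
    # weighted_context: [batch, dim_ctx]      attention-weighted image context
    # word_emb:         [batch, dim_embed]    embedding of the previous word
    # decode_W:         [dim_hidden + dim_ctx + dim_embed, n_words]  (assumed shape)
    combined = tf.concat([h, weighted_context, word_emb], axis=1)
    logits = tf.matmul(combined, decode_W) + decode_b
    return logits
```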

Liu0329 commented 8 years ago

@RutgersHan Have you trained a model that can actually be used?

RutgersHan commented 8 years ago

@Liu0329 I did not use the attention model for image captioning; I used it for another task. For me, the results were not very good, only a little better than the same model without attention. I am not sure whether that is due to my implementation or something else. That is why I asked the previous question: do you think there is a bug in the logits computation in this implementation?

Liu0329 commented 8 years ago

@RutgersHan Yes, according to the paper the logits have three parts, as you showed. I will now try training again. This repo seems unfinished; many parts are simplified compared with the original code. We can help to improve it.

2g-XzenG commented 7 years ago

@Liu0329 @jazzsaxmafia @RutgersHan

Hello, I am currently reading the code, and thanks for the post! For the 'alpha' calculation, aren't we supposed to use the context ('context_encode') and the previous hidden state? In this implementation, however, it seems to be accumulating the hidden-state term into context_encode over all previous timesteps:

```python
context_encode = context_encode + \
    tf.expand_dims(tf.matmul(h, self.hidden_att_W), 1) + \
    self.pre_att_b
```

Thanks!
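For context, a hedged sketch of computing alpha from the projected image context and the current hidden state only, recomputing the combined term each timestep rather than adding onto context_encode in place (the shapes and the vector form of att_W / att_b are assumptions in the spirit of the repo's code, not verbatim):

```python
import tensorflow as tf

def attention_alpha(context_encode, h, hidden_att_W, pre_att_b, att_W, att_b):
    # context_encode: [batch, n_regions, dim_ctx] projected image features
    # h:              [batch, dim_hidden]         current LSTM hidden state
    # Recompute the combined term every step instead of accumulating it.
    combined = tf.nn.tanh(
        context_encode
        + tf.expand_dims(tf.matmul(h, hidden_att_W), 1)
        + pre_att_b)

    # Score each image region, then normalize over regions.
    scores = tf.einsum('brd,d->br', combined, att_W) + att_b
    alpha = tf.nn.softmax(scores)  # [batch, n_regions]
    return alpha
```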

sjksong commented 5 years ago

@Liu0329 @jazzsaxmafia @RutgersHan @1230pitchanqw Hello, I'm an undergraduate from Communication University of China, currently working in this area for my graduation thesis. Could you please tell me whether you obtained satisfactory results based on this repo? If so, how many epochs did you train to get a good model, and what else did you change from the original code? I would appreciate a reply; thank you very much.