Liu0329 opened 8 years ago
Yes, you are totally right. I should have corrected the code, but I have been too busy lately. I will work on it as soon as possible.
Thank you for pointing out the bugs. -Taeksoo
@Liu0329 @jazzsaxmafia Agree with Liu0329. Is there another bug in the next line: `logits = tf.matmul(h, self.decode_lstm_W) + self.decode_lstm_b`?
According to (7) in the original paper, shouldn't the logits be computed as W^T concat(weighted_context, word_emb, h)?
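For reference, the deep output layer from Eq. (7) that the comment above describes can be sketched as follows. This is a minimal NumPy sketch, not the repo's code; all dimensions and weight shapes are assumed for illustration:

```python
import numpy as np

# Hypothetical dimensions, chosen only for illustration
dim_ctx, dim_embed, dim_hidden, n_words = 512, 256, 512, 1000

rng = np.random.default_rng(0)
weighted_context = rng.standard_normal(dim_ctx)  # z_hat: attention-weighted context
word_emb = rng.standard_normal(dim_embed)        # embedding of the previous word
h = rng.standard_normal(dim_hidden)              # current LSTM hidden state

# Eq. (7)-style logits: one linear map over the concatenation of all
# three inputs, rather than over h alone as in the line quoted above.
W = rng.standard_normal((dim_ctx + dim_embed + dim_hidden, n_words))
b = np.zeros(n_words)
logits = np.concatenate([weighted_context, word_emb, h]) @ W + b
```

The point of contention is exactly this: the repo's quoted line feeds only `h` into the output projection, dropping the `weighted_context` and `word_emb` terms.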
@RutgersHan Have you trained out a model that can be used ?
@Liu0329 I did not use the attention model for image captioning; I used it for another task. For me, the results were not very good: only slightly better than the same model without attention. I am not sure whether that is due to my implementation or something else, which is why I asked the question above: do you think there is a bug in the logits computation in this implementation?
@RutgersHan Yes, according to the paper the logits have the three parts you showed. I will now try training again. This repo seems unfinished; many parts are simplified compared with the original code. We can help to improve it.
@Liu0329 @jazzsaxmafia @RutgersHan
Hello, I am currently reading the code, and thanks for the post! For the alpha calculation, aren't we supposed to use the context (`context_encode`) and only the previous hidden state? In this implementation, it seems to be accumulating all the previous hidden states:

```python
context_encode = context_encode + \
    tf.expand_dims(tf.matmul(h, self.hidden_att_W), 1) + \
    self.pre_att_b
```
Thanks!
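For comparison, the soft attention described in the paper recomputes the alignment scores fresh at every step from the annotation vectors and the previous hidden state alone. A minimal NumPy sketch (sizes and weight names are assumed, not the repo's):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Hypothetical sizes: L annotation vectors of dimension D, hidden size H
L, D, H = 196, 512, 512
rng = np.random.default_rng(0)
context = rng.standard_normal((L, D))  # a_i: annotation vectors from the CNN
h_prev = rng.standard_normal(H)        # h_{t-1}: previous hidden state ONLY

W_c = rng.standard_normal((D, D))
W_h = rng.standard_normal((H, D))
w_att = rng.standard_normal(D)
b = np.zeros(D)

# e_i = w^T tanh(W_c a_i + W_h h_{t-1} + b), recomputed from scratch each
# time step -- nothing is accumulated across steps.
e = np.tanh(context @ W_c + h_prev @ W_h + b) @ w_att
alpha = softmax(e)        # attention weights over the L locations, sum to 1
z_hat = alpha @ context   # attention-weighted context vector
```

If the repo instead keeps adding each new `tf.matmul(h, self.hidden_att_W)` into `context_encode`, the scores would indeed depend on the sum of all past hidden states, which does not match the paper.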
@Liu0329 @jazzsaxmafia @RutgersHan @1230pitchanqw Hello, I'm an undergraduate from Communication University of China, currently working in this area for my graduation thesis. Could you please tell me whether you obtained a satisfactory result with this repo? If so, how many epochs did it take to train a good model, and what else did you change in the original code? I would appreciate a reply, thank you very much.
@jazzsaxmafia Have you trained a good model with your code?
There might be a bug: in the functions `build_model` and `build_generator` in `model_tensorflow.py`,

```python
h = o * tf.nn.tanh(new_c)
```

should be replaced by

```python
h = o * tf.nn.tanh(c)
```
Another problem is `context_encode`: is it the same as in the original code? Also, I think the data should be shuffled every epoch, but the code seems to shuffle it only once.