jazzsaxmafia / show_attend_and_tell.tensorflow


a lot of weight matrices #5

Open qingzew opened 8 years ago

qingzew commented 8 years ago

In the function build_model, I saw a lot of weight matrices, for example image_att_W, hidden_att_W, att_W, image_encode_W and so on, and I don't know why. In my opinion, the LSTM has 2 weight matrices, w for the input and u for the hidden state, so I would write the code in the for loop like this:

context_encode = tf.matmul(input, w) + b
context_encode += tf.matmul(h, u)
context_encode = tf.nn.tanh(context_encode)

But what is this line about: alpha = tf.matmul(context_encode_flat, self.att_W) + self.att_b? And why is there a softmax again on line 110, and yet another weight matrix, image_encode_W, on line 114?

jazzsaxmafia commented 8 years ago

Hello, the LSTM used in this project gets not only the input (words) and the last state, but also aggregated image features. That is why it has extra weights. Since "Show, Attend and Tell" is about attending to a specific part of an image, some additional weights for the attention mechanism are also used. The variables with "att" in their names are all related to attention.
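To make the extra weights concrete, here is a minimal sketch of the LSTM pre-activation with the added image term. The variable names follow the repo's convention (lstm_W, lstm_U, image_encode_W), but the dimensions, placeholders, and the four-way gate split are illustrative assumptions, not a copy of build_model:

import tensorflow as tf

dim_embed, dim_ctx, dim_hidden = 256, 512, 512
word_emb = tf.placeholder(tf.float32, [None, dim_embed])        # embedded previous word
h = tf.placeholder(tf.float32, [None, dim_hidden])              # previous hidden state
weighted_context = tf.placeholder(tf.float32, [None, dim_ctx])  # attention-weighted image features

lstm_W = tf.get_variable('lstm_W', [dim_embed, 4 * dim_hidden])               # your "w"
lstm_U = tf.get_variable('lstm_U', [dim_hidden, 4 * dim_hidden])              # your "u"
image_encode_W = tf.get_variable('image_encode_W', [dim_ctx, 4 * dim_hidden]) # the extra one
lstm_b = tf.get_variable('lstm_b', [4 * dim_hidden], initializer=tf.zeros_initializer())

# Besides the usual input and recurrent terms, the pre-activation gets a
# third term that injects the attended image features into every gate:
lstm_preactive = (tf.matmul(word_emb, lstm_W) + lstm_b
                  + tf.matmul(h, lstm_U)
                  + tf.matmul(weighted_context, image_encode_W))
i, f, o, g = tf.split(lstm_preactive, 4, axis=1)  # input, forget, output gates and candidate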

The alpha values are the attention values. If the model decides to attend to the upper-left part of an image, the alpha value corresponding to that region will be large.
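And here is a minimal sketch of how the attention weights produce those alphas. The names (image_att_W, hidden_att_W, att_W, att_b) mirror the ones discussed above, but the shapes and exact wiring are my assumptions, not a copy of build_model:

import tensorflow as tf

def attention_step(context, h, image_att_W, hidden_att_W, att_W, att_b):
    # context: [batch, num_regions, dim_ctx] conv features, h: [batch, dim_hidden]
    batch = tf.shape(context)[0]
    num_regions = tf.shape(context)[1]
    dim_ctx = image_att_W.get_shape().as_list()[0]

    # Project every image region and the hidden state into a shared space.
    context_flat = tf.reshape(context, [-1, dim_ctx])
    context_encode = tf.reshape(tf.matmul(context_flat, image_att_W),
                                [batch, num_regions, dim_ctx])
    context_encode += tf.expand_dims(tf.matmul(h, hidden_att_W), 1)
    context_encode = tf.nn.tanh(context_encode)  # first nonlinearity

    # Score each region with att_W, then softmax so the alphas sum to 1.
    alpha = tf.matmul(tf.reshape(context_encode, [-1, dim_ctx]), att_W) + att_b
    alpha = tf.nn.softmax(tf.reshape(alpha, [batch, num_regions]))

    # The context the LSTM sees this step: an alpha-weighted sum of the regions.
    weighted_context = tf.reduce_sum(context * tf.expand_dims(alpha, 2), 1)
    return weighted_context, alpha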

Hope this answers your questions. If you have other questions, please let me know. Thank you. -Taeksoo


qingzew commented 8 years ago

Hi, thank you for your answer. Some of it I understand now, but not all. The LSTM takes the image as input, so this line

context_encode = tf.matmul(context_flat, self.image_att_W)

applies a weight to it, right? But is it really necessary?

Between lines 100 and 110, it computes the attention values. But why call an activation function twice? Why does it call reshape and then softmax? What's the problem with tanh? Couldn't its output be used directly as the attention values?
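Thinking about it more, maybe the point is that attention values should form a probability distribution over the image regions, which tanh alone does not give. A toy check (not from the repo):

import tensorflow as tf

scores = tf.constant([[2.0, -1.0, 0.5]])  # one example, three image regions
with tf.Session() as sess:
    print(sess.run(tf.nn.tanh(scores)))     # [[ 0.96 -0.76  0.46]]  in [-1, 1], doesn't sum to 1
    print(sess.run(tf.nn.softmax(scores)))  # [[ 0.79  0.04  0.18]]  non-negative, sums to 1

So the reshape flattens the per-region scores so softmax can normalize across regions, while the tanh inside context_encode is just the usual nonlinearity.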

I think I need to read more of the 'Show, Attend and Tell' paper.

Liu0329 commented 8 years ago

@qingzew the computation of context_encode also confused me. Are you clear on it now?

qingzew commented 8 years ago

@Liu0329 you can read this blog post: https://blog.heuritech.com/2016/01/20/attention-mechanism/. It is clear, though a little different from this project.

Liu0329 commented 8 years ago

@qingzew great, thanks!

Liu0329 commented 8 years ago

@qingzew Have you trained a model that can be used?

qingzew commented 8 years ago

@Liu0329 no, image captioning is not my focus. I'm working on an NLP task with an attention model, but I'm having some problems implementing the model in TensorFlow.