jiasenlu / AdaptiveAttention

Implementation of "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning"
https://arxiv.org/abs/1612.01887

question about "visual sentinel" #13

Open wzn0828 opened 6 years ago

wzn0828 commented 6 years ago

Dear Jiasen Lu,

Thank you for your work on "Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning". I have a question about the visual sentinel: what is the difference between the visual sentinel "st" and the LSTM hidden state "ht"? As far as I can tell, they are the same quantity written with different symbols. Am I right? If not, could you kindly explain the difference?

Many thanks in advance for your answer.

Kind regards,
Zhennan Wang

jamiechoi1995 commented 6 years ago

@wzn0828 In my opinion, "ht" and "st" have similar formulations, but they are affected by different variables when the loss is backpropagated, so they end up playing different roles.
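
For readers following the thread, the similarity can be made concrete. In the paper, both quantities are a sigmoid gate applied to tanh of the same memory cell m_t: the hidden state uses the LSTM output gate, and the sentinel uses a separate gate g_t = sigmoid(W_x x_t + W_h h_{t-1}) with its own weights. The sketch below is a minimal pure-Python illustration of that reading, not the repository's code: the function names, the toy numbers, and the omission of bias terms are all assumptions made for brevity.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gate(W_x, W_h, x_t, h_prev):
    """Elementwise sigmoid(W_x x_t + W_h h_prev); biases omitted (assumption)."""
    return [sigmoid(a + b) for a, b in zip(matvec(W_x, x_t), matvec(W_h, h_prev))]


def hidden_and_sentinel(m_t, x_t, h_prev, out_weights, sen_weights):
    """Return (h_t, s_t), both computed from the same memory cell m_t.

    out_weights and sen_weights are (W_x, W_h) pairs. They are separate
    parameter sets, which is the only structural difference in this sketch.
    """
    tanh_m = [math.tanh(m) for m in m_t]
    o_t = gate(out_weights[0], out_weights[1], x_t, h_prev)  # output gate
    g_t = gate(sen_weights[0], sen_weights[1], x_t, h_prev)  # sentinel gate
    h_t = [o * t for o, t in zip(o_t, tanh_m)]
    s_t = [g * t for g, t in zip(g_t, tanh_m)]
    return h_t, s_t


# Toy values, hypothetical and chosen only for illustration.
m_t = [0.5, -1.0]
x_t = [1.0, 2.0]
h_prev = [0.1, 0.3]
out_weights = ([[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]])
sen_weights = ([[-0.5, -0.5], [-0.5, -0.5]], [[-0.5, -0.5], [-0.5, -0.5]])

# Different gate weights give different vectors from the same m_t.
h_t, s_t = hidden_and_sentinel(m_t, x_t, h_prev, out_weights, sen_weights)
# Sharing the gate weights makes the two vectors identical.
h_same, s_same = hidden_and_sentinel(m_t, x_t, h_prev, out_weights, out_weights)
```

With shared gate weights the two outputs coincide, so any difference between them after training comes from the separate parameters and from how each vector is used downstream.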

jiasenlu commented 6 years ago

Yes, I agree with @jamiechoi1995. Based on the different inductive biases you impose on the model, the weights will learn different functions.
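
One way to read the inductive-bias remark, based on the paper's equations: the sentinel is fed to the attention module as one extra candidate next to the k image regions, and its softmax weight beta decides how much the context vector relies on the sentinel rather than on the image. The sketch below illustrates only that mixing step. The function names and toy numbers are assumptions, and the attention scores are passed in as plain numbers, whereas the paper computes them from the hidden state with learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_context(region_feats, region_scores, s_t, sentinel_score):
    """Mix the sentinel with attended image features.

    region_feats: k feature vectors; region_scores: k attention scores.
    sentinel_score: one extra score for s_t, appended before the softmax.
    Returns (c_hat, beta), where beta is the weight placed on s_t.
    """
    alpha_hat = softmax(list(region_scores) + [sentinel_score])
    beta = alpha_hat[-1]
    dim = len(s_t)
    # Start from the sentinel's share, then add each region's share.
    c_hat = [beta * s_t[j] for j in range(dim)]
    for a, v in zip(alpha_hat[:-1], region_feats):
        for j in range(dim):
            c_hat[j] += a * v[j]
    return c_hat, beta


# Toy inputs, hypothetical and chosen only for illustration.
regions = [[1.0, 0.0], [0.0, 1.0]]
s_t = [5.0, 5.0]

# Equal scores: the sentinel gets the same weight as each region.
c_even, beta_even = adaptive_context(regions, [0.0, 0.0], s_t, 0.0)
# A dominant sentinel score: the context is almost entirely s_t.
c_text, beta_text = adaptive_context(regions, [0.0, 0.0], s_t, 50.0)
```

The first k softmax weights equal (1 - beta) times the ordinary region attention, so this is the same as c_hat = beta * s_t + (1 - beta) * c_t. Because the training loss reaches s_t through this path, its gate is pushed toward a role the hidden state is not asked to fill.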

wzn0828 commented 6 years ago

@jamiechoi1995 @jiasenlu Thank you all, I think I have gotten a deeper understanding of this sentinel with your help.