udibr / headlines

Automatically generate headlines to short articles
MIT License

A question in function simple_context #9

Open wai7niu8 opened 7 years ago

wai7niu8 commented 7 years ago

Hi, in train.ipynb I'm confused about this line: `activation_energies = activation_energies + -1e20*K.expand_dims(1.-K.cast(mask[:, :maxlend],'float32'),1)`. I think this line is unnecessary (I may be wrong); please explain it to me in detail.

Also, when computing the attention weights, I think we should only use the current word's h_t (the hidden state at time step t) during decoding, but in the function simple_context it uses all the headline words' h_t at every time step. Finally, can you point me to the paper or other references on how to implement the attention layer? I'm not particularly familiar with it. Thank you.

udibr commented 7 years ago

The first line in the README file gives a link to the paper on which the code is based. Please read it several times from start to finish until you feel you understand it, and also read the references it gives.

The line you asked about reduces the energy by a huge value at every position where the mask is zero in the part of the input (0:maxlend) that came from the article. Later I take a softmax of the energies, and as a result the positions where the mask was zero end up with almost zero attention weight.
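For intuition, here is a minimal NumPy sketch of that masking trick, with toy values rather than the notebook's actual tensors: subtracting a huge constant at padded positions drives their post-softmax weight to (effectively) zero.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# hypothetical energies over 5 article positions; the last two are padding
energies = np.array([2.0, 1.0, 0.5, 3.0, 0.1])
mask = np.array([1, 1, 1, 0, 0])  # 1 = real token, 0 = padding

# same idea as the notebook line: add -1e20 wherever the mask is zero
masked = energies + -1e20 * (1.0 - mask.astype('float32'))
weights = softmax(masked)
# the masked positions get essentially zero weight after the softmax
```

Without the masking line, position 3 (a padding token with a large raw energy of 3.0) would actually receive the largest attention weight, which is why the line is necessary.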

simple_context works on all the time steps at once, rather than on one decoding step at a time.
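To illustrate what "all time steps at once" means, here is a hedged NumPy sketch of dot-product attention computed for every headline step in a single batched matmul. The shapes and the plain dot-product energy are illustrative assumptions; the notebook's simple_context differs in its details (e.g. the mask term above).

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# toy shapes (hypothetical): batch=2, maxlend=4 article positions,
# maxlenh=3 headline steps, hidden dim=5
rng = np.random.default_rng(0)
desc = rng.standard_normal((2, 4, 5))   # article-side hidden states
head = rng.standard_normal((2, 3, 5))   # headline-side hidden states, all steps

# one batched matmul yields an energy for every (headline step, article position)
energies = head @ desc.transpose(0, 2, 1)   # shape (2, 3, 4)
weights = softmax(energies, axis=-1)        # attention weights per headline step
context = weights @ desc                    # shape (2, 3, 5): one context per step
```

Each row of `weights` corresponds to one decoding step, so nothing is shared across steps; computing them together is just a vectorization, not a change in the attention itself.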