harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

How to get the context vector #94

Closed by lenhhoxung86 7 years ago

lenhhoxung86 commented 7 years ago

Thanks for sharing this great code. I want to use your pre-trained model to extract the semantic vector, i.e. the feature vector from an intermediate layer, so that I can use it for other purposes. Basically, I want to use it as a 'doc2vec' tool. How can I do that? Thanks.

guillaumekln commented 7 years ago

Hi,

At the end of this loop:

-- Run the encoder over the source sequence, one timestep at a time.
for t = 1, source_l do
  local encoder_input = {source_input[t]}            -- current source token
  if model_opt.num_source_features > 0 then
    append_table(encoder_input, source_features[t])  -- optional additional word features
  end
  append_table(encoder_input, rnn_state_enc)         -- previous hidden/cell states
  local out = model[1]:forward(encoder_input)        -- one encoder step
  rnn_state_enc = out                                -- carry the states to the next step
  context[{{},t}]:copy(out[#out])                    -- store the top-layer output for attention
end

https://github.com/harvardnlp/seq2seq-attn/blob/master/s2sa/beam.lua#L165

rnn_state_enc[#rnn_state_enc] contains the input encoding, i.e. the RNN output of the last timestep.
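The idea of taking the RNN's state after the final source timestep as a fixed-size vector for the whole sequence can be sketched framework-agnostically. This is a plain-Python toy RNN, not the repo's Torch code; the weights and sizes below are illustrative assumptions, not trained parameters:

```python
import math

def encode(tokens, W_in, W_rec, hidden_size):
    """Toy RNN encoder: the hidden state after the last timestep
    serves as a fixed-size vector for the whole input sequence."""
    h = [0.0] * hidden_size
    for x in tokens:  # x: one scalar input per timestep
        h = [math.tanh(x * W_in[i]
                       + sum(W_rec[i][j] * h[j] for j in range(hidden_size)))
             for i in range(hidden_size)]
    # Analogous to rnn_state_enc[#rnn_state_enc] in the loop above.
    return h

# Illustrative weights (assumptions for the sketch).
W_in = [0.5, -0.3]
W_rec = [[0.1, 0.2], [0.0, 0.4]]

# Sequences of different lengths map to vectors of the same size.
doc_vector = encode([1.0, 2.0, 0.5], W_in, W_rec, hidden_size=2)
```

The point is that the returned vector's dimensionality depends only on the hidden size, not on the sequence length, which is what makes it usable as a 'doc2vec'-style representation.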

Alternatively, you can take a look at OpenNMT, which is based on seq2seq-attn and offers more features, including this one (the -dump_input_encoding option during translation).

zhang-jinyi commented 7 years ago

@guillaumekln Hi, I found this issue. Sorry to trouble you, but could you tell me how to get the attention matrix during evaluation (decoding)? I want to make an attention visualization like http://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html#visualizing-attention, but pytorch-openNMT doesn't implement additional word features yet.

Thank you in advance.