harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

Implementing biased decoding and tagging words #88

Closed ghost closed 7 years ago

ghost commented 7 years ago

Hello, thanks for the great work! I am currently learning Torch, and I am using your model for my high school project, which is to teach the model how to add missing punctuation to sentences.

Right now, I am trying to implement biased decoding such that the decoder only considers either punctuation or the original word (so it doesn't replace it with weird words and stuff). There is an example of this done in TensorFlow here. However, after reading through your code, I am still unsure about how to do this in Torch. Also, I was wondering if it is possible to add tags/features to words, as I am trying to add POS tagging to improve the model (is this called tokenisation?). Is it possible for you to provide some advice? Any help will be much appreciated :)

yoonkim commented 7 years ago

Yes, it is possible to bias decoding.

Let's say you never want to predict "the", and further assume that this is index 10 in the vocab (check out the *.dict file to get the index).

Then, after loading the model in your beam search code, you want to add the following:

    generator = model[3]
    generator.modules[1].bias[10] = -1e9
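For illustration, a small helper along these lines could find the index for you (the function name and the dict path below are placeholders, and it assumes each line of the *.dict file is a "word index" pair):

    -- Hypothetical helper (not part of the repo): find a word's index by
    -- scanning a *.dict file, assuming each line is "word index".
    local function find_index(dict_path, word)
      for line in io.lines(dict_path) do
        local w, idx = line:match("^(%S+)%s+(%d+)")
        if w == word then
          return tonumber(idx)
        end
      end
      return nil -- word not in the vocabulary
    end

    -- e.g. bias the generator against ever predicting "the"
    local idx = find_index('data/demo.targ.dict', 'the')
    if idx then
      generator.modules[1].bias[idx] = -1e9
    end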

Hope this helps!

It is also possible to add tags/features to words. Check out OpenNMT, which makes this much easier: http://opennmt.net/
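For reference, OpenNMT's word-features format attaches tags to each token with a special '￨' separator (not the ordinary ASCII pipe), so a POS-annotated training line would look roughly like this (the tags here are only an illustration):

    the￨DT cat￨NN sat￨VBD on￨IN the￨DT mat￨NN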

ghost commented 7 years ago

Okay, but what if I never want to predict any word except for, say, the original input word and a comma? Is there a cleaner way to do this?

yoonkim commented 7 years ago

Sure, check out the StateAll.disallow function in beam.lua:

    function StateAll.disallow(out)
      local bad = {1, 3} -- 1 is PAD, 3 is BOS
      for j = 1, #bad do
        out[bad[j]] = -1e9
      end
    end

You can modify it to take in a table of tokens rather than using the fixed 'bad' set.
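For example, a whitelist variant could look like the sketch below (allow_only and good are names introduced here, and it assumes out is the same score vector over the vocabulary that disallow receives):

    -- Sketch: keep only the scores of the allowed indices and push
    -- everything else down to -1e9 so the beam never picks it.
    function StateAll.allow_only(out, good)
      local allowed = {}
      for j = 1, #good do
        allowed[good[j]] = true
      end
      for i = 1, out:size(1) do
        if not allowed[i] then
          out[i] = -1e9
        end
      end
    end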

ghost commented 7 years ago

I took a look at it, but it seems like this 'bad' set is the set of words I don't want the system to output. In my case, I would instead like a 'good' set of words that the system is allowed to output: rather than listing the thousands of words I don't want in a 'bad' table, I want to list only the words I do want in a 'good' table (there are only about 10 or so) and let the model pick from those. Also, I couldn't find a similar function in the OpenNMT project. Can this be done?

yoonkim commented 7 years ago

Yes. For each sentence, initialize a vector like

    good_words = torch.zeros(V):fill(-1e9)

and then set the entries of this vector to zero for the 'good' words.

Then add this vector to your beam scores at each step of decoding.
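Put together, the per-sentence mask could look roughly like this (V, the example indices, and the exact point where it gets added to the scores are assumptions about how this would plug into the beam search, not code from the repo):

    local V = 50004                     -- target vocabulary size
    local good_indices = {4, 27, 310}   -- e.g. the original word, ",", "."

    local good_words = torch.zeros(V):fill(-1e9)
    for _, idx in ipairs(good_indices) do
      good_words[idx] = 0               -- allowed words keep their score
    end

    -- at each decoding step, before taking the top-K:
    -- scores:add(good_words)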