harvardnlp / seq2seq-attn

Sequence-to-sequence model with LSTM encoder/decoders and attention
http://nlp.seas.harvard.edu/code
MIT License

Char2Char, Performance, Speed, Training tips? #16

Closed. alrojo closed this issue 8 years ago.

alrojo commented 8 years ago

Thanks for the repository! I have a few questions regarding performance (speed and evaluation) and training at the word and character level.

  1. You truncate the sequence length to 50 at the word level; what about at the character level?
  2. What were your BLEU findings at the word and char level? (e.g. 5 BLEU on chars after 5 epochs on WMT'15, 15 BLEU on words after 5 epochs on Europarl?)
  3. For the training speed you mention tokens; how do tokens map to batches? e.g. for batch_size 64 and seq_len 50, is 20,000 tokens/sec = 20,000 / (batch_size x 2 x seq_len) = 3.125 batches/sec?
  4. What were your biggest challenges/takeaways from going to chars instead of words?
  5. How did you regularize the model?

Thanks!

yoonkim commented 8 years ago

Just to clarify, predictions are always made at the word level. The "char" part refers to when the input word embeddings are replaced with a character CNN.
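For intuition, here is a minimal PyTorch sketch of that character-CNN idea: embed the characters of a word, run a 1-D convolution over them, and max-pool over time to get a fixed-size vector that stands in for the word embedding. (The repo itself is written in Torch/Lua; the class name and dimensions below are illustrative, not the repo's defaults.)

```python
import torch
import torch.nn as nn

class CharCNNWordEncoder(nn.Module):
    """Sketch: build a word representation from its characters via a
    1-D CNN followed by max-over-time pooling."""

    def __init__(self, char_vocab_size, char_emb_dim=15, num_filters=100, kernel_width=6):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab_size, char_emb_dim, padding_idx=0)
        self.conv = nn.Conv1d(char_emb_dim, num_filters, kernel_width, padding=kernel_width - 1)

    def forward(self, char_ids):
        # char_ids: (batch, max_word_len) character indices for each word
        x = self.char_emb(char_ids)    # (batch, max_word_len, char_emb_dim)
        x = x.transpose(1, 2)          # (batch, char_emb_dim, max_word_len)
        x = torch.tanh(self.conv(x))   # (batch, num_filters, conv_len)
        x, _ = x.max(dim=2)            # max-over-time pooling -> (batch, num_filters)
        return x                       # used in place of a learned word embedding

# Example: a batch of 2 words, each padded to 6 characters
words = torch.tensor([[3, 4, 0, 0, 0, 0], [5, 6, 7, 8, 9, 0]])
enc = CharCNNWordEncoder(char_vocab_size=30)
print(enc(words).shape)  # torch.Size([2, 100])
```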

Of course, you can modify the input data (so "hi there" becomes "h i SPACE t h e r e") and use the exact same code to do char2char work.
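For concreteness, a small Python sketch of that preprocessing step (the `SPACE` marker follows the example above; this is not a script from the repo):

```python
# Split every word into characters and mark word boundaries with a
# SPACE token, then feed the result to the unmodified word-level pipeline.
def to_char_tokens(line, space_token="SPACE"):
    words = line.strip().split()
    chars = []
    for i, word in enumerate(words):
        if i > 0:
            chars.append(space_token)
        chars.extend(list(word))
    return " ".join(chars)

assert to_char_tokens("hi there") == "h i SPACE t h e r e"
```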

alrojo commented 8 years ago

Thanks for your response @yoonkim !

> For the training speed you mention tokens; how do tokens map to batches? e.g. for batch_size 64 and seq_len 50, is 20,000 tokens/sec = 20,000 / (batch_size x 2 x seq_len) = 3.125 batches/sec?

yoonkim commented 8 years ago

Yes, roughly (but seq length can vary per batch).
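For reference, the back-of-the-envelope conversion being confirmed here, as a small Python snippet (it assumes source and target tokens are both counted, hence the factor of 2, and a fixed seq_len of 50 even though the real length varies per batch):

```python
tokens_per_sec = 20000
batch_size = 64
seq_len = 50  # actual sequence length varies per batch

# factor of 2: source + target tokens per example (assumption from the question)
batches_per_sec = tokens_per_sec / (batch_size * 2 * seq_len)
print(batches_per_sec)  # 3.125
```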