Closed — alrojo closed this issue 8 years ago
Just to clarify, predictions are always made at the word level. The "char" part refers to when the input word embeddings are replaced with a character CNN.
Of course, you can modify the input data (so "hi there" becomes "h i SPACE t h e r e") and use the exact same code to do char2char work.
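The preprocessing described above can be sketched as a small helper — a minimal, hypothetical function (not from the repo) that rewrites each word as space-separated characters with a SPACE token between words:

```python
def to_char_sequence(text):
    """Turn word-level text into a char-level sequence:
    each character becomes a token, and word boundaries
    are marked with a literal SPACE token."""
    words = text.split()
    return " SPACE ".join(" ".join(w) for w in words)

print(to_char_sequence("hi there"))  # h i SPACE t h e r e
```

With the data rewritten this way, the unchanged word-level code effectively operates at the character level.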
Thanks for your response @yoonkim !
In the training speed you mention tokens — how do tokens relate to batches? E.g. for batch_size 64 and seq_len 50, does 20,000 tokens/sec mean 20,000 / (batch_size x 2 x seq_len) = 3.125 batches/sec?
Yes, roughly (but sequence length can vary per batch).
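For concreteness, the conversion from the question can be written out as follows — a sketch assuming, as the question does, that the factor of 2 counts both source and target tokens, and treating seq_len as an average since it varies per batch:

```python
tokens_per_sec = 20_000
batch_size = 64
seq_len = 50  # average; actual length varies per batch

# Factor of 2: assumes both source and target tokens are counted,
# following the formula in the question above.
tokens_per_batch = batch_size * 2 * seq_len  # 6,400
batches_per_sec = tokens_per_sec / tokens_per_batch

print(batches_per_sec)  # 3.125
```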
Thanks for the repository! I have a few questions about performance (speed and evaluation) and about training at the word versus character level.
Thanks!