Great code/model, but I see one problem:
I think the results for character models are only valid for the English corpus (PTB). For all other languages (especially Russian, where every letter is a 2-byte sequence), you actually have models over sequences of bytes, not sequences of characters. Am I right? Or did you convert the corpora to a language-specific one-byte encoding before processing?
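A quick sketch of the distinction being raised (assuming the corpora are UTF-8 encoded, which is typical): iterating over raw bytes of Russian text yields twice as many symbols as iterating over characters, so a "character" model trained on bytes is really a byte model.

```python
# In UTF-8, Cyrillic letters take two bytes each, while ASCII letters take one.
text = "привет"                  # "hello" in Russian: 6 characters
encoded = text.encode("utf-8")

print(len(text))                 # 6  -- character count
print(len(encoded))              # 12 -- byte count: each Cyrillic letter is 2 bytes

# For pure-ASCII English text (e.g. PTB), the two views coincide:
ascii_text = "hello"
print(len(ascii_text), len(ascii_text.encode("utf-8")))  # 5 5
```

This is why results on an English corpus like PTB are unaffected, while multi-byte scripts are.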
Thanks for letting us know! We've uploaded a fix for this. We will also update the paper with the new results (the results are largely the same, but we do a little better on Russian).