samin9796 closed this 4 years ago
@samin9796 can you please share how you did this?
@amitbcp For a word-level LM, you have thousands of sentences in a text file, and each sentence is a sequence of words separated by spaces. To prepare the data for a character-level LM, you just need to split the words into characters. For example: A p p l e i s a f r u i t. All the characters are separated by spaces, so each character is treated as a token. The rest (how to build the model using kenlm) is exactly the same for both types of LM.
Bonus points for mapping the original space to a token like <space>, so that word boundaries are not lost.
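A minimal sketch of the preprocessing described above, in Python. The function names (`to_char_level`, `convert_corpus`) and the file paths are made up for illustration and are not part of kenlm; the `<space>` token follows the suggestion above. It assumes one sentence per line and whitespace-separated words.

```python
def to_char_level(line: str, space_token: str = "<space>") -> str:
    """Turn one sentence into space-separated characters.

    Word boundaries are kept as an explicit token (hypothetical default
    "<space>") so they are not lost when characters become tokens.
    """
    words = line.split()
    # Spell out each word as characters separated by single spaces.
    spelled = [" ".join(word) for word in words]
    # Join the spelled-out words with the boundary token.
    return f" {space_token} ".join(spelled)


def convert_corpus(src_path: str, dst_path: str) -> None:
    """Write a character-level copy of a one-sentence-per-line corpus."""
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            converted = to_char_level(line)
            if converted:  # skip blank lines
                dst.write(converted + "\n")


if __name__ == "__main__":
    print(to_char_level("Apple is a fruit"))
```

The converted file can then be passed to kenlm exactly like a word-level corpus, for example `lmplz -o 20 < corpus.char.txt > char.arpa` for the 20-gram case asked about here (the file names are placeholders). Depending on how kenlm was compiled, its maximum supported order may need to be raised at build time before an order of 20 is accepted.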
Thanks @samin9796 @kpu
I want to build a character-level 20-gram LM. What is different in this case compared with building a word-level LM?