kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

Format of the Input #92

Closed Doreenruirui closed 7 years ago

Doreenruirui commented 7 years ago

The paper mentions that each sentence is padded with <s> and </s>. Should we preprocess the data and save the file with one sentence per line?
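For concreteness, here is a minimal sketch of the preprocessing being asked about: it splits raw text into sentences and writes one per line. The helper names `to_sentence_lines` and `write_corpus` and the naive regex splitter are hypothetical and not part of KenLM. The sketch leaves out the <s> and </s> markers on the assumption that the toolkit adds them itself during estimation; that assumption should be checked against the KenLM documentation.

```python
import re

# Hypothetical helper, not part of KenLM.
# Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace.
# Abbreviations such as "Mr. Smith" will be split incorrectly.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def to_sentence_lines(text):
    """Return a list with one whitespace-normalized sentence per element."""
    sentences = _SENTENCE_END.split(text.strip())
    # Collapse internal newlines/tabs/multiple spaces into single spaces.
    return [" ".join(s.split()) for s in sentences if s.strip()]


def write_corpus(text, path):
    """Write the text to `path` with one sentence per line.

    No <s> or </s> markers are written (assumption: the language model
    toolkit inserts sentence boundaries itself).
    """
    with open(path, "w", encoding="utf-8") as f:
        for sentence in to_sentence_lines(text):
            f.write(sentence + "\n")
```

A real corpus would normally go through a proper sentence splitter and tokenizer rather than this regex; the point is only the output layout, one sentence per line.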

kpu commented 7 years ago