kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

How can I train an LM without <s> and </s> using the lmplz command? #296

Open chenjindong opened 4 years ago

kpu commented 4 years ago

This is currently not supported, though if you want to send a pull request...

geekypathak21 commented 3 years ago

@kpu What is the use of this flag? https://github.com/kpu/kenlm/blob/bdf3c71a34a874de11ab02f23ebe0a0b877c27ef/lm/build_binary_main.cc#L28

kpu commented 3 years ago

It means you can convert an ARPA from another toolkit without these symbols to binary format. However, there is currently no support to train a model without sentence boundary symbols.
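
To make the distinction concrete, here is a minimal sketch (not part of KenLM) that checks whether an ARPA file from another toolkit is missing the sentence boundary symbols, which is the case where the linked flag would matter. The function name `missing_boundary_symbols` and the sample data are hypothetical, and the sketch assumes the usual ARPA layout in which each unigram line is a log10 probability, the word, and an optional backoff. I believe the linked flag is `build_binary -s`, but treat that as an assumption and check the usage text.

```python
import io

# The two sentence boundary symbols discussed in this thread.
BOUNDARY_SYMBOLS = ("<s>", "</s>")


def missing_boundary_symbols(arpa_lines):
    """Return the boundary symbols absent from the unigram section.

    arpa_lines is any iterable of text lines, e.g. an open file.
    Hypothetical helper for illustration; assumes standard ARPA layout.
    """
    seen = set()
    in_unigrams = False
    for raw in arpa_lines:
        line = raw.strip()
        if line == "\\1-grams:":
            in_unigrams = True
            continue
        if not in_unigrams or not line:
            continue
        # The next section header (or \end\) closes the unigram section.
        if line.startswith("\\"):
            break
        fields = line.split()
        # Unigram line: log10 probability, word, optional backoff.
        if len(fields) >= 2:
            seen.add(fields[1])
    return [s for s in BOUNDARY_SYMBOLS if s not in seen]


# Made-up ARPA content without <s> or </s>, for illustration only.
sample = """\\data\\
ngram 1=3

\\1-grams:
-1.0\t<unk>
-0.5\thello\t-0.3
-0.5\tworld\t-0.3

\\end\\
"""

print(missing_boundary_symbols(io.StringIO(sample)))
```

If the returned list is non-empty, the ARPA lacks those symbols and converting it to binary would need the flag above. Training such a model with lmplz remains unsupported.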

geekypathak21 commented 3 years ago

@kpu Can you give me any leads on how I could add this support? I am not currently familiar with the code base.