Disable smoothing

Hi! I am using KenLM on massive text corpora (e.g., Common Crawl, Wikipedia) to explore the properties of those datasets.

I am not trying to use KenLM to generate new text; I want to explore how often specific phrases occur and get the raw n-gram counts from the training corpus (the log probability of a sequence is fine; I don't necessarily need exact counts). As such, I want to disable smoothing so I can be sure that one phrase scores as more probable than another because its n-grams appear more frequently, not because of how smoothing treats out-of-vocabulary or rare tokens.
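To make concrete what I mean by "unsmoothed", here is a minimal sketch in plain Python. It does not use KenLM at all, and the function names (`ngram_counts`, `build_counts`, `mle_log10_prob`) are my own hypothetical helpers, not part of any library. It assumes whitespace-tokenized text held in memory, so it would not scale to Common Crawl as written. Each conditional probability is just the relative frequency count(history + word) / count(history), and any unseen n-gram gives a log probability of negative infinity instead of a backed-off value.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count every contiguous n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_counts(tokens, max_order):
    """Map each order k in 1..max_order to its Counter of k-gram counts."""
    return {k: ngram_counts(tokens, k) for k in range(1, max_order + 1)}


def mle_log10_prob(phrase_tokens, counts, max_order, total_tokens):
    """Unsmoothed (relative-frequency) log10 probability of a phrase.

    Uses at most max_order - 1 tokens of history per word. Returns -inf
    as soon as an n-gram was never seen: no discounting, no backoff.
    """
    logp = 0.0
    for i, word in enumerate(phrase_tokens):
        history = tuple(phrase_tokens[max(0, i - max_order + 1):i])
        ngram = history + (word,)
        numer = counts[len(ngram)][ngram]
        # With no history, fall back to the unigram relative frequency.
        denom = counts[len(history)][history] if history else total_tokens
        if numer == 0:
            return float("-inf")
        logp += math.log10(numer / denom)
    return logp


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    counts = build_counts(corpus, max_order=2)
    for phrase in (["the", "cat"], ["the", "mat"], ["the", "dog"]):
        print(phrase, mle_log10_prob(phrase, counts, 2, len(corpus)))
```

With estimates like these, "the cat" outranks "the mat" only because the bigram occurs more often in the corpus, which is the behavior I am after.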

Can I disable smoothing altogether in KenLM, or is it not the right tool for my use case? If it can be disabled, how? Thanks!

kpu / kenlm

Disable smoothing #432