kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

Easiest way to calculate p(v | context) for all v in the vocabulary using the python api? #284

Open arvieFrydenlund opened 4 years ago

arvieFrydenlund commented 4 years ago

At every step I'd like to be able to look at the probability of all possible continuations. Do I just call model.BaseScore(state, v, state2) for all v in the vocabulary?

Also, I'm confused about what state and state2 are doing in this API. Do I just keep alternating them as I move through the sentence, as in this example?

`accum += model.BaseScore(state, "a", state2)`

`accum += model.BaseScore(state2, "sentence", state)`

Thanks.

kpu commented 4 years ago

There is no fast path for scoring the entire vocabulary in a given context. A forward trie would be better suited to that. KenLM implements a reverse trie, which optimizes the speed of individual queries instead.
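Since there is no fast path, the straightforward option is the loop from the question: one BaseScore call per vocabulary word, all from the same input state. Below is a minimal sketch of that, not an official recipe. The helper name `score_continuations` and its `vocab` and `new_state` parameters are hypothetical; `vocab` is assumed to be a word list you load yourself (for example from the 1-gram section of the ARPA file), and the model path in the comments is a placeholder.

```python
def score_continuations(model, in_state, vocab, new_state):
    """Return {word: log10 p(word | context)} for every word in vocab.

    model     -- an object with BaseScore(in_state, word, out_state),
                 e.g. a kenlm.Model
    in_state  -- state encoding the current context (only read here)
    vocab     -- iterable of candidate next words (assumed supplied by caller)
    new_state -- zero-argument callable returning a fresh state,
                 e.g. kenlm.State
    """
    scratch = new_state()  # output state, overwritten on every call
    scores = {}
    for word in vocab:
        # BaseScore writes the successor state into `scratch`; we only
        # keep the returned log10 probability.
        scores[word] = model.BaseScore(in_state, word, scratch)
    return scores

# Usage with the kenlm Python module (path and vocab are placeholders):
#   import kenlm
#   model = kenlm.Model("path/to/model.arpa")
#   state = kenlm.State()
#   model.BeginSentenceWrite(state)          # context = start of sentence
#   scores = score_continuations(model, state, vocab, kenlm.State)
```

This costs one query per vocabulary word at every step, so it is linear in the vocabulary size. Words outside the model's vocabulary are scored as the unknown word.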

You can keep alternating the states as you move through the sentence. It's just an optimization to avoid a copy or object churn.
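The alternation can be written as a loop that swaps the two state references after each word, so the output state of one call becomes the input state of the next. This is a rough sketch under the assumption that the sentence is already tokenized; the helper name `score_sentence` is hypothetical, and the model path in the comments is a placeholder.

```python
def score_sentence(model, words, state, state2):
    """Sum log10 probabilities of `words` plus the end-of-sentence token.

    model         -- an object with BeginSentenceWrite(state) and
                     BaseScore(in_state, word, out_state), e.g. a kenlm.Model
    words         -- list of tokens, without <s> or </s>
    state, state2 -- two preallocated state objects, e.g. kenlm.State()
    """
    model.BeginSentenceWrite(state)  # start from the <s> context
    total = 0.0
    for word in list(words) + ["</s>"]:
        total += model.BaseScore(state, word, state2)
        # state2 now holds the context after `word`; swap so it becomes
        # the input next time and the old input is reused as output.
        state, state2 = state2, state
    return total

# Usage with the kenlm Python module (path is a placeholder):
#   import kenlm
#   model = kenlm.Model("path/to/model.arpa")
#   total = score_sentence(model, ["a", "sentence"],
#                          kenlm.State(), kenlm.State())
```

Only two state objects are ever allocated, which is the copy and object churn being avoided.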