kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

How to change the unigram probabilities of a few words #144

Closed ghost closed 6 years ago

ghost commented 6 years ago

I have a function that re-estimates the probabilities of certain OOV words. The Python wrapper does not let me change the unigram probabilities of specific words, or insert new unigram probabilities and backoff weights into the ARPA file.

Is there any way I can do that? My ARPA file is huge.

kpu commented 6 years ago

The data structure is read-only.

If you just care about editing unigrams, consider breaking the ARPA file up with head and tail at the \2-grams: point. Then editing the unigrams will be easier.
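For readers who would rather do the split in Python than with head and tail, here is a minimal sketch. The function name `split_arpa` is hypothetical (it is not part of KenLM), and it assumes the section marker sits on a line of its own, as in a standard ARPA file. It streams line by line, so a huge file never has to fit in memory.

```python
import io


def split_arpa(src, head_out, tail_out, marker="\\2-grams:"):
    """Copy lines before the marker line to head_out and the rest to tail_out.

    Returns True if the marker was found. For a unigram-only model the
    marker never appears, so everything ends up in head_out.
    """
    out = head_out
    found = False
    for line in src:
        if not found and line.strip() == marker:
            out = tail_out  # from here on, write to the tail piece
            found = True
        out.write(line)
    return found


# Tiny in-memory demo; for a real model, pass open file handles instead.
demo = io.StringIO(
    "\\data\\\nngram 1=1\nngram 2=1\n\n"
    "\\1-grams:\n-1.0\thello\t-0.5\n\n"
    "\\2-grams:\n-0.3\thello hello\n\n\\end\\\n"
)
head, tail = io.StringIO(), io.StringIO()
found = split_arpa(demo, head, tail)
```

After this runs, `head` holds the header plus the unigram section and `tail` holds everything from `\2-grams:` onward, so concatenating the two reproduces the original file.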

The other option is to write a wrapper. . .

ghost commented 6 years ago

@kpu I want to edit the unigrams and write them back to the ARPA file.

kpu commented 6 years ago

Ok, break the ARPA file into the header, unigram section, and everything else. Write a program to edit the unigram section (which is small) and update the count in the header. cat the pieces back together.
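A minimal Python sketch of the "edit the unigram section and update the count" step follows. The names `format_unigram` and `edit_unigrams` are hypothetical, not KenLM functions. It assumes the usual ARPA layout: an `ngram 1=` count line in the header, and unigram lines of the form log10 probability, word, optional log10 backoff, separated by whitespace. It takes the lines of the file that come before `\2-grams:` and returns the edited lines. It does not renormalize, so the unigram probabilities will no longer sum to one unless your re-estimation takes care of that.

```python
def format_unigram(word, log_prob, backoff=None):
    """Render one unigram line; the backoff column is omitted when None."""
    fields = [f"{log_prob:.6f}", word]
    if backoff is not None:
        fields.append(f"{backoff:.6f}")
    return "\t".join(fields) + "\n"


def edit_unigrams(head_lines, updates):
    """Apply updates {word: (log10_prob, log10_backoff_or_None)} to the
    header + unigram lines, adding unseen words and fixing the count."""
    pending = dict(updates)  # updates that have not been applied yet
    out, count, count_idx = [], 0, None
    in_unigrams = seen_unigrams = False
    for line in head_lines:
        stripped = line.strip()
        if in_unigrams and stripped:
            word = stripped.split()[1]
            if word in pending:  # overwrite an existing entry
                line = format_unigram(word, *pending.pop(word))
            count += 1
        elif in_unigrams:
            # A blank line ends the section: add new words just before it.
            for word, values in pending.items():
                out.append(format_unigram(word, *values))
                count += 1
            pending = {}
            in_unigrams = False
        elif stripped.startswith("ngram 1="):
            count_idx = len(out)  # remember where the count line lives
        elif stripped == "\\1-grams:":
            in_unigrams = seen_unigrams = True
        out.append(line)
    if count_idx is None or not seen_unigrams:
        raise ValueError("missing 'ngram 1=' line or \\1-grams: section")
    # The input ended without a blank line after the unigrams.
    for word, values in pending.items():
        out.append(format_unigram(word, *values))
        count += 1
    out[count_idx] = f"ngram 1={count}\n"
    return out


head = [
    "\\data\\\n", "ngram 1=2\n", "ngram 2=1\n", "\n",
    "\\1-grams:\n", "-1.000000\t<unk>\n", "-0.500000\thello\t-0.300000\n", "\n",
]
new_head = edit_unigrams(head, {"hello": (-0.7, -0.3), "kenlm": (-2.0, None)})
```

Here `hello` is overwritten in place, `kenlm` is appended as a new word without a backoff weight, and the header count goes from 2 to 3. Writing `new_head` out and concatenating it with the untouched higher-order sections gives the edited ARPA file.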