kpu / kenlm

KenLM: Faster and Smaller Language Model Queries
http://kheafield.com/code/kenlm/

Building diffs for patching lm binaries? #274

Open alexcannan opened 4 years ago

alexcannan commented 4 years ago

I'm curious whether it would be possible to build a storage-efficient lm.diff file that patches an older lm.binary file into a newer one. I've experimented with some existing binary diff tools and have found the lm.diff file to be roughly the size of the new lm.binary after compression. Could a smarter, KenLM-aware tool be built instead?

kpu commented 4 years ago

In theory this is possible, but you'd be digging into the smoothing algorithms, because the discount parameters affect probabilities globally. The quantizer is also free to move its centers. Possible but annoying.
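
To illustrate the point about discounts, here is a minimal toy sketch, not KenLM's actual code. It assumes plain absolute discounting on bigrams with the common count-of-counts estimate D = n1 / (n1 + 2 * n2) and a uniform lower-order distribution over a fixed, hypothetical vocabulary size. KenLM itself uses modified Kneser-Ney, which has more discount parameters. All function names here are made up for the example.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count adjacent token pairs in whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


def estimate_discount(counts):
    """Absolute discount from counts of counts: D = n1 / (n1 + 2 * n2)."""
    n1 = sum(1 for c in counts.values() if c == 1)
    n2 = sum(1 for c in counts.values() if c == 2)
    return n1 / (n1 + 2 * n2)


def prob(counts, context, word, discount, vocab_size):
    """Interpolated absolute discounting with a uniform lower-order model."""
    ctx_total = sum(c for (a, _), c in counts.items() if a == context)
    ctx_types = sum(1 for (a, _) in counts if a == context)
    discounted = max(counts[(context, word)] - discount, 0.0) / ctx_total
    backoff_mass = discount * ctx_types / ctx_total
    return discounted + backoff_mass / vocab_size


VOCAB_SIZE = 10  # hypothetical, held fixed to isolate the effect of the discount

old_corpus = ["a b", "a b", "a c", "d e"]
new_corpus = old_corpus + ["f g"]  # one new, unrelated bigram

old_counts = bigram_counts(old_corpus)
new_counts = bigram_counts(new_corpus)

old_d = estimate_discount(old_counts)
new_d = estimate_discount(new_counts)

# The counts for context "a" are identical in both corpora,
# yet P(b | a) differs because the discount was re-estimated.
p_old = prob(old_counts, "a", "b", old_d, VOCAB_SIZE)
p_new = prob(new_counts, "a", "b", new_d, VOCAB_SIZE)

print(f"discount: {old_d:.3f} -> {new_d:.3f}")
print(f"P(b | a): {p_old:.4f} -> {p_new:.4f}")
```

In this toy setup, adding a single unrelated bigram changes the estimated discount, so the stored probability of an n-gram whose own counts did not change moves as well. A patch format would therefore have to either touch nearly every entry or carry enough information to recompute them, which is the "digging into smoothing algorithms" part. The quantizer point is similar: if the centers are refit on the new model, the stored codes for otherwise unchanged entries can change too.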