smilli / berkeleylm

Automatically exported from code.google.com/p/berkeleylm

Can I feed this library raw counts instead of text files, and have it compute the Kneser-Ney probabilities for me? #13

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I have a very large corpus and would like to compute its n-gram counts in some 
distributed way. Is there a way to give those raw counts to this code so that 
it builds the model for me?

Original issue reported on code.google.com by b...@parakhi.com on 17 Jul 2013 at 7:27

GoogleCodeExporter commented 9 years ago
The answer is "sort of". There is code in place to estimate Kneser-Ney 
probabilities from a Google-ngram-formatted corpus (see 
https://groups.google.com/forum/#!topic/berkeleylm-discuss/G6Ta2YTsAA0). 
However, it may have some bugs. Please try running it and see what happens. 
If it crashes, I'll have extra incentive to fix it. 
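
As a rough sketch of the preprocessing side only: counts produced by separate workers can be summed and then rendered as one `n-gram<TAB>count` line per entry, which is the line shape used by the Google n-gram corpus. Everything in the snippet is illustrative. The function names and the sample shards are hypothetical, the sort order is an assumption, and the directory and file layout that the library's reader expects is not shown here and should be checked against the thread linked above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Ngram = Tuple[str, ...]


def merge_shard_counts(shards: Iterable[Dict[Ngram, int]]) -> Counter:
    """Sum per-shard n-gram counts into one global count table."""
    total: Counter = Counter()
    for shard in shards:
        total.update(shard)  # Counter.update adds counts, it does not overwrite
    return total


def format_by_order(counts: Dict[Ngram, int]) -> Dict[int, List[str]]:
    """Group n-grams by order and render 'w1 ... wn<TAB>count' lines.

    Lines are sorted by n-gram; the ordering a given reader requires
    is an assumption to verify, not something this sketch guarantees.
    """
    lines: Dict[int, List[str]] = {}
    for ngram in sorted(counts):
        line = f"{' '.join(ngram)}\t{counts[ngram]}"
        lines.setdefault(len(ngram), []).append(line)
    return lines


# Hypothetical output of two distributed counting workers.
shard_a = {("the",): 2, ("cat",): 1, ("the", "cat"): 1}
shard_b = {("the",): 1, ("dog",): 1, ("the", "dog"): 1}

merged = merge_shard_counts([shard_a, shard_b])
by_order = format_by_order(merged)

for order in sorted(by_order):
    print(f"# order {order}")
    for line in by_order[order]:
        print(line)
```

Each order's lines would then be written to that order's file or files before pointing the estimation code at the result.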

Original comment by adpa...@google.com on 17 Jul 2013 at 8:14