Open mattberns opened 2 years ago
That page appears to provide a 5-gram Kneser-Ney model and then encourages people to load it at a lower order (e.g. as a bigram model). This is a bad idea: https://neural.mt/papers/edinburgh/rest_paper.pdf . If you want a bigram model, train a bigram model.
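For reference, training a genuine bigram model directly with KenLM's `lmplz` looks roughly like the command below. This is a sketch: it assumes a built KenLM and a tokenized training corpus at `corpus.txt`, both of which are placeholders here.

```
bin/lmplz -o 2 < corpus.txt > bigram.arpa
```

The `-o 2` flag sets the model order, so the resulting ARPA file contains only 1-grams and 2-grams whose probabilities and backoff weights were estimated for a bigram model, rather than rows copied out of a 5-gram model.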
Howdy yall. I am trying to analyze the data in the language models found here: https://bio.nlplab.org/#ngram-model
I am loading the 1-gram + 2-gram data in ARPA format; everything looks good and clean, yet I get the following error:
Non-zero backoff -1.5930591 provided for an n-gram that should have no backoff in the 2-gram at byte 905261256 Byte: 905261256
I looked at the rows that contain this value and found the following:
-0.92665285 <s> The -1.5930591
-1.5930591 trypanosomes/ml [ -0.10104541
The command issued was:
echo "in primary care" | ./query ./full.arpa
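The error itself can be reproduced and located without KenLM. In an ARPA file whose highest order is 2, each line in the `\2-grams:` section must be `logprob<TAB>w1 w2` with no trailing backoff field; a third tab-separated field (as in the `<s> The` row above) is exactly what triggers the "should have no backoff" complaint. Below is a minimal sketch of such a check; the `sample_arpa` fragment and the `find_bad_backoffs` helper are made up for illustration, not taken from the real `full.arpa`.

```python
# Scan the highest-order section of an ARPA file and flag any entry that
# carries a backoff weight. ARPA lines are tab-separated:
#   logprob <TAB> n-gram words <TAB> [backoff]
# and the backoff field is only legal below the highest order.

sample_arpa = """\
\\data\\
ngram 1=3
ngram 2=2

\\1-grams:
-1.0\t<s>\t-0.5
-1.2\tThe\t-0.3
-1.5\t</s>

\\2-grams:
-0.92665285\t<s> The\t-1.5930591
-0.7\tThe </s>

\\end\\
"""

def find_bad_backoffs(lines, highest_order=2):
    """Yield (line, backoff) for highest-order n-grams that have a backoff."""
    header = f"\\{highest_order}-grams:"
    in_section = False
    for line in lines:
        line = line.rstrip("\n")
        if line == header:
            in_section = True
            continue
        if in_section:
            if line.startswith("\\"):  # next section marker or \end\
                break
            fields = line.split("\t")
            # logprob + n-gram + backoff = 3 fields; legal lines have 2.
            if len(fields) == 3:
                yield line, float(fields[-1])

bad = list(find_bad_backoffs(sample_arpa.splitlines()))
for line, backoff in bad:
    print(f"offending entry: {line!r} (backoff {backoff})")
```

Run against the real file, a check like this should point at the same byte region KenLM reports, and it confirms the 2-gram section was carved out of a higher-order model with its backoff column left in place.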