Open GoogleCodeExporter opened 9 years ago
Answering my own question, it looks like it's 10.
It should probably be in the method description. Thanks!
Original comment by denis.fi...@gmail.com
on 7 May 2013 at 1:38
Sorry, I thought I had replied.
That particular method doesn't actually know which base it is using, since the LM was
probably constructed from an ARPA file, and those files can use whatever base
they want (the values are stored as logarithms). Building an LM with BerkeleyLM is done
in base 10 to mimic the behaviour of SRILM. So the answer is almost certainly
"10", unless you constructed your LM in a non-standard way.
Original comment by adpa...@google.com
on 7 May 2013 at 2:19
Thanks!
I still think it should be a part of the specification because the contract of
an n-gram LM is a proper distribution p(w_i|...), and we can't have it without
knowing the log base. It's true that many applications don't care about the log
base, but some do (e.g., perplexity, text generation).
Original comment by denis.fi...@gmail.com
on 7 May 2013 at 9:49
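To illustrate why the base matters for perplexity, here is a minimal Python sketch (not BerkeleyLM code). The `perplexity` helper and the example scores are hypothetical; the only assumption is that the model returns one log-probability per token in some fixed base.

```python
import math

def perplexity(log_probs, base):
    """Perplexity from per-token log-probabilities expressed in `base`."""
    if not log_probs:
        raise ValueError("need at least one log-probability")
    avg_log_prob = sum(log_probs) / len(log_probs)
    # Undo the logarithm in the same base it was taken in.
    return base ** (-avg_log_prob)

# Hypothetical scores for a 3-token sentence, assumed to be log10 values.
scores = [-1.0, -2.0, -3.0]

ppl_base10 = perplexity(scores, 10.0)    # correct if the scores are log10
ppl_base_e = perplexity(scores, math.e)  # result if the base is wrongly taken to be e
```

With these numbers the two results differ by more than an order of magnitude (about 100 versus about 7.4), so a reported perplexity is only meaningful when the base is known.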
I glanced at the code, and it looks as though StupidBackoff is using log base
e, while the Kneser-Ney models are using log base 10. I've been using this
package for my research, and it'd be nice to know what exactly the values are
supposed to be.
Original comment by acgris...@gmail.com
on 15 Nov 2013 at 7:56
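If the two model types really do use different bases, as the comment above suggests, scores can be put on a common scale with the change-of-base identity log_b(x) = log_a(x) * ln(a) / ln(b). A small Python sketch follows; `convert_log_base` is a hypothetical helper, not part of the BerkeleyLM API.

```python
import math

def convert_log_base(log_value, from_base, to_base):
    """Re-express a logarithm taken in `from_base` as a logarithm in `to_base`."""
    return log_value * math.log(from_base) / math.log(to_base)

# A natural-log score for an event with probability 0.01 (made-up value).
ln_score = math.log(0.01)

# The same score expressed in base 10, for comparison with log10 scores.
log10_score = convert_log_base(ln_score, math.e, 10.0)
```

Here `log10_score` comes out at approximately -2, which is log10 of 0.01.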
Sorry, I missed this somehow.
I've added some comments in the latest SVN revision to hopefully clear this up. I'm not
going to change either base: I want to mimic SRILM in constructing
Kneser-Ney LMs, and I also don't want to change the logarithm base on
StupidBackoffLms because that would change the models for people who are
currently using them. Hope that clarifies things!
Original comment by adpa...@gmail.com
on 6 Dec 2013 at 6:30
Original issue reported on code.google.com by
denis.fi...@gmail.com
on 6 May 2013 at 9:38