Do you need this access to be fast? I have some functionality which you can
access by doing:
new NgramMapWrapper<W, LongRef>(lm.getNgramMap(), lm.getWordIndexer());
on a StupidBackoffLm. This gives a Map from List<W> to LongRefs. However, this
interface is slow due to all the boxing/unboxing.
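For example, a minimal sketch of that wrapper in use (assuming a StupidBackoffLm<String> named lm, and that LongRef exposes its count as a public value field):

    NgramMapWrapper<String, LongRef> map =
        new NgramMapWrapper<String, LongRef>(lm.getNgramMap(), lm.getWordIndexer());
    // The wrapper behaves as a Map from List<String> to LongRef, so raw counts
    // can be read with a plain get(); each call boxes the words into a List.
    LongRef count = map.get(Arrays.asList("the", "quick", "brown"));
    if (count != null) System.out.println("count = " + count.value);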
Original comment by adpa...@gmail.com
on 14 Jul 2011 at 5:39
Of course, fast is always better :)
However, it seems I have not fully understood the way the library works.
Two questions:
1) As the JavaDocs say that getLogProb() is slow, what is a fast way to get
the log probability for a given phrase?
2) How is this probability computed given the raw counts in the Google web1t
corpus? It seems to me there should be an easy way to just invert the process.
Thanks for your help,
Torsten
Original comment by torsten....@gmail.com
on 15 Jul 2011 at 7:52
1) NgramLanguageModel.getLogProb(List<W>) is "slow" because it has to turn the
List<W> into an int[] first. Note that it is not actually "slow", just slow
relative to the efficient accessors in
ArrayEncodedNgramLanguageModel.getLogProb(int[]) and
ContextEncodedNgramLanguageModel.getLogProb. I have added additional comments
that direct you towards those calls so others are not confused by this.
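For instance, a sketch of the array-encoded path (assuming the int[] accessor takes start/end offsets and that WordIndexer provides getOrAddIndex; check the JavaDocs for the exact signatures):

    ArrayEncodedNgramLanguageModel<String> alm = lm; // e.g. a StupidBackoffLm
    WordIndexer<String> wi = alm.getWordIndexer();
    String[] words = { "the", "quick", "brown" };
    int[] ngram = new int[words.length];
    for (int i = 0; i < words.length; i++) ngram[i] = wi.getOrAddIndex(words[i]);
    // Once the words are mapped to ints, scoring avoids the per-call List boxing.
    float logProb = alm.getLogProb(ngram, 0, ngram.length);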
2) The probability is computed using Stupid Backoff. I have added a method to
StupidBackoffLm that retrieves the raw count for an n-gram, and will be
releasing a new version of the code with this fix shortly.
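For reference, Stupid Backoff (Brants et al., 2007) scores an n-gram as its relative frequency when it was observed, and otherwise backs off to the lower-order score scaled by a constant (0.4 in the original paper). A rough sketch of that scoring over raw counts, not the library's own code (the count table, total token count, and helper names here are made up for illustration):

    import java.util.List;
    import java.util.Map;

    class StupidBackoffSketch {
      static final double ALPHA = 0.4; // backoff constant from Brants et al. (2007)

      // Raw corpus count of an n-gram, 0 if unseen (hypothetical count table).
      static long count(Map<List<String>, Long> counts, List<String> ngram) {
        Long c = counts.get(ngram);
        return c == null ? 0L : c;
      }

      // S(w | context): relative frequency if the n-gram was seen,
      // else ALPHA times the score of the n-gram with its leftmost word dropped.
      static double score(Map<List<String>, Long> counts, long totalTokens, List<String> ngram) {
        if (ngram.size() == 1)
          return (double) count(counts, ngram) / totalTokens; // unigram base case
        long joint = count(counts, ngram);
        if (joint > 0)
          return (double) joint / count(counts, ngram.subList(0, ngram.size() - 1));
        return ALPHA * score(counts, totalTokens, ngram.subList(1, ngram.size()));
      }
    }

Note that Stupid Backoff scores are not normalized probabilities (they do not sum to one), which is part of what makes them cheap to compute directly from counts.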
Original comment by adpa...@gmail.com
on 15 Jul 2011 at 6:19
Original issue reported on code.google.com by
torsten....@gmail.com
on 14 Jul 2011 at 7:14