Closed GoogleCodeExporter closed 9 years ago
A small addition. I have used half of the data in B to train a 3-gram and a
4-gram model. The 3-gram model exhibits the same kind of problematic behaviour,
whereas the 4-gram model works smoothly. The problem, then, seems to be
somehow related to the order of the model.
Original comment by daniele....@gmail.com
on 3 Feb 2013 at 3:23
I think the problem is that you are calling getLogProb with an n-gram that is
longer than the order of the LM. I failed to provide appropriate documentation
or decent error messages about this, so apologies on my part. But it's actually
not quite clear what the user wants in this case: do you want me to score the
n-gram in a scrolling window (a la NgramLanguageModel.scoreSequence), or just
ignore the unused words of context?
In any case, can you please confirm that this is the issue? In parallel, I will
add documentation and some improved error messages.
Original comment by adpa...@gmail.com
on 3 Feb 2013 at 6:21
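The two options mentioned above can be made concrete with a small sketch. This is not berkeleylm code: the class `LongNgramScoringSketch`, the scorer `toyLogProb`, and both scoring methods are hypothetical names, and the toy scorer simply returns minus the window length so that the difference between the options is easy to see. It also assumes that a "scrolling window" means summing one score per word, each computed on a window of at most `ORDER` words.

```java
import java.util.Arrays;
import java.util.List;

final class LongNgramScoringSketch {
    // Order of the hypothetical model (a 3-gram model, as in the report).
    static final int ORDER = 3;

    // Hypothetical stand-in for a model lookup. It refuses n-grams longer
    // than the order and otherwise returns an arbitrary, made-up score.
    static double toyLogProb(List<String> window) {
        if (window.size() > ORDER) {
            throw new IllegalArgumentException(
                "n-gram of length " + window.size() + " exceeds order " + ORDER);
        }
        return -window.size();
    }

    // Option 1: ignore the unused words of context and score only the
    // last ORDER words of the query.
    static double scoreIgnoringExtraContext(List<String> ngram) {
        int start = Math.max(0, ngram.size() - ORDER);
        return toyLogProb(ngram.subList(start, ngram.size()));
    }

    // Option 2: scrolling window. Score every position with at most
    // ORDER words ending at that position, and sum the results.
    static double scoreWithScrollingWindow(List<String> ngram) {
        double total = 0.0;
        for (int end = 1; end <= ngram.size(); end++) {
            int start = Math.max(0, end - ORDER);
            total += toyLogProb(ngram.subList(start, end));
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> fiveGram = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(scoreIgnoringExtraContext(fiveGram));
        System.out.println(scoreWithScrollingWindow(fiveGram));
    }
}
```

The two options answer different questions: the first returns a single conditional score for the last word, while the second returns a score for the whole sequence, so the values are not comparable.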
Thanks for the fast reply!
Yes, I am thinking about the scrolling window behaviour. On the other hand, how
come some sequences of the same length can be scored without problems,
whereas others cannot? I would expect the exception to be generated in all
cases in which a sequence is longer than the order.
Original comment by daniele....@gmail.com
on 3 Feb 2013 at 6:33
Right. I think the reason it doesn't always fail is that the lookup first finds
the longest matching suffix from right to left, then computes whatever backoffs
are left over. It's possible to match a 3-gram suffix and have a 2-gram backoff
left over, so that the code never looks up a 4-gram, even though it was
called on a 5-gram.
Original comment by adpa...@gmail.com
on 4 Feb 2013 at 5:04
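To illustrate why only some over-long queries fail, here is a minimal sketch of a backoff lookup. It is a toy written under assumptions, not the berkeleylm implementation: the class `ToyBackoffLookup`, its methods, and all scores are hypothetical, and the stopping rule (stop growing the suffix or the context at the first miss) is assumed from the description above. The model keeps one table per n-gram length, so touching a table beyond the order throws.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class ToyBackoffLookup {
    // One table per n-gram length; tables.size() is the model order.
    // Each value is {logProb, backoffWeight}. All numbers are made up.
    private final List<Map<List<String>, double[]>> tables = new ArrayList<>();

    ToyBackoffLookup(int order) {
        for (int i = 0; i < order; i++) tables.add(new HashMap<>());
    }

    void add(double logProb, double backoff, String... ngram) {
        tables.get(ngram.length - 1).put(Arrays.asList(ngram), new double[] {logProb, backoff});
    }

    // Throws IndexOutOfBoundsException for n-grams longer than the order.
    private double[] find(List<String> ngram) {
        return tables.get(ngram.size() - 1).get(ngram);
    }

    double logProb(String... query) {
        List<String> words = Arrays.asList(query);
        int n = words.size();
        double score = 0.0;
        int matched = 0;
        // Step 1: grow the suffix from right to left until the first miss.
        for (int len = 1; len <= n; len++) {
            double[] entry = find(words.subList(n - len, n));
            if (entry == null) break;
            score = entry[0];
            matched = len;
        }
        if (matched == 0) throw new IllegalArgumentException("unknown last word");
        // Step 2: add backoff weights of the contexts that were skipped,
        // again stopping at the first context that is not in the model.
        List<String> context = words.subList(0, n - 1);
        for (int len = matched; len <= context.size(); len++) {
            double[] entry = find(context.subList(context.size() - len, context.size()));
            if (entry == null) break;
            score += entry[1];
        }
        return score;
    }

    // A hypothetical 3-gram model with just enough entries for the demo.
    static ToyBackoffLookup demoModel() {
        ToyBackoffLookup lm = new ToyBackoffLookup(3);
        lm.add(-2.0, 0.0, "e");
        lm.add(-1.0, 0.0, "d", "e");
        lm.add(-1.5, -0.5, "c", "d");
        lm.add(-2.0, 0.0, "f");
        lm.add(-1.0, 0.0, "e", "f");
        lm.add(-0.5, 0.0, "d", "e", "f");
        return lm;
    }

    public static void main(String[] args) {
        ToyBackoffLookup lm = demoModel();
        // Both lookups stop early, so no 4-gram table is ever touched.
        System.out.println(lm.logProb("a", "b", "c", "d", "e"));
        try {
            // "d e f" matches, so the lookup goes on to try a 4-gram.
            lm.logProb("a", "b", "d", "e", "f");
        } catch (IndexOutOfBoundsException ex) {
            System.out.println("failed: " + ex.getMessage());
        }
    }
}
```

In this toy, both queries are 5-grams against a 3-gram model, yet only the second one fails, because only there does the suffix keep matching all the way up to the model order.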
I see, thanks for the clarification. I implemented the moving window
behavior and the failures are resolved.
Daniele
Original comment by daniele....@gmail.com
on 4 Feb 2013 at 5:07
I have changed the behaviour to ignore extra words of context, and added some
documentation to reflect this.
Original comment by adpa...@gmail.com
on 9 Feb 2013 at 5:29
Thanks! :)
Original comment by daniele....@gmail.com
on 9 Feb 2013 at 9:44
Original issue reported on code.google.com by
daniele....@gmail.com
on 3 Feb 2013 at 2:40