renepickhardt / generalized-language-modeling-toolkit

Generalized Language Modeling toolkit
http://glm.rene-pickhardt.de

CountingTest #80

Closed: lschmelzeisen closed this issue 9 years ago

lschmelzeisen commented 9 years ago

Needs to be further implemented:

lschmelzeisen commented 9 years ago

Implemented an appropriate test for continuation counts.

The test fails, though, so as far as I currently understand, the continuation counts are wrong. This only affects the 1, 2 and 3+ counts, not the important 1+ counts.
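The thread does not spell out how these counts are defined. The following is a minimal sketch, assuming the modified Kneser-Ney convention: for a sequence s, the 1+ count is the number of distinct words v that occur directly before s, and the 1, 2 and 3+ counts split those words by whether c(v s) is exactly 1, exactly 2, or at least 3. The function name `continuation_counts` and the bucket labels are hypothetical, not the toolkit's API. A handy invariant for a test is that 1+ must equal 1 + 2 + 3+ for every sequence.

```python
from collections import Counter, defaultdict


def continuation_counts(tokens, n):
    """For every (n-1)-gram s, count the distinct words v that precede s,
    bucketed by the absolute count c(v s): exactly 1, exactly 2, or 3+.

    Hypothetical helper illustrating modified Kneser-Ney continuation counts.
    """
    # Absolute counts of all n-grams in the token stream.
    ngram_counts = Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )
    result = defaultdict(lambda: {"1+": 0, "1": 0, "2": 0, "3+": 0})
    for ngram, count in ngram_counts.items():
        # Replace the first word with a wildcard: each distinct v adds one to s.
        suffix = ngram[1:]
        buckets = result[suffix]
        buckets["1+"] += 1
        if count == 1:
            buckets["1"] += 1
        elif count == 2:
            buckets["2"] += 1
        else:
            buckets["3+"] += 1
    return dict(result)
```

For example, `continuation_counts("a b a b c b a b".split(), 2)[("b",)]` yields a 1+ count of 2 (preceded by "a" and "c"), split into one 1-count ("c b" occurs once) and one 3+-count ("a b" occurs three times).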

lschmelzeisen commented 9 years ago

Commit 56262a621d190ec9871de25ead8c40035179d70d fixed the continuation counts with a dirty hack. Commit c87dcc48b6b14fd254746994e904d6faa306b870 refactors that dirty hack into a clean solution.

lschmelzeisen commented 9 years ago

Commit c264393f5b01b8b904fcf0c47d2b3a9bb92edacd expanded CountingTest to also test against the en0008t corpus. Because this corpus is so large, each sequence is only checked with a probability of 0.1%. This probability can easily be adjusted inside the test. All tests pass.
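The thread does not show how CountingTest performs this sampling. A minimal sketch of the idea, assuming the computed counts are available as a mapping from sequence to value and that a slower reference computation exists, could look like the following. The names `sampled_check`, `counts`, `reference` and `probability` are hypothetical. Each sequence is checked independently with the given probability, so the expected number of checks scales linearly with that setting.

```python
import random


def sampled_check(counts, reference, probability=0.001, seed=None):
    """Compare counts[seq] with reference(seq) for a random subset of sequences.

    Each sequence is selected independently with the given probability
    (0.001 corresponds to the 0.1% mentioned above). Returns the list of
    mismatching sequences and the number of sequences actually checked.
    """
    rng = random.Random(seed)  # a fixed seed makes a failing run reproducible
    mismatches = []
    checked = 0
    for seq, value in counts.items():
        if rng.random() >= probability:
            continue  # sequence not selected for this run
        checked += 1
        if reference(seq) != value:
            mismatches.append(seq)
    return mismatches, checked
```

Under these assumptions, "all tests pass" corresponds to an empty mismatch list, and raising `probability` to 0.01 reproduces the 1% rerun at roughly ten times the cost.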

lschmelzeisen commented 9 years ago

Reran the test with a 1% selection probability; it also passed.