renepickhardt / generalized-language-modeling-toolkit

Generalized Language Modeling toolkit
http://glm.rene-pickhardt.de

CountCache should only hold sequences that will be checked in testing. #67

Closed lschmelzeisen closed 9 years ago

lschmelzeisen commented 9 years ago

Modify CountCache so that it only holds sequences that occur during testing.

Phase 1 should determine which sequences are needed during testing. Phase 2 should then load only those into the cache.
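
A minimal sketch of the two phases, written in Python for illustration only (it is not the toolkit's code). The names `needed_sequences` and `load_cache` are hypothetical. The sketch also assumes that only contiguous n-grams up to a fixed order are needed, and that counts are already available as an in-memory mapping. The skip patterns used by generalized language models are not covered.

```python
def needed_sequences(query_lines, max_order):
    """Phase 1: collect every contiguous n-gram (n <= max_order) in the queries.

    Assumption: tokens are whitespace-separated and skip patterns are ignored.
    """
    needed = set()
    for line in query_lines:
        tokens = line.split()
        for n in range(1, max_order + 1):
            for start in range(len(tokens) - n + 1):
                needed.add(" ".join(tokens[start:start + n]))
    return needed


def load_cache(all_counts, needed):
    """Phase 2: keep only the counts whose sequence is actually needed."""
    return {seq: count for seq, count in all_counts.items() if seq in needed}


if __name__ == "__main__":
    queries = ["the quick fox"]
    counts = {"the": 10, "quick": 4, "the quick": 2, "lazy dog": 7}
    cache = load_cache(counts, needed_sequences(queries, max_order=2))
    print(sorted(cache))
```

The cache then scales with the size of the test data instead of the training corpus.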

For this to work, additional data has to be generated during training; see #66.

lschmelzeisen commented 9 years ago

Implemented. QueueCacheCreator takes all counted sequences and removes those that are not needed for a query file. The resulting directory can then be given to CountCache.
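
The filtering step can be pictured roughly as follows. This Python sketch is not QueueCacheCreator itself. The function name `filter_counts` and the file layout are assumptions made for the example: one `sequence<TAB>count` entry per line, one or more count files per directory.

```python
from pathlib import Path


def filter_counts(counts_dir, needed, out_dir):
    """Copy count files to out_dir, keeping only lines whose sequence is needed.

    Assumed (hypothetical) line format: "<sequence>\t<count>".
    """
    counts_dir, out_dir = Path(counts_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(counts_dir.iterdir()):
        if not src.is_file():
            continue
        dst = out_dir / src.name
        with src.open(encoding="utf-8") as fin, dst.open("w", encoding="utf-8") as fout:
            for line in fin:
                # Everything before the first tab is the sequence.
                sequence = line.rstrip("\n").partition("\t")[0]
                if sequence in needed:
                    fout.write(line)
```

The output directory keeps the same file names as the input, so whatever reads the full counts could read the reduced set from it unchanged.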