patrickfrey / strus

Library implementing the storage and the query evaluation for a text search engine. It uses a key-value store database interface to store its data. Currently there is an implementation based on the Google LevelDB library.
http://www.project-strus.net
Mozilla Public License 2.0

big token positions #70

Open andreasbaumann opened 8 years ago

andreasbaumann commented 8 years ago
2016-08-09 10:52:22; strusWebService, error: Token positions of document 693-2009 are out or range (document too big, only 76263 token positions were assigned, maximum allowed position is %65535) (master.cpp:96)

One idea is to support small, big, and very big positions in the index. Simply dropping the positions is not a good solution. The document is a big PDF, but splitting it creates a clustering problem and a "too small retrieval item" problem.
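
To make the "small, big, very big" idea concrete, here is a minimal sketch of one possible approach: a LEB128-style variable-length encoding, where a position takes as many bytes as it needs. This is only an illustration under assumptions. The names `encodePosition` and `decodePosition` are hypothetical and not part of strus, and the sketch says nothing about how strus actually lays out its position blocks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of strus: append `pos` as a LEB128-style
// variable-length integer (7 payload bits per byte, high bit set when
// another byte follows).
inline void encodePosition(std::uint32_t pos, std::vector<std::uint8_t>& out)
{
    while (pos >= 0x80) {
        out.push_back(static_cast<std::uint8_t>((pos & 0x7F) | 0x80));
        pos >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(pos));
}

// Hypothetical helper: decode one position starting at `offset` and advance
// `offset` past the bytes that were consumed.
inline std::uint32_t decodePosition(const std::vector<std::uint8_t>& in,
                                    std::size_t& offset)
{
    std::uint32_t pos = 0;
    int shift = 0;
    for (;;) {
        // A 32-bit value never needs more than five 7-bit groups.
        assert(offset < in.size() && shift < 35);
        const std::uint8_t byte = in[offset++];
        pos |= static_cast<std::uint32_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) break;  // last byte of this position
        shift += 7;
    }
    return pos;
}
```

With this scheme, positions below 128 take one byte, positions below 16384 take two, and positions below 2097152 take three, so the position 76263 from the log above would fit in three bytes without a separate size class.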

patrickfrey commented 8 years ago

The problem is due to a limit in the blocks storing positions in the storage. I agree that this must be fixed.