Closed by kaykay-dv 6 months ago
The problem has been identified in the insert_or_update method of the PocketSearch class. insert_or_update should use the internal rowid to update existing entries, not a unique ID field provided by the user. When a custom unique ID field is used, the token table is not updated correctly.
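To illustrate the idea of updating by rowid, here is a minimal sketch using Python's built-in sqlite3 module. It is not PocketSearch's actual code: the `documents` table, the `doc_id` column, and this standalone `insert_or_update` function are assumptions made for the example. The user-supplied ID is used only to look up the row; the update itself is keyed on the rowid.

```python
import sqlite3


def insert_or_update(conn, doc_id, text):
    """Hypothetical helper: resolve the user-supplied doc_id to the
    internal rowid, then update by rowid."""
    row = conn.execute(
        "SELECT rowid FROM documents WHERE doc_id = ?", (doc_id,)
    ).fetchone()
    if row is None:
        # No existing entry: insert and return the new rowid.
        cur = conn.execute(
            "INSERT INTO documents (doc_id, text) VALUES (?, ?)",
            (doc_id, text),
        )
        return cur.lastrowid
    # Existing entry: update using the internal rowid, not doc_id.
    conn.execute("UPDATE documents SET text = ? WHERE rowid = ?", (text, row[0]))
    return row[0]


# Assumed schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (doc_id TEXT UNIQUE, text TEXT)")

first = insert_or_update(conn, "a-1", "the quick fox")
second = insert_or_update(conn, "a-1", "the lazy dog")
count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
stored = conn.execute("SELECT text FROM documents WHERE rowid = ?", (first,)).fetchone()[0]
```

Both calls resolve to the same rowid, so any dependent table keyed on rowid would see one updated entry rather than a second one.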
Fixed in 0.30.0
The "num_documents" property returned by .tokens appears to report the wrong number of documents; it seems to be an overestimate. For example, when indexing a corpus of 160,000 documents, the most common token ("the") appears in 321942 documents according to the statistics. That cannot be right, because a token cannot occur in more documents than the corpus contains.
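The invariant behind this report can be checked independently of the library. The sketch below is plain Python and does not use PocketSearch; the `document_frequencies` function and the toy corpus are assumptions for illustration. It counts each token at most once per document, so no count can exceed the number of documents.

```python
from collections import Counter


def document_frequencies(docs):
    """Hypothetical helper: number of documents each token occurs in.
    Uses naive whitespace tokenization for illustration."""
    df = Counter()
    for text in docs.values():
        # set() ensures a token is counted once per document,
        # however often it repeats inside that document.
        df.update(set(text.lower().split()))
    return df


docs = {1: "the cat and the dog", 2: "the bird", 3: "a fish"}
df = document_frequencies(docs)

# Sanity check: a document frequency can never exceed the corpus size.
too_large = {tok: n for tok, n in df.items() if n > len(docs)}
```

An empty `too_large` means the invariant holds; a reported count of 321942 for a corpus of 160,000 documents would fail this check.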