well-typed / full-text-search

An in-memory full text search engine library. It lets you run full-text queries on a collection of your documents.

Duplicate terms in queries #15

Open adamgundry opened 1 year ago

adamgundry commented 1 year ago

If a query contains the same term multiple times, the code currently looks it up once per occurrence and takes the union of the doc ID set with itself. This could be inefficient if the set is large. We could take the nub of the query, but I assume that would influence scoring. It might be better to give each repeated term a multiplicity, look it up and score it once, and then multiply the score by that multiplicity?
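
A minimal sketch of the multiplicity idea, not the library's actual code. The types `Term` and `DocId`, and the functions `termMultiplicities`, `scoreQuery`, `lookupDocs` and `scoreTerm`, are hypothetical names introduced for illustration. It also assumes that per-term scores simply add up across occurrences, so that multiplying by the count gives the same result as scoring each occurrence separately; that may not hold for the real scoring function.

```haskell
import qualified Data.Map.Strict as Map
import qualified Data.Set as Set

-- Hypothetical stand-ins for the library's own term and document ID types.
type Term  = String
type DocId = Int

-- | Collapse a query into its distinct terms, each paired with the
-- number of times it occurs in the query.
termMultiplicities :: [Term] -> Map.Map Term Int
termMultiplicities terms = Map.fromListWith (+) [(t, 1) | t <- terms]

-- | Score a query while looking up each distinct term only once.
-- 'lookupDocs' and 'scoreTerm' are assumed placeholders for the index
-- lookup and the per-term scoring function.
scoreQuery
  :: (Term -> Set.Set DocId)    -- ^ doc ID set for a term
  -> (Term -> DocId -> Double)  -- ^ score of one occurrence of a term in a doc
  -> [Term]                     -- ^ query terms, possibly with duplicates
  -> Map.Map DocId Double
scoreQuery lookupDocs scoreTerm terms =
  Map.unionsWith (+)
    [ Map.fromSet (\d -> fromIntegral n * scoreTerm t d) (lookupDocs t)
    | (t, n) <- Map.toList (termMultiplicities terms)
    ]
```

With this shape, a query such as `["cat", "cat", "dog"]` triggers two lookups rather than three, and the repeated term still contributes twice to the score of each matching document.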