If a query contains the same term multiple times, the code currently looks it up once per occurrence and takes the union of the doc ID set with itself. This could be inefficient if the set is large. We could take the `nub` of the query, but I assume that would influence scoring, so it might be better to give each repeated term a multiplicity, look it up and score it once, and then multiply the score by that multiplicity?
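A rough sketch of the multiplicity idea, not the library's actual code: `Term`, `DocId`, `termMultiplicities`, `scoreQuery` and the `scoreTerm` argument are all hypothetical names made up for illustration. It also assumes per-term scores are simply summed across the query, so that scoring a term once and multiplying by its count gives the same result as scoring it repeatedly. If the real scoring is not linear in query term frequency, the multiplication step would need to change.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical types, for illustration only.
type Term  = String
type DocId = Int

-- | Collapse a query into its distinct terms, each paired with the
-- number of times it occurs.
termMultiplicities :: [Term] -> Map.Map Term Int
termMultiplicities terms = Map.fromListWith (+) [(t, 1) | t <- terms]

-- | Look up and score each distinct term once, scale its scores by the
-- term's multiplicity, and sum the results per document.
-- 'scoreTerm' stands in for the real per-term lookup and scoring
-- (assumption: scores for different terms are combined by addition).
scoreQuery :: (Term -> Map.Map DocId Double) -> [Term] -> Map.Map DocId Double
scoreQuery scoreTerm terms =
  Map.unionsWith (+)
    [ Map.map (fromIntegral n *) (scoreTerm t)
    | (t, n) <- Map.toList (termMultiplicities terms)
    ]
```

With this shape, a query like `["a", "b", "a"]` triggers two lookups instead of three, and no doc ID set is ever unioned with itself.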