meilisearch / milli

Search engine library for Meilisearch ⚡️
MIT License
464 stars 82 forks source link

Store fuzzy/bucketed positions in `word_position_docids` database #746

Open loiclec opened 1 year ago

loiclec commented 1 year ago

Pull Request

Related issue

Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3222

Implementation

The design is described well in the related issue. For details of how different relative positions are grouped together, see the test bucketed_position.

Basically, we no longer store the exact position of words that appear far into an attribute, but instead group relative positions together in buckets whose size grows exponentially with the original position. This is done to improve the relevancy and the performance of the attribute ranking rule.


This is a draft until https://github.com/meilisearch/milli/pull/742 is merged and the results of the benchmarks are available.


EDIT: I also realised just now that the iterative version of the algorithm needs to be updated as well!

loiclec commented 1 year ago

I think I am going to postpone this improvement to v1.1 because:

  1. The iterative version of the algorithm also needs to be updated
  2. I found an unrelated bug in the implementation of the set-based version of the algorithm, and I would like to debug it first.
  3. The whole design of the attribute ranking rule will change a lot soon, and so will the whole structure of almost all search algorithms, so I don't want to duplicate the work too much