meilisearch / milli

Search engine library for Meilisearch ⚡️
MIT License

Phrase search containing duplicates #647

Closed ManyTheFish closed 2 years ago

ManyTheFish commented 2 years ago

When a PHRASE search contains the same word several times, Meilisearch returns no results.

Steps to reproduce

1) Push some documents in which the same word appears several times in a row:

$ curl \
  -X POST 'http://localhost:7700/indexes/movies/documents' \
  -H 'Content-Type: application/json' \
  --data-binary '[{"id": 1, "title": "knock knock"}]'

2) Make a PHRASE search query containing duplicates:

$ curl \
  -X POST 'http://localhost:7700/indexes/movies/search' \
  -H 'Content-Type: application/json' \
  --data-binary '{ "q": "\"knock knock\"" }'

3) Meilisearch should return the document, but it returns no results.

Possible Fix

This bug comes from the indexing code that computes the word_pair_proximity_docids database, in src/update/index_documents/extract/extract_word_pair_proximity_docids.rs. In document_word_positions_into_sorter, we forgot to extract the proximity between the current position of the current word and the next position of that same word.

When advancing the current word's position, we could extract the proximity between the current position and the next one.
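To illustrate the idea, here is a minimal, standalone sketch in Rust. It is not milli's actual code: the function word_pair_proximities, its include_same_word flag, and the MAX_DISTANCE value are hypothetical names and values chosen for the example, and word positions are simplified to indices in a slice. It only shows why skipping pairs made of the same word leaves no ("knock", "knock") entry for a phrase query to match.

```rust
use std::collections::BTreeMap;

/// Assumed cap for this sketch: pairs at this distance or further are ignored.
const MAX_DISTANCE: u32 = 8;

/// Returns the smallest distance found for each ordered pair of words.
///
/// When `include_same_word` is false, a word is never paired with a later
/// occurrence of itself, which mimics the behaviour described in this issue.
fn word_pair_proximities(
    words: &[&str],
    include_same_word: bool,
) -> BTreeMap<(String, String), u32> {
    let mut pairs = BTreeMap::new();
    for (i, left) in words.iter().enumerate() {
        for (j, right) in words.iter().enumerate().skip(i + 1) {
            let distance = (j - i) as u32;
            if distance >= MAX_DISTANCE {
                // Later words are even further away, so stop here.
                break;
            }
            if left == right && !include_same_word {
                // The missing case: same word at a later position.
                continue;
            }
            let key = (left.to_string(), right.to_string());
            let entry = pairs.entry(key).or_insert(distance);
            if distance < *entry {
                *entry = distance;
            }
        }
    }
    pairs
}
```

Calling word_pair_proximities(&["knock", "knock"], false) yields an empty map, while passing true yields a single ("knock", "knock") entry with proximity 1, which is the kind of entry the phrase search above would need.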

Files expected to be modified