meilisearch / milli

Search engine library for Meilisearch ⚡️
MIT License

Fix hard-deletion of an external id that was soft-deleted and then reimported - main #750

Closed loiclec closed 1 year ago

loiclec commented 1 year ago

Pull Request

Related issue

Fixes (when merged into meilisearch) https://github.com/meilisearch/meilisearch/issues/3021

What does this PR do?

The bug occurred in the following sequence:

  1. Documents were added.
  2. Some of these documents were replaced using soft-deletion.
  3. Another, non-replaced document was deleted, which triggered a hard-deletion.
  4. Documents with the same identifiers as the replaced documents were added again.

Search results would then contain duplicate documents. No crash happened at any point, which is why the previous fuzz test did not catch the bug. I have updated the new fuzz test so that it also checks the result of a placeholder search request, which finds the bug immediately.
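To make the kind of check concrete, here is a minimal sketch of a duplicate detector that a fuzz test could run on placeholder search results. It assumes the check is a uniqueness check on external ids. The function name `duplicated_ids` and the `hits` slice are hypothetical and are not taken from the milli test suite.

```rust
use std::collections::HashSet;

/// Returns the external ids that appear more than once in `hits`.
/// `hits` is a hypothetical stand-in for the external ids returned by a
/// placeholder search; it is not a milli type.
fn duplicated_ids(hits: &[&str]) -> Vec<String> {
    let mut seen = HashSet::new();
    let mut duplicates: Vec<String> = Vec::new();
    for &id in hits {
        // `insert` returns false when the id was already in the set.
        let already_seen = !seen.insert(id);
        if already_seen && !duplicates.iter().any(|d| d.as_str() == id) {
            duplicates.push(id.to_string());
        }
    }
    duplicates
}
```

A fuzz harness could assert that `duplicated_ids` returns an empty list after every operation, which turns a silent correctness bug into a test failure.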

The cause of the bug is:

  1. When a hard-deletion is triggered, we try to retrieve the external document id associated with each soft-deleted document id.
  2. Then, we take this list of external document ids and remove each of them from the ExternalDocumentsIds structure.
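  3. However, this is incorrect when an existing (non-deleted) document shares its external id with a soft-deleted document: removing that external id also removes the entry of the live document. A later addition with the same identifier is then treated as a new document rather than a replacement, which produces the duplicates.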
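The sequence above can be replayed on a toy model. This is only a sketch: `ToyIndex` and all of its fields and methods are hypothetical simplifications built on standard collections, and they do not mirror milli's real storage (FSTs, bitmaps, LMDB).

```rust
use std::collections::{BTreeMap, BTreeSet, HashMap};

/// Toy stand-in for an index. NOT milli's real data structures.
#[derive(Default)]
struct ToyIndex {
    /// external id -> internal id of the live document (toy ExternalDocumentsIds).
    external_ids: HashMap<String, u32>,
    /// internal id -> external id for every stored document, live or soft-deleted.
    documents: BTreeMap<u32, String>,
    /// internal ids that are soft-deleted but still physically stored.
    soft_deleted: BTreeSet<u32>,
    next_id: u32,
}

impl ToyIndex {
    /// Adds a document; if the external id is already known, the previous
    /// version is soft-deleted (a replacement).
    fn add(&mut self, external: &str) -> u32 {
        if let Some(&old) = self.external_ids.get(external) {
            self.soft_deleted.insert(old);
        }
        let id = self.next_id;
        self.next_id += 1;
        self.documents.insert(id, external.to_string());
        self.external_ids.insert(external.to_string(), id);
        id
    }

    /// Soft-deletes the live document with this external id, if any.
    fn soft_delete(&mut self, external: &str) {
        if let Some(id) = self.external_ids.remove(external) {
            self.soft_deleted.insert(id);
        }
    }

    /// Buggy hard deletion: removes entries from `external_ids` by external
    /// id, even when that external id now points to a live document.
    fn hard_delete_buggy(&mut self) {
        let doomed: Vec<u32> = self.soft_deleted.iter().copied().collect();
        for id in doomed {
            if let Some(external) = self.documents.remove(&id) {
                self.external_ids.remove(&external);
            }
        }
        self.soft_deleted.clear();
    }

    /// Toy placeholder search: external ids of all live documents.
    fn placeholder_search(&self) -> Vec<String> {
        self.documents
            .iter()
            .filter(|(id, _)| !self.soft_deleted.contains(*id))
            .map(|(_, external)| external.clone())
            .collect()
    }
}

/// Replays the four steps from the description and returns the search result.
fn reproduce_duplicates() -> Vec<String> {
    let mut index = ToyIndex::default();
    index.add("a");
    index.add("b");
    index.add("a"); // replacement: the first "a" becomes soft-deleted
    index.soft_delete("b"); // deletion of a non-replaced document...
    index.hard_delete_buggy(); // ...which triggers the hard deletion
    index.add("a"); // "a" is no longer known, so no replacement happens
    index.placeholder_search()
}
```

In this model, `reproduce_duplicates()` returns `"a"` twice: the hard deletion dropped the mapping for the live `"a"`, so the final addition could not find a document to replace.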

Implementation of the fix

  1. Before we process a permanent deletion, we update the list of soft-deleted document ids.
  2. The permanent deletion then only has to remove the soft-deleted documents from every data structure. To update ExternalDocumentsIds, we can therefore simply call the delete_soft_deleted_documents_ids_from_fsts method, which is faster and simpler than removing the external ids one by one.
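The idea behind the fix can be sketched as filtering by internal id instead of by external id. This assumes that delete_soft_deleted_documents_ids_from_fsts drops exactly the entries whose internal id is soft-deleted. The function `drop_soft_deleted` below is a hypothetical stand-in that uses a `HashMap` and a `BTreeSet` rather than milli's FSTs and bitmaps.

```rust
use std::collections::{BTreeSet, HashMap};

/// Removes every external id whose internal id is soft-deleted.
/// Hypothetical simplification: `external_ids` plays the role of
/// ExternalDocumentsIds and `soft_deleted` the set of soft-deleted
/// internal ids.
fn drop_soft_deleted(external_ids: &mut HashMap<String, u32>, soft_deleted: &BTreeSet<u32>) {
    // An entry survives when it points to a live internal id, so a live
    // document that reuses the external id of a soft-deleted one is untouched.
    external_ids.retain(|_, internal| !soft_deleted.contains(internal));
}
```

Because the decision is made per internal id, an external id that was soft-deleted and then reimported keeps its mapping to the new, live document.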

Correctness

A unit test was added to reproduce the bug. The new fuzz test, once adjusted to check the correctness of a placeholder search, also reproduced the bug immediately; with the fix applied, it no longer finds any problem.

bors[bot] commented 1 year ago

Build succeeded.