meilisearch / milli

Search engine library for Meilisearch ⚡️
MIT License
464 stars, 81 forks

Reduce memory usage of the MatchingWords structure #708

Closed loiclec closed 1 year ago

loiclec commented 1 year ago

Pull Request

Related issue

Fixes (partially) https://github.com/meilisearch/meilisearch/issues/3115

What does this PR do?

  1. Reduces the memory used to build the query tree of a 10-word query by 20x. It does this by deduplicating the MatchingWord values, which are heavy because each one contains a DFA. Each MatchingWord is wrapped in a reference-counted box, and a hash map records whether a MatchingWord DFA already exists for a given signature or whether a new one must be built.

  2. Avoids the worst case of building a MatchingWord for extremely long words, which milli cannot index anyway.
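The two changes above can be illustrated with a small sketch. It is not milli's code: `Dfa`, `MatchingWordCache`, `get_or_build`, the `(word, typo, prefix)` signature and the `MAX_WORD_LEN_BYTES` limit are all hypothetical stand-ins, assuming a DFA is fully determined by the word, its typo budget and whether it is a prefix. It shows the deduplication (a `HashMap` from signature to `Rc<MatchingWord>`) and the early exit for words too long to be indexed.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for the expensive automaton built for each word.
#[allow(dead_code)]
struct Dfa {
    states: Vec<u32>,
}

impl Dfa {
    fn build(word: &str, typo: u8, prefix: bool) -> Dfa {
        // Placeholder construction; only the cost of building it matters here.
        let size = word.len() * (typo as usize + 1) * if prefix { 2 } else { 1 };
        Dfa { states: vec![0; size] }
    }
}

/// Hypothetical matching word: the word plus the DFA used to match candidates.
#[allow(dead_code)]
struct MatchingWord {
    word: String,
    typo: u8,
    prefix: bool,
    dfa: Dfa,
}

/// Hypothetical limit: longer words are never indexed, so no DFA is needed.
const MAX_WORD_LEN_BYTES: usize = 255;

/// Assumed signature: two words with the same signature can share one DFA.
type Signature = (String, u8, bool);

#[derive(Default)]
struct MatchingWordCache {
    built: HashMap<Signature, Rc<MatchingWord>>,
}

impl MatchingWordCache {
    /// Returns a shared MatchingWord, building its DFA only on the first request
    /// for this signature. Returns None for words too long to be in the index.
    fn get_or_build(&mut self, word: &str, typo: u8, prefix: bool) -> Option<Rc<MatchingWord>> {
        if word.len() > MAX_WORD_LEN_BYTES {
            return None;
        }
        let key = (word.to_string(), typo, prefix);
        let entry = self.built.entry(key).or_insert_with(|| {
            Rc::new(MatchingWord {
                word: word.to_string(),
                typo,
                prefix,
                dfa: Dfa::build(word, typo, prefix),
            })
        });
        // Cloning the Rc only bumps a reference count; the DFA is not copied.
        Some(Rc::clone(entry))
    }

    fn distinct_dfas(&self) -> usize {
        self.built.len()
    }
}

fn main() {
    let mut cache = MatchingWordCache::default();
    let a = cache.get_or_build("hello", 1, false).unwrap();
    let b = cache.get_or_build("hello", 1, false).unwrap();
    // Same signature: both handles point to the same allocation.
    assert!(Rc::ptr_eq(&a, &b));
    println!("distinct DFAs: {}", cache.distinct_dfas());
}
```

In a query tree, the same word often appears in many branches (for example, once per possible phrase or split), so sharing one `Rc` per signature keeps memory proportional to the number of distinct words rather than the number of tree nodes.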

bors[bot] commented 1 year ago

Build succeeded:

Kerollmops commented 1 year ago

Oh no! We should not merge new PRs into the main branch until we release v0.30.1 of Meilisearch and fix the current bugs! Should we revert this PR on main, @curquiza?

curquiza commented 1 year ago

No, it's OK, leave it as it is. I will create a custom branch and cherry-pick the commits onto it. Merge everything you need to merge on main.