Open Kerollmops opened 2 years ago
We had a conversation with @MarinPostma, and we thought we could experiment with the standard binary heap and some `RefCell`s: storing the `EntryBoundAlignedBuffer` in a `RefCell` and an `Rc`, and using this `Rc<RefCell<EntryBoundAlignedBuffer>>` in the `EntryBound`s to be able to implement `PartialOrd`/`Eq` on them and push them into the binary heap.
Regarding the possibility of duplicate keys, I proposed that we store the `bounds_count` number directly in the `EntryBound` struct to be able to differentiate them. This number always increments and is reset when the sorter is dumped into the writer. We must be careful to call `Ordering::reverse`, as the standard binary heap is a max-heap and we want a min-heap.
We use grenad in Meilisearch, and when we benchmark and flamegraph the indexing process, which uses this library intensively, we can see that 12-15% of the time is spent in the `sort_unstable_by_key` call of the sorter. This sort is executed to ensure that the entries are ordered when we need to dump them all from memory into a writer.

According to the documentation, sorting the entries with `sort_unstable_by_key` is O(m n log(n)), where O(m) is the cost of the key function. As our key function, the function that computes our key, can be considered O(1), our current sorting solution should be O(n log(n)). Inserting an entry into our current data structure is O(1), but dumping into the writer is not.
https://github.com/meilisearch/grenad/blob/dd79a338b4be073913cb538b173769042188558d/src/sorter.rs#L246-L253
The sorter constraints
Some of the rules that apply to the sorter are:
Those constraints made me think of a more straightforward use of a binary heap, where we do not need to implement removing arbitrary values but only reading them in order, i.e. the binary heap's `into_sorted_iter`.

Using a binary heap
The only downside I see to using a heap is supporting entries with duplicate keys. One solution that keeps the algorithm simple on both sides (the binary heap insertion system and the current merging system) is to postfix every key with an incrementing big-endian number. This way, we keep both the order and the uniqueness of the entries. The downside of this solution is that it can impact memory usage; the impact depends on the workload and the size of the inserted keys.
According to the documentation of the standard binary heap, inserting a value is expected O(1), and popping values is O(log(n)).
Adding benchmarks
We need to introduce more benchmarks to measure our progress on this subject and avoid regressions. To do so, we need to find suitable real-world datasets or find a way to generate them properly. Measuring is complex.