meilisearch / grenad

Tools to sort, merge, write, and read immutable key-value pairs 🍅
https://docs.rs/grenad
MIT License

Reduce the amount of bytes memcpy in the merger #12

Closed Kerollmops closed 3 years ago

Kerollmops commented 3 years ago

This PR greatly reduces the amount of memcpy when merging multiple Readers by only using references into the decompressed block bytes held inside each Reader. To do that, we have introduced a new Reader::current method that returns the key-value pair that is currently pointed to, that is, the one returned by the previous call to Reader::next, or None if it has been called already.
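To illustrate the idea of a current accessor that hands out references instead of copies, here is a minimal sketch. ToyReader and its fields are hypothetical and are not grenad's actual Reader; the choice to return None before the first call to next and after the last entry is an assumption made for this example only.

```rust
/// Hypothetical stand-in for a reader over one decompressed block.
/// This is NOT grenad's Reader; names and layout are assumptions.
pub struct ToyReader {
    /// Entries of the block, already decompressed and owned by the reader.
    entries: Vec<(Vec<u8>, Vec<u8>)>,
    /// Index of the entry returned by the last call to `next`, if any.
    cursor: Option<usize>,
}

impl ToyReader {
    pub fn new(entries: Vec<(Vec<u8>, Vec<u8>)>) -> Self {
        ToyReader { entries, cursor: None }
    }

    /// Moves to the following entry and returns it, borrowed from `self`.
    pub fn next(&mut self) -> Option<(&[u8], &[u8])> {
        let idx = match self.cursor {
            None => 0,
            Some(i) => i + 1,
        };
        // Clamp so that repeated calls after the end cannot overflow.
        self.cursor = Some(idx.min(self.entries.len()));
        self.current()
    }

    /// Returns the entry produced by the previous call to `next`
    /// without moving the cursor and without copying any bytes.
    pub fn current(&self) -> Option<(&[u8], &[u8])> {
        let idx = self.cursor?;
        let (key, value) = self.entries.get(idx)?;
        Some((key.as_slice(), value.as_slice()))
    }
}
```

Because current only takes a shared reference, a merger could look at the current entry of several readers at once and compare their keys without cloning them.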

The only times we copy memory are when we copy the key that is currently being merged and when the merge function returns the merged value in a Cow::Borrowed. A Cow::Borrowed value is borrowed from one of the arguments, so it is mandatory to copy it into a Vec<u8> inside the Merger struct.
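The two remaining copies can be sketched as follows. ToyMerger, its fields, and concat_merge are hypothetical names invented for this illustration, and the merge function signature is a simplified assumption rather than the one grenad uses.

```rust
use std::borrow::Cow;

/// Hypothetical merge function: a single value is returned as-is (borrowed),
/// several values are concatenated into a freshly allocated buffer.
pub fn concat_merge<'a>(_key: &[u8], values: &[&'a [u8]]) -> Cow<'a, [u8]> {
    match values {
        [single] => Cow::Borrowed(*single),
        many => Cow::Owned(many.concat()),
    }
}

/// Hypothetical merger state; NOT grenad's Merger struct.
pub struct ToyMerger {
    /// Copy of the key currently being merged.
    pub current_key: Vec<u8>,
    /// Buffer holding the merged value handed back to the caller.
    pub merged_value: Vec<u8>,
}

impl ToyMerger {
    pub fn new() -> Self {
        ToyMerger { current_key: Vec::new(), merged_value: Vec::new() }
    }

    /// Merges `values` for `key` and returns the result borrowed from `self`.
    pub fn store(&mut self, key: &[u8], values: &[&[u8]]) -> &[u8] {
        // Copy 1: the key being merged is kept in the merger.
        self.current_key.clear();
        self.current_key.extend_from_slice(key);

        match concat_merge(key, values) {
            // Copy 2: a borrowed result points into one of the inputs,
            // so its bytes are copied into the merger's own buffer.
            Cow::Borrowed(bytes) => {
                self.merged_value.clear();
                self.merged_value.extend_from_slice(bytes);
            }
            // An owned result is moved in without copying its bytes.
            Cow::Owned(vec) => self.merged_value = vec,
        }
        &self.merged_value
    }
}
```

In this sketch the borrowed case reuses the existing allocation of merged_value, so the copy costs a memcpy but not necessarily a new allocation.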

We removed all of the memcpy of the values that we want to merge. Previously we stored them in a Vec of Cow<'static, [u8]>, which meant always calling value.to_vec() and storing them in a Cow::Owned. We now always store the values in a Cow::Borrowed, avoiding the to_vec call.
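The difference between the two storage strategies can be sketched like this. The function names collect_owned and collect_borrowed are hypothetical and only contrast the two approaches described above; they are not taken from the grenad code base.

```rust
use std::borrow::Cow;

/// Old approach (sketch): every value is copied with `to_vec` so that it can
/// live in a `Cow<'static, [u8]>`, independent of the reader's buffer.
pub fn collect_owned(values: &[&[u8]]) -> Vec<Cow<'static, [u8]>> {
    values.iter().map(|v| Cow::Owned(v.to_vec())).collect()
}

/// New approach (sketch): values stay borrowed from the buffer they came from,
/// so the lifetime `'a` ties the result to that buffer and no bytes are copied.
pub fn collect_borrowed<'a>(values: &[&'a [u8]]) -> Vec<Cow<'a, [u8]>> {
    values.iter().map(|v| Cow::Borrowed(*v)).collect()
}
```

The price of the borrowed version is the lifetime parameter: the collected values cannot outlive the block bytes they point into, which the compiler enforces.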

I am not sure I understand why, but the miri CI passes now. Strange but good 🤪