meilisearch / grenad

Tools to sort, merge, write, and read immutable key-value pairs :tomato:
https://docs.rs/grenad
MIT License

Add the possibility to merge several sorters easily without having to convert them into readers #44

Closed ManyTheFish closed 1 year ago

ManyTheFish commented 1 year ago

Today, the easiest way to merge several sorters is to call write_into_stream_writer on each sorter to merge it into a file, then create a Merger and push each file into it. This means we have to run several merges in a row just to merge several sorters.

Proposal

1) Add an into_merger method to the Sorter struct that creates a Merger by consuming the Sorter.
2) Allow the MergerBuilder to take one or several Mergers as parameters, or allow unions between Mergers.
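To make the intent of the two points concrete, here is a minimal std-only sketch. It is a toy model, not grenad's API: `ToyMerger`, `from_unsorted`, `merge_all`, `concat_values`, and `example` are hypothetical names invented for illustration, and in-memory vectors stand in for the sorted chunks. The point it tries to show is that a union only joins the lists of sorted sources, so the actual merge can happen once, over all sources, when the result is written out.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Toy stand-in for a merge function: joins the values that share a key.
fn concat_values(_key: &str, values: &[String]) -> String {
    values.join(",")
}

/// Hypothetical model of a merger: a set of key-sorted sources, not merged yet.
struct ToyMerger {
    sources: Vec<Vec<(String, String)>>,
}

impl ToyMerger {
    /// Models the proposed `into_merger`: sort the entries once and wrap them.
    fn from_unsorted(mut entries: Vec<(String, String)>) -> Self {
        // Stable sort: entries with equal keys keep their insertion order.
        entries.sort_by(|a, b| a.0.cmp(&b.0));
        ToyMerger { sources: vec![entries] }
    }

    /// Models the proposed union: only the source lists are joined here.
    fn union_with(mut self, other: ToyMerger) -> Self {
        self.sources.extend(other.sources);
        self
    }

    /// Performs one k-way merge over every source at once.
    fn merge_all(self, merge: fn(&str, &[String]) -> String) -> Vec<(String, String)> {
        let mut iters: Vec<_> = self.sources.into_iter().map(|s| s.into_iter()).collect();
        // Min-heap holding at most one pending entry per source.
        let mut heap = BinaryHeap::new();
        for (idx, it) in iters.iter_mut().enumerate() {
            if let Some((k, v)) = it.next() {
                heap.push(Reverse((k, idx, v)));
            }
        }

        let mut out = Vec::new();
        let mut current: Option<(String, Vec<String>)> = None;
        while let Some(Reverse((k, idx, v))) = heap.pop() {
            // Refill the heap from the source we just consumed.
            if let Some((nk, nv)) = iters[idx].next() {
                heap.push(Reverse((nk, idx, nv)));
            }
            let same_key = current.as_ref().map_or(false, |(ck, _)| *ck == k);
            if same_key {
                if let Some((_, vals)) = current.as_mut() {
                    vals.push(v);
                }
            } else {
                // A new key starts: flush the previous key through the merge function.
                if let Some((ck, vals)) = current.take() {
                    let merged = merge(&ck, &vals);
                    out.push((ck, merged));
                }
                current = Some((k, vec![v]));
            }
        }
        if let Some((ck, vals)) = current {
            let merged = merge(&ck, &vals);
            out.push((ck, merged));
        }
        out
    }
}

/// Builds two toy mergers, unions them, and merges everything in one pass.
fn example() -> Vec<(String, String)> {
    let e = |k: &str, v: &str| (k.to_string(), v.to_string());
    let merger_1 = ToyMerger::from_unsorted(vec![e("b", "1"), e("a", "2")]);
    let merger_2 = ToyMerger::from_unsorted(vec![e("a", "3"), e("c", "4")]);
    merger_1.union_with(merger_2).merge_all(concat_values)
}
```

In this model `example()` yields the pairs ("a", "2,3"), ("b", "1"), ("c", "4"). The key "a" appears in both sources and goes through the merge function exactly once, with no intermediate merged output per sorter.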

Basic usage

// creates the first sorter and converts it into a merger.
let sorter_1: Sorter<MF, _> = Sorter::new(merge_func);
let merger_1: Merger<_, MF> = sorter_1.into_merger();

// creates the second sorter and converts it into a merger.
let sorter_2: Sorter<MF, _> = Sorter::new(merge_func);
let merger_2: Merger<_, MF> = sorter_2.into_merger();

// A: create a MergerBuilder and push each Merger
let mut builder: MergerBuilder<_, MF> = MergerBuilder::new(merge_func);
builder.push_merger(merger_1);
builder.push_merger(merger_2);

let merger_1_2: Merger<_, MF> = builder.build();

// B (alternative to A): directly merge Mergers by using a union method
let merger_1_2: Merger<_, MF> = merger_1.union_with(merger_2);

// Write everything into one unique writer:
merger_1_2.write_into_stream_writer(&mut writer)?;

Kerollmops commented 1 year ago

I recently worked on a similar way to merge multiple Sorters together in a single operation. You can see the Sorter::into_reader_cursors method in https://github.com/meilisearch/grenad/pull/41. It hasn't been released yet, as I want more feedback before releasing a new Grenad version.