Open JosephLenton opened 2 months ago
Hi, thanks for the idea.
I wonder if chunk_by
, batching
or (a generalized) coalesce
might do the job, too. Did you try these and see how far you get?
Related: https://github.com/rust-itertools/itertools/pull/436
One can achieve this using chunk_by
by counting generations:
let mut current_batch_size = 0;
let mut generation = 0;
let batches = data_to_send
.into_iter()
.chunk_by(move |data| {
let size = data.len();
if current_batch_size + size > MAX_BATCH_SEND_SIZE {
current_batch_size = 0;
generation += 1;
}
current_batch_size += size;
generation
});
I actually ended up writing this twice due to getting the logic wrong the first time. In practice I would end up putting this into a helper function, so I can wrap that with tests.
Hey, I would like to propose adding a method
batch_by
(I don't know if this is the best name, however I'll use it here).batch_by
would allow you to group elements by their size.try_batch_by
for dealing with Iterators ofResult
.The function signature would be something like:
(I'm not 100% sure on the need for
Num
. I used it in an implementation I wrote, however I believe it could probably be dropped.)Pseudo Example Code
Motivations
I have personally needed this on several projects when grouping things to be sent on to an external service. For example on a real world project, I needed batch data into 10mb groups to send to ElasticSearch.
Comments
AFAIK there is nothing like this in Itertools. There are ways to group by key, or to find the largest or smallest. A means to say 'put these into groups of 10mb or less' .
I have written this before, so if there is interest I would be more than happy to write a PR for Itertools.