Open bergkvist opened 4 years ago
The existing `chunks` might work for you, but maybe you don't want the `Vec` allocation for each chunk of items.

I guess I could see this as a new method on `IndexedParallelIterator`, folding in a custom way rather than just into a `Vec`. I would give that a signature more like our `fold`:
```rust
fn fold_chunks<T, ID, F>(self, chunk_size: usize, identity: ID, fold_op: F) -> Fold<Self, ID, F>
where
    F: Fn(T, Self::Item) -> T + Sync + Send,
    ID: Fn() -> T + Sync + Send,
    T: Send,
```
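To make the intended semantics concrete, here is a sequential, std-only sketch (no rayon) of what such a `fold_chunks` would compute: each chunk of `chunk_size` items is folded from a fresh `identity()` using `fold_op`, yielding one accumulator per chunk. The function name and shape are illustrative, not the actual rayon API.

```rust
// Sequential sketch of the proposed fold_chunks semantics:
// fold each chunk of `chunk_size` items from identity() with fold_op,
// producing one accumulator value per chunk.
fn fold_chunks_seq<I, T, ID, F>(iter: I, chunk_size: usize, identity: ID, fold_op: F) -> Vec<T>
where
    I: IntoIterator,
    ID: Fn() -> T,
    F: Fn(T, I::Item) -> T,
{
    let mut out = Vec::new();
    let mut iter = iter.into_iter().peekable();
    while iter.peek().is_some() {
        let mut acc = identity();
        for _ in 0..chunk_size {
            match iter.next() {
                Some(item) => acc = fold_op(acc, item),
                None => break, // last chunk may be short
            }
        }
        out.push(acc);
    }
    out
}

fn main() {
    // 1..=10 in chunks of 4 -> sums [1+2+3+4, 5+6+7+8, 9+10]
    let sums = fold_chunks_seq(1..=10, 4, || 0, |acc, x| acc + x);
    assert_eq!(sums, vec![10, 26, 19]);
    println!("{:?}", sums);
}
```

The parallel version would split on chunk boundaries (which is why it fits `IndexedParallelIterator`), but the per-chunk result is the same.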
A corresponding `fold_with` -> `fold_chunks_with` would make sense too.
The signature you gave, with `|left: T, right: T| -> T`, looks more like what we would call a `reduce`. Maybe we could also support a chunk/group version of `reduce`, producing another parallel iterator instead of the final item.
I stumbled across this FR after implementing something similar a while back; I've brought it up to date here: https://github.com/willcrozi/rayon/tree/fold_chunks

My use case was grouping strings for output, avoiding the extra allocations associated with `chunks`.
I think it might be slightly different to the above: my `fold_chunks` is a parallel iterator that yields the sequential fold of each chunk, and it's implemented as a `Producer` (I used `chunks` for reference). I guess a chained call to `fold` would make it equivalent?
If there's interest I'd be happy to modify as needed, then open a PR once I know I'm on the right track (also adding `fold_chunks_with`).

Also, would it make sense to add a `try_fold_chunks` for fallible ops, since the given use case is IO-based?
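For illustration, a sequential, std-only sketch of what `try_fold_chunks` semantics might look like: each chunk is folded with a fallible `fold_op`, and the first error short-circuits the whole operation. The name, signature, and short-circuiting behavior are assumptions, not an agreed design.

```rust
// Hypothetical try_fold_chunks semantics, sketched sequentially:
// fold each chunk with a fallible op, stopping at the first error.
fn try_fold_chunks_seq<I, T, E, ID, F>(
    iter: I,
    chunk_size: usize,
    identity: ID,
    fold_op: F,
) -> Result<Vec<T>, E>
where
    I: IntoIterator,
    ID: Fn() -> T,
    F: Fn(T, I::Item) -> Result<T, E>,
{
    let mut out = Vec::new();
    let mut iter = iter.into_iter().peekable();
    while iter.peek().is_some() {
        let mut acc = identity();
        for _ in 0..chunk_size {
            match iter.next() {
                Some(item) => acc = fold_op(acc, item)?, // short-circuit on error
                None => break,
            }
        }
        out.push(acc);
    }
    Ok(out)
}

fn main() {
    // Parse strings in chunks of 2, summing each chunk.
    let ok = try_fold_chunks_seq(["1", "2", "3"], 2, || 0, |acc, s: &str| {
        s.parse::<i32>().map(|n| acc + n)
    });
    assert_eq!(ok.unwrap(), vec![3, 3]);

    // A bad token fails the whole fold.
    let err = try_fold_chunks_seq(["1", "x"], 2, || 0, |acc, s: &str| {
        s.parse::<i32>().map(|n| acc + n)
    });
    assert!(err.is_err());
}
```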
Problem: Performing 7000 * 25 IO append operations becomes my bottleneck. What if some of these writes could be grouped together in memory first?

Potential solution: Basically, given a group size and an associative fold operation, we could combine individual `par_iter` items into groups. Assuming a group size of 25, the number of IO write operations would be reduced from 7000 * 25 to 7000.
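The grouping idea can be sketched with plain iterators: concatenate each group of 25 items in memory, then issue one write per group. The `grouped_writes` helper and its string `sink` are hypothetical stand-ins for the real IO appends.

```rust
// Sketch of the grouping idea: instead of one IO append per item,
// fold each group of `group_size` items into one buffer in memory
// and issue a single write per group.
// Returns (number of simulated write calls, total bytes written).
fn grouped_writes(items: &[String], group_size: usize) -> (usize, usize) {
    let mut write_calls = 0;
    let mut sink = String::new(); // stand-in for the real output file
    for group in items.chunks(group_size) {
        let buf: String = group.concat(); // the in-memory fold_op
        sink.push_str(&buf);              // one simulated append per group
        write_calls += 1;
    }
    (write_calls, sink.len())
}

fn main() {
    let items: Vec<String> = (0..100).map(|i| format!("line {i}\n")).collect();
    let (calls, _bytes) = grouped_writes(&items, 25);
    assert_eq!(calls, 4); // 4 writes instead of 100
}
```

With `fold_chunks` this per-group concatenation would happen inside the parallel iterator, keeping the reduction parallel while cutting the write count by the group size.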