rayon-rs / rayon

Rayon: A data parallelism library for Rust
Apache License 2.0

Splitting strategy for `find_first` and `find_last` methods #908

Open eggyal opened 2 years ago

eggyal commented 2 years ago

I find ParallelIterator's find_first/find_map_first (and find_last/find_map_last) methods rather curious: since they necessarily require that every item between the start (or end) of the iterator and the returned item be consumed, and they need never consume any item beyond that returned item, rayon's current splitting strategy seems rather wasteful.

By way of demonstration, consider the following trivial example (playground):

use rayon::prelude::*;
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    // Count how many elements the predicate actually inspects.
    let attempts = AtomicUsize::default();

    (0..=u32::MAX).into_par_iter().find_first(|&n| {
        attempts.fetch_add(1, Ordering::AcqRel);
        n == 1_000_000
    });

    // A sequential `find` would stop after exactly 1_000_001 checks here.
    println!("{}", attempts.load(Ordering::SeqCst));
}

As I understand it, at each "split" rayon currently divides the iterator in half about its midpoint. Consequently, until the eventually-returned element has been consumed, work may be spent consuming elements to the right of it, and that work is entirely wasted. Indeed, in my case a 16-thread parallel iterator considerably underperforms a single-threaded sequential iteration, consuming in total around 16 times more elements before arriving at a result: effectively all but one thread's work was wasted, and the overhead was considerable.

Surely it would be better to split such iterators according to a "step length": e.g. splitting an iterator that steps by s elements could yield two iterators that each step by 2s elements but start from offsets 0 and s respectively?
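For illustration, here is a rough sketch of the kind of splitting I mean. The `Strided` type and its `split` method are purely hypothetical (not rayon API); a real implementation would have to plug into rayon's own splitting machinery:

// Hypothetical strided view over the indices offset, offset + step, offset + 2*step, ...
// strictly below `end`. Not rayon API; just to illustrate the splitting idea.
#[derive(Clone, Copy, Debug)]
struct Strided {
    offset: u64, // first index yielded
    step: u64,   // distance between consecutive indices
    end: u64,    // exclusive upper bound
}

impl Strided {
    // Split into two interleaved halves starting at `offset` and `offset + step`,
    // each stepping by twice the original step. The right half is `None` if empty.
    fn split(self) -> (Strided, Option<Strided>) {
        let right_offset = self.offset + self.step;
        let left = Strided { step: self.step * 2, ..self };
        let right = (right_offset < self.end)
            .then(|| Strided { offset: right_offset, step: self.step * 2, ..self });
        (left, right)
    }

    fn iter(self) -> impl Iterator<Item = u64> {
        (self.offset..self.end).step_by(self.step as usize)
    }
}

fn main() {
    let whole = Strided { offset: 0, step: 1, end: 10 };
    let (left, right) = whole.split();
    assert_eq!(left.iter().collect::<Vec<_>>(), vec![0, 2, 4, 6, 8]);
    assert_eq!(right.unwrap().iter().collect::<Vec<_>>(), vec![1, 3, 5, 7, 9]);
}

With this shape, every half still starts near the beginning of the range, so no task is dedicated exclusively to elements far beyond the eventual result.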

I don't know rayon very well, so I'm not sure whether such a strategy is suited to this library, nor indeed how it would be implemented; but if a PR implementing it would be welcome, I'm happy to dive into it.

wagnerf42 commented 2 years ago

hi,

it's a well-known fact that the find algorithms are sub-optimal.

there is a way to get a better algorithm for indexed iterators: consider a sequence of blocks of increasing size, where each block is searched in parallel before moving on to the next.
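for illustration, a rough sketch of this block idea, reusing rayon's existing find_first within each block (the initial block size and growth factor below are arbitrary choices, not what any actual implementation uses):

use rayon::prelude::*;

// Sketch: search 0..len in blocks of geometrically increasing size.
// Each block is scanned in parallel; the search stops at the first block
// containing a match, so little work beyond the answer is wasted.
fn find_first_by_blocks<P>(len: u64, predicate: P) -> Option<u64>
where
    P: Fn(u64) -> bool + Sync,
{
    let mut start = 0u64;
    let mut block = 1024u64; // initial block size: arbitrary
    while start < len {
        let end = (start + block).min(len);
        if let Some(found) = (start..end).into_par_iter().find_first(|&n| predicate(n)) {
            return Some(found);
        }
        start = end;
        block = block.saturating_mul(2); // grow the next block geometrically
    }
    None
}

fn main() {
    // Same search as the example above: blocks grow until one contains the match,
    // instead of splitting the whole range about its midpoint up front.
    let found = find_first_by_blocks(u64::from(u32::MAX) + 1, |n| n == 1_000_000);
    assert_eq!(found, Some(1_000_000));
}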

in fact there is a pull request doing that here: https://github.com/rayon-rs/rayon/pull/831