rayon-rs / rayon

Rayon: A data parallelism library for Rust
Apache License 2.0

Splitting strategy for `find_first` and `find_last` methods #908

Open eggyal opened 2 years ago

eggyal commented 2 years ago

I find ParallelIterator's find_first/find_map_first (and find_last/find_map_last) methods rather curious: since they necessarily require that every item between the start (or end) of the iterator and the returned item be consumed, and they need never consume any item beyond that returned item, rayon's current splitting strategy seems rather wasteful.

By way of demonstration, consider the following trivial example (playground):

use rayon::prelude::*;
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    // Count how many elements the predicate actually inspects.
    let attempts = AtomicUsize::default();

    (0..=u32::MAX).into_par_iter().find_first(|&n| {
        attempts.fetch_add(1, Ordering::AcqRel);
        n == 1_000_000
    });

    // A sequential `find` would stop after exactly 1_000_001 checks here.
    println!("{}", attempts.load(Ordering::SeqCst));
}

As I understand it, at each "split" rayon currently divides the iterator in half about its midpoint. Consequently, until the eventually-returned element has been consumed, work may be spent consuming elements to the right of it, and that work is entirely wasted. Indeed, in my case a 16-thread parallel iterator considerably underperforms a single-threaded sequential iteration, consuming in total around 16 times more elements before arriving at a result: effectively all but one thread's work was wasted, and the overhead was considerable.

Surely it would be better to split such iterators according to a "step length": e.g. splitting an iterator that steps by s elements could yield two iterators that each step by 2s elements but start from offsets 0 and s respectively?
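For illustration, here is a rough sketch of the kind of splitting I mean. The `Strided` type and its `split` method are purely hypothetical (not rayon API); a real implementation would have to plug into rayon's own splitting machinery:

// Hypothetical strided view over the indices offset, offset + step, offset + 2*step, ...
// strictly below `end`. Not rayon API; just to illustrate the splitting idea.
#[derive(Clone, Copy, Debug)]
struct Strided {
    offset: u64, // first index yielded
    step: u64,   // distance between consecutive indices
    end: u64,    // exclusive upper bound
}

impl Strided {
    // Split into two interleaved halves starting at `offset` and `offset + step`,
    // each stepping by twice the original step. The right half is `None` if empty.
    fn split(self) -> (Strided, Option<Strided>) {
        let right_offset = self.offset + self.step;
        let left = Strided { step: self.step * 2, ..self };
        let right = (right_offset < self.end)
            .then(|| Strided { offset: right_offset, step: self.step * 2, ..self });
        (left, right)
    }

    fn iter(self) -> impl Iterator<Item = u64> {
        (self.offset..self.end).step_by(self.step as usize)
    }
}

fn main() {
    let whole = Strided { offset: 0, step: 1, end: 10 };
    let (left, right) = whole.split();
    assert_eq!(left.iter().collect::<Vec<_>>(), vec![0, 2, 4, 6, 8]);
    assert_eq!(right.unwrap().iter().collect::<Vec<_>>(), vec![1, 3, 5, 7, 9]);
}

With this shape, every half still starts near the beginning of the range, so no task is dedicated exclusively to elements far beyond the eventual result.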

I don't know rayon very well, so I'm not sure whether such a strategy is suited to this library, nor indeed how it would be implemented; but if a PR implementing it would be welcome, I'm happy to dive into it.

wagnerf42 commented 2 years ago

hi,

it's a well-known fact that the find algorithms are sub-optimal.

there is a way to get a better algorithm for indexed iterators: consider a sequence of blocks of increasing size, where each block is searched in parallel before moving on to the next.
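for illustration, a rough sketch of this block idea, reusing rayon's existing find_first within each block (the initial block size and growth factor below are arbitrary choices, not what any actual implementation uses):

use rayon::prelude::*;

// Sketch: search 0..len in blocks of geometrically increasing size.
// Each block is scanned in parallel; the search stops at the first block
// containing a match, so little work beyond the answer is wasted.
fn find_first_by_blocks<P>(len: u64, predicate: P) -> Option<u64>
where
    P: Fn(u64) -> bool + Sync,
{
    let mut start = 0u64;
    let mut block = 1024u64; // initial block size: arbitrary
    while start < len {
        let end = (start + block).min(len);
        if let Some(found) = (start..end).into_par_iter().find_first(|&n| predicate(n)) {
            return Some(found);
        }
        start = end;
        block = block.saturating_mul(2); // grow the next block geometrically
    }
    None
}

fn main() {
    // Same search as the example above: blocks grow until one contains the match,
    // instead of splitting the whole range about its midpoint up front.
    let found = find_first_by_blocks(u64::from(u32::MAX) + 1, |n| n == 1_000_000);
    assert_eq!(found, Some(1_000_000));
}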

in fact there is a pull request doing that here: https://github.com/rayon-rs/rayon/pull/831