awused opened this issue 1 month ago
Is it possible that the amount of work you spawn (5 items) is so small that it is all processed on the main thread? Please try explicitly spawning work using e.g. `Scope::spawn`, or just processing more work. (In the first case, Rayon is forced to spawn at least one task, which itself is probably slow enough that it gives the other threads a chance to steal at least some work.)

Alternatively, just to better understand this, you could run your whole program in a named thread so that the main thread can be identified by name.
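The second suggestion can be sketched with just the standard library (the thread name `"outer"` is illustrative):

```rust
use std::thread;

fn main() {
    // Run the program body in a named thread so the "main" work is
    // identifiable by name alongside any (unnamed) worker threads.
    thread::Builder::new()
        .name("outer".to_string())
        .spawn(|| {
            // ...the original program body would go here...
            println!("Running in thread {:?}", thread::current().name());
        })
        .unwrap()
        .join()
        .unwrap();
}
```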
I think you misunderstand the problem; the number of tasks and the name of the main thread do not matter. The confusion is that parallel iterators are not counted as "work that it spawns": they run on the global thread pool instead of the pool used for `in_place_scope`. This is different from `scope` or `install`, and it is not clear in the documentation, which actually seems to indicate the opposite, that work from parallel iterators would run on the thread pool used in `in_place_scope`.
Using `Scope::spawn` does the work in the thread pool, but that was never in question.
From reading the documentation on `in_place_scope`, I think a reasonable person would assume that the program I posted would never print `Running in thread None`. Whether the tasks were running on the global default rayon thread pool or on the main thread isn't really relevant. To be clear, they were running on the default global thread pool, but the surprise is that they were not running in my `pool` as I expected.
> the amount of tasks or the name of the main thread does not matter. The confusion is that parallel iterators are not counted as "work that it spawns" and they're running on the global threadpool instead of the threadpool used for `in_place_scope`.
I think the output you posted can have an alternative explanation, i.e. work stealing just never happens (it is opportunistic that way; if the processing is already done before any stealing is possible, it does not happen), and hence all of your work happens on the main thread (not in the global thread pool). Both explanations would result in the same output, and I would like to first clarify what exactly happens here.
Naming the threads of the global pool makes it easy to illustrate:

```rust
ThreadPoolBuilder::new()
    .thread_name(|u| format!("global thread {u}"))
    .build_global()
    .unwrap();
```
I get the output I expect now, but only after digging into rayon's internals and historical bug reports. Notably, this seems to contradict the documentation that "work spawned" inside `in_place_scope` is run on `pool`, which is why I think the documentation is unclear/confusing/wrong.
```
Running in thread Some("global thread 7")
Running in thread Some("global thread 12")
Running in thread Some("global thread 21")
Running in thread Some("global thread 5")
Running in thread Some("global thread 17")
```
Ok, so I tried my suggestion of naming the outer thread, and indeed work stealing happens and the jobs do run on the global thread pool, not just on the main thread. So I would say this is a bug in `in_place_scope`, not just a documentation issue, and we just need to ensure we temporarily install the given pool around the closure.
> I thought what was actually happening wasn't hard to understand
I would be glad if you could stick to the problem instead of making slurry suggestions as to what is difficult to understand and what is not.
Oh, that's an even better resolution than I expected. Thanks.
> I would be glad if you could stick to the problem instead of making slurry suggestions as to what is difficult to understand and what is not.
Yeah, fair.
I think there may be room for an API that overrides the "current" pool for the duration of a callback, if we can do that without too much overhead. Maybe #1166 is that, but I'll have to test it.
However, that's not my expectation for what `in_place_scope` should do. When the docs say, "Only work that it spawns runs in the thread pool," I think that should mean only actual spawns through the given `Scope` object go to that pool, not all rayon activity in the closure, but I can see how the current wording can also be read to imply the latter.
For a use case, consider if the alternate pool has a tweaked priority, for example. So with the existing behavior, you can do something like this:
```rust
alt.in_place_scope(|s| {
    s.spawn(|_| high_priority_work()); // sent to the alt pool
    stuff.par_iter().for_each(|x| regular_work(x)); // on the current "normal" pool
});
```
If we accept that this is useful, then the primary fix here is just doc clarification, but a way to override the current pool can still be considered as a separate enhancement. In the original example here, `pool.scope(|_| {...})` that ignores the `Scope` could be better written as `pool.install(|| {...})`. Maybe the new API would look like `pool.with_current(|| {...})`.
> For a use case, consider if the alternate pool has a tweaked priority, for example. So with the existing behavior, you can do something like this:
Personally, I would find it clearer to explicitly install the `stuff.par_iter()` work in a different pool that is unrelated to the scope, instead of relying on the "ambient" pool to reach into the scope, e.g.
```rust
in_place_scope(|outer| {
    alt.in_place_scope(|inner| {
        inner.spawn(|_| high_priority_work());
        outer.install(|| stuff.par_iter().for_each(|x| regular_work(x)));
    });
});
```
We don't have `Scope::install`, but we could...
Your example gets quite deep on closures, and it forces `stuff` to be thread safe, which isn't necessarily so otherwise -- e.g. `IndexMap<K, V, S>` implements parallel iterators regardless of `S`.
Your proposed fix is also adding to the magic of the "ambient" pool. Implicit behavior has tradeoffs everywhere!
I guess my overall feeling is that it is simpler to implement and explain that `in_place_scope` only affects operations through the callback `Scope` parameter, with no other implicit change. That implementation is done, and it shouldn't take much to clarify that in the docs.
> Your example gets quite deep on closures, and it forces `stuff` to be thread safe, which isn't necessarily so otherwise -- e.g. `IndexMap<K, V, S>` implements parallel iterators regardless of `S`.
Indeed, I was trying for something which works with the existing API but generally I think we are missing an API to capture the current "ambient" pool and use that explicitly elsewhere. I can explicitly use a pool or I can implicitly use the global pool, but it seems hard to consistently use the global pool (or an otherwise inherited pool) explicitly.
> Your proposed fix is also adding to the magic of the "ambient" pool. Implicit behavior has tradeoffs everywhere!
As indicated by the test case, my main "surprise" here is that `scope.spawn` and `crate::spawn` would do different things inside `in_place_scope`, in contrast to `scope`. We can call that out in the docs, but I don't think it is an obvious interpretation of the ambient capability.
I pushed a trivial API extension to #1166:
```rust
impl ThreadPool {
    pub fn current() -> Self {
        Self {
            registry: Registry::current(),
        }
    }

    pub fn with_current<F, R>(&self, f: F) -> R
    where
        F: FnOnce() -> R,
    {
        Registry::with_current(Some(&self.registry), f)
    }
}
```
which allows reifying the ambient capability and temporarily changing the current pool without "installation", i.e. the above example would then look like:

```rust
let default_pool = ThreadPool::current();

high_priority_pool.in_place_scope(|scope| {
    scope.spawn(|_| high_priority_work());

    default_pool.with_current(|| {
        items.par_iter().for_each(|item| regular_work(item));
    })
});
```
I was playing around with `in_place_scope` to try to offload some work while not blocking the main threadpool, and not blocking the worker pool with long-running main-pool tasks. From reading the documentation, I thought `in_place_scope` would be a drop-in replacement for `scope` with the behaviour I wanted. Running this code produces surprising output that, before digging deeper, appears to contradict the documentation.
I initially expected both runs to produce the same output, modulo the specific thread numbers, but I was surprised to find they ran their tasks on different pools.
From reading other issues around `in_place_scope` it's probably infeasible to change how it functions, but I was surprised by the current behaviour even after I'd read the documentation. The documentation is at least unclear if you're not aware of the internals of rayon:

> This is just like scope() except the closure runs on the same thread that calls in_place_scope(). Only work that it spawns runs in the thread pool.

Reading that, it's not immediately obvious that parallel iterators don't count as "spawning" and only `scope.spawn()` calls count, and it's definitely a subtle distinction between `in_place_scope` and `install`/`scope`.