awused opened this issue 1 month ago
Is it possible that the amount of work you spawn (5 items) is so small that it is all processed on the main thread? Please try explicitly spawning work using e.g. `Scope::spawn`, or just processing more work. (In the first case, Rayon is forced to spawn at least one task, which itself is probably slow enough that it gives the other threads a chance to steal at least some work.)

Alternatively, just to better understand this, you could run your whole program in a named thread so that the main thread can be identified by name.
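The second suggestion can be sketched with just the standard library (the thread name `"outer"` is illustrative):

```rust
use std::thread;

fn main() {
    // Run the program body in a named thread so the "main" work is
    // identifiable by name alongside any (unnamed) worker threads.
    thread::Builder::new()
        .name("outer".to_string())
        .spawn(|| {
            // ...the original program body would go here...
            println!("Running in thread {:?}", thread::current().name());
        })
        .unwrap()
        .join()
        .unwrap();
}
```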
I think you misunderstand the problem; the number of tasks and the name of the main thread do not matter. The confusion is that parallel iterators are not counted as "work that it spawns": they run on the global thread pool instead of the pool used for `in_place_scope`. This is different from `scope` or `install`, and it is not clear in the documentation, which actually seems to indicate the opposite, that work from parallel iterators would run on the thread pool used in `in_place_scope`.
Using `Scope::spawn` does the work in the thread pool, but that was never in question.
From reading the documentation on `in_place_scope`, I think a reasonable person would assume that the program I posted would never print `Running in thread None`. Whether the tasks were running on the global default rayon thread pool or on the main thread isn't really relevant. To be clear, they were running on the default global thread pool, but the surprise is that they were not running in my `pool` as I expected.
> the amount of tasks or the name of the main thread does not matter. The confusion is that parallel iterators are not counted as "work that it spawns" and they're running on the global threadpool instead of the threadpool used for `in_place_scope`.
I think the output you posted can have an alternative explanation, i.e. work stealing just never happens (it is opportunistic that way; if the processing is already done before any stealing is possible, it does not happen), and hence all of your work happens on the main thread (not in the global thread pool). Both explanations would result in the same output, and I would like to first clarify what exactly happens here.
Naming the threads of the global pool makes it easy to illustrate:

```rust
ThreadPoolBuilder::new()
    .thread_name(|u| format!("global thread {u}"))
    .build_global()
    .unwrap();
```
I get the output I expect now, but only after digging into rayon's internals and historical bug reports. Notably, this seems to contradict the documentation that "work spawned" inside `in_place_scope` is run on `pool`, which is why I think the documentation is unclear/confusing/wrong.
```
Running in thread Some("global thread 7")
Running in thread Some("global thread 12")
Running in thread Some("global thread 21")
Running in thread Some("global thread 5")
Running in thread Some("global thread 17")
```
Ok, so I tried my suggestion of naming the outer thread, and indeed work stealing happens and the jobs do run on the global thread pool, not just on the main thread. So I would say this is a bug in `in_place_scope`, not just a documentation issue, and we just need to ensure we temporarily install the given pool around the closure.
> I thought what was actually happening wasn't hard to understand
I would be glad if you could stick to the problem instead of making slurry suggestions as to what is difficult to understand and what is not.
Oh, that's an even better resolution than I expected. Thanks.
> I would be glad if you could stick to the problem instead of making slurry suggestions as to what is difficult to understand and what is not.
Yeah, fair.
I think there may be room for an API that overrides the "current" pool for the duration of a callback, if we can do that without too much overhead. Maybe #1166 is that, but I'll have to test it.
However, that's not my expectation for what `in_place_scope` should do. When the docs say, "Only work that it spawns runs in the thread pool," I think that should mean only actual spawns through the given `Scope` object go to that pool, not all rayon activity in the closure, but I can see how the current wording can also be read to imply the latter.
For a use case, consider if the alternate pool has a tweaked priority, for example. So with the existing behavior, you can do something like this:
```rust
alt.in_place_scope(|s| {
    s.spawn(|_| high_priority_work()); // sent to the alt pool
    stuff.par_iter().for_each(|x| regular_work(x)); // on the current "normal" pool
});
```
If we accept that this is useful, then the primary fix here is just doc clarification, but a way to override the current pool can still be considered as a separate enhancement. In the original example here, `pool.scope(|_| {...})` that ignores the `Scope` could be better written as `pool.install(|| {...})`. Maybe the new API would look like `pool.with_current(|| {...})`.
> For a use case, consider if the alternate pool has a tweaked priority, for example. So with the existing behavior, you can do something like this:
Personally, I would find it clearer to explicitly install the `stuff.par_iter()` work in a different pool that is unrelated to the scope, instead of relying on the "ambient" pool to reach into the scope, e.g.
```rust
in_place_scope(|outer| {
    alt.in_place_scope(|inner| {
        inner.spawn(|_| high_priority_work());
        outer.install(|| stuff.par_iter().for_each(|x| regular_work(x)));
    });
});
```
We don't have `Scope::install`, but we could...
Your example gets quite deep on closures, and it forces `stuff` to be thread safe, which isn't necessarily so otherwise -- e.g. `IndexMap<K, V, S>` implements parallel iterators regardless of `S`.
Your proposed fix is also adding to the magic of the "ambient" pool. Implicit behavior has tradeoffs everywhere!
I guess my overall feeling is that it is simpler to implement and explain that `in_place_scope` only affects operations through the callback `Scope` parameter, with no other implicit change. That implementation is done, and it shouldn't take much to clarify that in the docs.
> Your example gets quite deep on closures, and it forces `stuff` to be thread safe, which isn't necessarily so otherwise -- e.g. `IndexMap<K, V, S>` implements parallel iterators regardless of `S`.
Indeed, I was trying for something which works with the existing API but generally I think we are missing an API to capture the current "ambient" pool and use that explicitly elsewhere. I can explicitly use a pool or I can implicitly use the global pool, but it seems hard to consistently use the global pool (or an otherwise inherited pool) explicitly.
> Your proposed fix is also adding to the magic of the "ambient" pool. Implicit behavior has tradeoffs everywhere!
As indicated by the test case, my main "surprise" here is that `scope.spawn` and `crate::spawn` would do different things inside `in_place_scope`, in contrast to `scope`. We can call that out in the docs, but I don't think it is an obvious interpretation of the ambient capability.
I pushed a trivial API extension to #1166:
```rust
impl ThreadPool {
    pub fn current() -> Self {
        Self {
            registry: Registry::current(),
        }
    }

    pub fn with_current<F, R>(&self, f: F) -> R
    where
        F: FnOnce() -> R,
    {
        Registry::with_current(Some(&self.registry), f)
    }
}
```
which allows reifying the ambient capability and temporarily changing the current pool without "installation", i.e. the above example would then look like:

```rust
let default_pool = ThreadPool::current();

high_priority_pool.in_place_scope(|scope| {
    scope.spawn(|_| high_priority_work());

    default_pool.with_current(|| {
        items.par_iter().for_each(|item| regular_work(item));
    })
});
```
I was playing around with `in_place_scope` to try to offload some work while not blocking the main threadpool, and not blocking the worker pool with long-running main-pool tasks. From reading the documentation, I thought `in_place_scope` would be a drop-in replacement for `scope` with the behaviour I wanted. Running this code produces surprising output that, before digging deeper, appears to contradict the documentation.
I initially expected both runs to produce the same output, modulo the specific thread numbers, but I was surprised to find they ran their tasks on different pools.
From reading other issues around `in_place_scope` it's probably infeasible to change how it functions, but I was surprised by the current behaviour even after I'd read the documentation. The documentation is at least unclear if you're not aware of the internals of rayon:

> This is just like scope() except the closure runs on the same thread that calls in_place_scope(). Only work that it spawns runs in the thread pool.

Reading that, it's not immediately obvious that parallel iterators don't count as "spawning" and only `scope.spawn()` calls count, and it's definitely a subtle distinction between `in_place_scope` and `install`/`scope`.