Open mdesmedt opened 3 years ago
Rayon has implicit "recursion" due to work stealing. That is, whenever a rayon thread is blocked on the result from another rayon thread, it will look for other pending work to do in the meantime. That stolen work is executed directly from the same stack where it was blocked.
Your `par_iter().for_each()` becomes a bunch of nested `join`s, and each one of those may block if one half gets stolen to a new thread. Since stealing is somewhat random, the pool will have a mix of stolen `join`s and new `spawn`s creating even more `join`s, and I can definitely see how that might get out of control. You're not doing anything wrong, but I'm not sure how to tame that.
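To make the "nested `join`s" picture concrete, here is a rough, std-only sketch of a range being halved recursively. It is not rayon's implementation: `join_seq`, `for_each_split` and `split_stats` are made-up names, `join_seq` simply runs both halves on the calling stack, and the sketch splits all the way down to single items, which real rayon does not necessarily do.

```rust
use std::cell::Cell;

/// Sequential stand-in for a fork-join primitive: runs both halves on the
/// calling stack. Hypothetical helper, NOT rayon's `join`.
fn join_seq<A: FnOnce(), B: FnOnce()>(a: A, b: B) {
    a();
    b();
}

/// Recursively halve `lo..hi`, call `f` on every index, and record the
/// deepest nesting level reached in `max_depth`.
fn for_each_split<F: Fn(u32)>(lo: u32, hi: u32, depth: usize, max_depth: &Cell<usize>, f: &F) {
    max_depth.set(max_depth.get().max(depth));
    if hi - lo <= 1 {
        // Leaf: zero or one item left, so do the work directly.
        if lo < hi {
            f(lo);
        }
        return;
    }
    let mid = lo + (hi - lo) / 2;
    // Each split is one `join`; both halves may split again.
    join_seq(
        || for_each_split(lo, mid, depth + 1, max_depth, f),
        || for_each_split(mid, hi, depth + 1, max_depth, f),
    );
}

/// Returns (deepest nesting level, number of items visited) for `0..n`.
fn split_stats(n: u32) -> (usize, u32) {
    let max_depth = Cell::new(0usize);
    let visited = Cell::new(0u32);
    for_each_split(0, n, 0, &max_depth, &|_index| visited.set(visited.get() + 1));
    (max_depth.get(), visited.get())
}
```

In this sequential model the nesting stays logarithmic in the range length. The point above is that in the real pool a blocked `join` can pick up unrelated stolen work on the same stack, so the actual depth is not bounded this neatly.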
The new `in_place_scope` in #844 would probably help, assuming you start from outside the thread pool. Then each `spawn` will go into the pool's external queue, whereas `scope` runs in the pool and pushes `spawn`s into a thread's local queue. The local thread queues have priority over the external queue when it comes to work stealing, so that would let you prioritize finishing current `par_iter`s before starting new `spawn`s, for a bit less of a task explosion.
Maybe we could also have some sort of `spawn_external` as a way of indicating lower priority, where we push to the external queue even if you call this from within the thread pool.
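A toy model may help show why pushing the blocks to the external queue changes the order of execution. This is only a single-worker sketch under assumptions: `Task`, `ToyWorker` and `run_blocks` are hypothetical names, both queues are drained front to back, and there is no stealing between threads, so it says nothing about how rayon actually schedules.

```rust
use std::collections::VecDeque;

/// A task in the toy model: a whole block, or one pixel of a block.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Task {
    Block(u32),
    Pixel { block: u32, index: u32 },
}

/// Single worker with a local and an external queue.
/// Hypothetical model, NOT rayon's data structures.
struct ToyWorker {
    local: VecDeque<Task>,
    external: VecDeque<Task>,
}

impl ToyWorker {
    /// Local work always wins; the external queue is only consulted
    /// once the local queue is empty.
    fn next_task(&mut self) -> Option<Task> {
        self.local.pop_front().or_else(|| self.external.pop_front())
    }
}

/// Queue every block externally, let each block push its pixels locally,
/// and return the order in which tasks were executed.
fn run_blocks(num_blocks: u32, pixels_per_block: u32) -> Vec<Task> {
    let mut worker = ToyWorker {
        local: VecDeque::new(),
        external: (0..num_blocks).map(Task::Block).collect(),
    };
    let mut order = Vec::new();
    while let Some(task) = worker.next_task() {
        if let Task::Block(block) = task {
            // Starting a block creates its per-pixel work in the local queue.
            for index in 0..pixels_per_block {
                worker.local.push_back(Task::Pixel { block, index });
            }
        }
        order.push(task);
    }
    order
}
```

In this model, `run_blocks(2, 2)` finishes every pixel of block 0 before block 1 is even started, which is the "finish current `par_iter`s before starting new `spawn`s" behaviour described above.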
    for block in 0..num_blocks {
        s.spawn(move |_| {
            let num_pixels: u32 = block;
            (0..num_pixels).into_par_iter().for_each(|index| {
                do_work(index);
            }); // par_iter over pixels
        }); // spawn
    } // loop blocks
BTW, what do you mean by this "loop blocks" comment? The `spawn`s do not wait for completion here, so the loop will queue up all `num_blocks` iterations in rapid succession. Only the end of the `scope` will block for all of the `spawn`s.
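The same "spawn returns immediately, the scope waits at the end" shape can be seen with the standard library alone. The sketch below uses `std::thread::scope` as a stand-in for rayon's `scope`. It spawns OS threads rather than pool tasks, so it is an analogy for the waiting behaviour only, and `run_scoped` is a made-up name.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::thread;

/// Spawns one scoped thread per block and returns
/// (number of spawns queued by the loop, number of spawns finished).
fn run_scoped(num_blocks: u32) -> (u32, u32) {
    let finished = AtomicU32::new(0);
    let mut queued = 0;
    thread::scope(|s| {
        for _block in 0..num_blocks {
            // `spawn` does not wait for the closure to run;
            // the loop moves straight on to the next iteration.
            s.spawn(|| {
                finished.fetch_add(1, Ordering::SeqCst);
            });
            queued += 1;
        }
        // Leaving the scope is the only point that waits for every spawn.
    });
    (queued, finished.load(Ordering::SeqCst))
}
```

Once `run_scoped` returns, both counters equal `num_blocks`, because the scope does not end until every spawned closure has completed.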
Hah. It's not a "block" in terms of concurrency :). I derived this snippet from my little "Raytracing in one weekend" implementation, where I generate a list of "renderblocks" to render in FIFO order. Complete overkill really for a bunch of spheres, but it looks pretty.
Thanks for your explanation. It seems like this potential for stack explosions is inherent to the current work-stealing architecture of rayon. Since I hit this in my code "in production", so to speak, a mitigation, or an alternative way of expressing the work that avoids it, would be great.
Oh of course, you mean each "block" in `0..num_blocks`. :sweat_smile:
Here is a simple piece of code which causes a stack overflow with rayon 1.5.0:
The callstack repeats this pattern:
I'm fairly new to Rust and rayon, so it might be that what I'm doing is wrong, but as my code isn't obviously recursing, I wouldn't expect rayon to recurse internally either. This behavior poses a danger to users of the lib, as it might work in trivial cases but panic in others.