rust-ndarray / ndarray

ndarray: an N-dimensional array with array views, multidimensional slicing, and efficient operations
https://docs.rs/ndarray/
Apache License 2.0

Stack Overflows on Zip indexed par_for_each #1346

Closed marstaik closed 6 months ago

marstaik commented 6 months ago

Hi, I am running a bunch of these various tasks in a rayon threadpool. At some point a stack overflow in a thread becomes inevitable.

pub fn make_heightmap_test<T: Density + Send>(
    grid: &mut ArrayViewMut3<T>,
) {
    Zip::indexed(grid).par_for_each(|coords, value| {
        // stuff ...
    });
}

The last readable position I can get is deep in rayon, but the last one in ndarray code is in:

zip/mod.rs

    /// Return an *approximation* to the max stride axis; if
    /// component arrays disagree, there may be no choice better than the
    /// others.
    fn max_stride_axis(&self) -> Axis {
        let i = if self.prefer_f() {
            self
                .dimension
                .slice()
                .iter()
                .rposition(|&len| len > 1)
                .unwrap_or(self.dimension.ndim() - 1)
        } else {
            /* corder or default */
            self
                .dimension
                .slice()
                .iter()
                .position(|&len| len > 1)
                .unwrap_or(0)
        };
        Axis(i)
    }
}

Attached is a call stack. I can reproduce this quite consistently, but I am unsure how to provide a dump on Windows for Rust.

call_stack.txt

Please let me know if I can provide more data.

adamreichold commented 6 months ago

Work stealing can generally lead to large stack depths. Is this for release builds (which use significantly less stack due to inlining and better space reuse)? Did you try to just increase the stack size of the worker threads using ThreadPoolBuilder::stack_size? This is necessary often enough just due to how Rayon's scheduler works.

marstaik commented 6 months ago

This was indeed on a release build. Setting the stack size to 24 megabytes for fun seems to have fixed it. Is there documentation on what the default stack size is? Or is it machine dependent? I haven't been able to find it just searching around.

adamreichold commented 6 months ago

AFAIK it is OS-dependent, and I think on contemporary Linux it is 8 MB for the main thread, cf. https://unix.stackexchange.com/questions/127602/default-stack-size-for-pthreads

The main issue is that the threads making up Rayon's thread pool get the much smaller default of 2 MB, which is controlled by Rust's std::thread module (cf. https://doc.rust-lang.org/stable/std/thread/index.html#stack-size). So it is much easier to hit this with Rayon than without it, both because work stealing increases stack usage and because the limit is smaller to start with.
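
For illustration, here is a minimal sketch of setting a thread's stack size explicitly instead of relying on the std::thread default. It uses only the standard library's thread::Builder::stack_size; the Rayon counterpart is the ThreadPoolBuilder::stack_size mentioned earlier in the thread. The names STACK_SIZE, recurse and run_with_big_stack are hypothetical, and the 24 MiB figure simply mirrors the value tried above; it is an assumption, not a recommendation.

```rust
use std::thread;

// Hypothetical stack size for illustration; 24 MiB mirrors the value tried above.
const STACK_SIZE: usize = 24 * 1024 * 1024;

// Hypothetical recursive workload: each call keeps a 1 KiB buffer on its frame,
// so stack usage grows roughly linearly with `depth`.
fn recurse(depth: u32) -> u64 {
    let buf = [1u8; 1024];
    // black_box keeps the optimizer from discarding the buffer.
    let used = std::hint::black_box(&buf)[0] as u64;
    if depth == 0 {
        used
    } else {
        used + recurse(depth - 1)
    }
}

// Run `recurse` on a thread whose stack size is set explicitly rather than
// left at the std default.
fn run_with_big_stack(depth: u32) -> u64 {
    thread::Builder::new()
        .stack_size(STACK_SIZE)
        .spawn(move || recurse(depth))
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}
```

Each call contributes 1 to the result, so run_with_big_stack(4_000) returns 4001. How deep the same recursion could go on a default-sized thread depends on the build profile and platform, so the sketch makes no claim about where the default limit is hit.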