rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
99.2k stars 12.81k forks source link

array::map stack usage #86912

Open programmerjake opened 3 years ago

programmerjake commented 3 years ago

I don't know if this is the right place for this, but I'd like to report that array::map uses the stack excessively and operating on larger arrays result in unexpected stack overflows.

Minimal example: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=2d96e8f66b3432a8d1ecf0ca4331f85a

In the example, at some point more than 16 times more stack is used than the original array size.

As far as I've looked into the actual code, the only thing that stood out to me is use of core::mem::transmute_copy instead of core::mem::transmute in core::array::IntoIter::new, but I doubt it's the reason for this.

Originally posted by @PonasKovas in https://github.com/rust-lang/rust/issues/75243#issuecomment-874732134

programmerjake commented 3 years ago

I did some analysis for the similar code: https://rust.godbolt.org/z/vssb3dEqx

#![feature(array_map)]

pub fn main() {
    (|| {
        [0u8; 128 * 1024].map(|e| e + 1);
        // Stack overflows, meaning the stack usage was more than 16 times the size of the array!
    })();
}

After some sed magic (I'm on my phone and rustup can't find any toolchains for aarch64-linux-android)...
Functions sorted by stack usage:

917528 core::ptr::read_unaligned::<[core::mem::maybe_uninit::MaybeUninit<u8>; 131072]>
655384 core::ptr::read::<[u8; 131072]>
655384 core::ptr::read::<[core::mem::maybe_uninit::MaybeUninit<u8>; 131072]>
655368 <core::iter::adapters::map::Map<core::array::iter::IntoIter<u8, 131072>, example::main::{closure#0}::{closure#0}>>::new
655368 <core::array::iter::IntoIter<u8, 131072> as core::iter::traits::iterator::Iterator>::map::<u8, example::main::{closure#0}::{closure#0}>
393224 core::array::collect_into_array_unchecked::<core::iter::adapters::map::Map<core::array::iter::IntoIter<u8, 131072>, example::main::{closure#0}::{closure#0}>, 131072>
131096 core::mem::forget::<[u8; 131072]>
131088 example::main::{closure#0}
56 core::mem::transmute_copy::<[u8; 131072], [core::mem::maybe_uninit::MaybeUninit<u8>; 131072]>
56 <usize as core::slice::index::SliceIndex<[core::mem::maybe_uninit::MaybeUninit<u8>]>>::get_unchecked
56 <core::option::Option<usize>>::map::<u8, <core::array::iter::IntoIter<u8, 131072> as core::iter::traits::iterator::Iterator>::next::{closure#0}>
56 <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[core::mem::maybe_uninit::MaybeUninit<u8>]>>::get_unchecked_mut
56 <core::ops::range::Range<usize> as core::iter::range::RangeIteratorImpl>::spec_next
40 core::ptr::slice_from_raw_parts_mut::<u8>
40 core::ptr::slice_from_raw_parts_mut::<core::mem::maybe_uninit::MaybeUninit<u8>>
40 <core::option::Option<u8>>::map::<u8, &mut example::main::{closure#0}::{closure#0}>
40 <core::array::iter::IntoIter<u8, 131072>>::as_mut_slice
40 <core::array::iter::IntoIter<u8, 131072> as core::iter::traits::iterator::Iterator>::next
40 <[core::mem::maybe_uninit::MaybeUninit<u8>]>::get_unchecked_mut::<core::ops::range::Range<usize>>
32 core::ptr::read::<usize>
32 core::ptr::metadata::from_raw_parts_mut::<[u8]>
32 core::ptr::metadata::from_raw_parts_mut::<[core::mem::maybe_uninit::MaybeUninit<u8>]>
32 <usize as core::slice::index::SliceIndex<[core::mem::maybe_uninit::MaybeUninit<u8>]>>::get_unchecked_mut
24 core::ptr::read::<u8>
24 <core::ops::range::Range<usize> as core::iter::traits::iterator::Iterator>::next
24 <core::mem::maybe_uninit::MaybeUninit<[u8; 131072]>>::zeroed
24 <core::iter::adapters::map::Map<core::array::iter::IntoIter<u8, 131072>, example::main::{closure#0}::{closure#0}> as core::iter::traits::iterator::Iterator>::next
24 <core::array::iter::IntoIter<u8, 131072> as core::iter::traits::iterator::Iterator>::next::{closure#0}
24 <[core::mem::maybe_uninit::MaybeUninit<u8>]>::get_unchecked_mut::<usize>
24 <[core::mem::maybe_uninit::MaybeUninit<u8>]>::get_unchecked::<usize>
16 core::mem::forget::<core::array::collect_into_array::Guard<u8, 131072>>
16 <usize as core::iter::range::Step>::forward_unchecked
8 example::main::{closure#0}::{closure#0}
8 example::main
8 core::ptr::write::<usize>
8 core::ptr::drop_in_place::<core::iter::adapters::map::Map<core::array::iter::IntoIter<u8, 131072>, example::main::{closure#0}::{closure#0}>>
8 core::ptr::drop_in_place::<core::array::iter::IntoIter<u8, 131072>>
8 core::ptr::drop_in_place::<core::array::collect_into_array::Guard<u8, 131072>>
8 core::intrinsics::write_bytes::<[u8; 131072]>
8 <core::array::iter::IntoIter<u8, 131072> as core::ops::drop::Drop>::drop
8 <core::array::collect_into_array::Guard<u8, 131072> as core::ops::drop::Drop>::drop
8 <*const u8>::read
8 <*const [u8; 131072]>::read
8 <&mut example::main::{closure#0}::{closure#0} as core::ops::function::FnOnce<(u8,)>>::call_once
0 core::hint::unreachable_unchecked
0 <usize as core::cmp::PartialOrd>::lt
0 <usize as core::clone::Clone>::clone
0 <[core::mem::maybe_uninit::MaybeUninit<u8>]>::as_mut_ptr
0 <*const [core::mem::maybe_uninit::MaybeUninit<u8>]>::as_ptr
PonasKovas commented 3 years ago

Someone analyzed a similar problem on reddit: https://www.reddit.com/r/rust/comments/oeqqf7/unexpected_high_stack_usage/

nagisa commented 3 years ago

Alas, the high stack usage when using large types by value, including arrays, is going to be a commonly recurring theme. There's a lot of reliance on optimization here to remove intermediate locals and to inline functions so that copies when passing arguments aren't necessary.

wrenger commented 2 years ago

The implementation of array::map seams fairly complex. Maybe a more trivial solution is optimized better? Something like that:

pub fn map<F, U>(self, mut f: F) -> [U; N]
where
   F: FnMut(T) -> U,
{
    let mut result: [U; N] = unsafe { std::mem::zeroed() };
    for (i, x) in self.into_iter().enumerate() {
        result[i] = f(x);
    }
    result
}

(using the 2021 edition into_iter)

newpavlov commented 1 year ago

The issue seems to be resolved on Nigthly.

programmerjake commented 1 year ago

not necessarily, the stack usage went from >16x the array size to >8x the array size, which imho is still excessive.

https://play.rust-lang.org/?version=nightly&mode=debug&edition=2021&gist=e29ee71b25956cfd144f8aa258298740

newpavlov commented 1 year ago

You are using debug mode. Can you create an example which would work in release mode? Ideally it should be a godbolt link with enabled optimizations which clearly demonstrates unnecessary stack usage.

newpavlov commented 1 year ago

@wrenger The complexity comes from panic safety, i.e. the code needs to properly clean initialized elements if f panics and U implements Drop.

programmerjake commented 1 year ago

here's 3x stack usage in release mode: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=399ee5cf1e40d4c1be0181971fdd1b25

ideally it would be 2x if neither the source nor destination array of the map were removed.

programmerjake commented 1 year ago

compiler explorer demo of 3x stack usage: https://rust.godbolt.org/z/zdYh9vxMa

array size of 100kB:

example::func:
        push    r15
        push    r14
        push    r12
        push    rbx
        mov     eax, 300024
        call    __rust_probestack
        sub     rsp, rax // <-- reserve 300k of stack space
        ...
newpavlov commented 1 year ago

I think this link can be a slightly better demonstration: https://rust.godbolt.org/z/Wd7Ee8fzG

We can clearly see that the map-based code for some reason does the following: 1) Copy input array from reference to stack. 2) Perform mapping from the copied array to another stack region. 3) Copy the resulted data to another stack region again and pass reference to it to the extern function.

There are no side effects which prevent compiler from properly optimizing this code.

Somewhat relevant issue: #88930

khoover commented 1 year ago

I have an example here where stack space for map is always 2*size_of::<[T; N]>() + size_of::<[U; N]>().