Open programmerjake opened 3 years ago
I did some analysis for the similar code: https://rust.godbolt.org/z/vssb3dEqx
#![feature(array_map)]
pub fn main() {
(|| {
[0u8; 128 * 1024].map(|e| e + 1);
// Stack overflows, meaning the stack usage was more than 16 times the size of the array!
})();
}
After some sed
magic (I'm on my phone and rustup can't find any toolchains for aarch64-linux-android
)...
Functions sorted by stack usage:
917528 core::ptr::read_unaligned::<[core::mem::maybe_uninit::MaybeUninit<u8>; 131072]>
655384 core::ptr::read::<[u8; 131072]>
655384 core::ptr::read::<[core::mem::maybe_uninit::MaybeUninit<u8>; 131072]>
655368 <core::iter::adapters::map::Map<core::array::iter::IntoIter<u8, 131072>, example::main::{closure#0}::{closure#0}>>::new
655368 <core::array::iter::IntoIter<u8, 131072> as core::iter::traits::iterator::Iterator>::map::<u8, example::main::{closure#0}::{closure#0}>
393224 core::array::collect_into_array_unchecked::<core::iter::adapters::map::Map<core::array::iter::IntoIter<u8, 131072>, example::main::{closure#0}::{closure#0}>, 131072>
131096 core::mem::forget::<[u8; 131072]>
131088 example::main::{closure#0}
56 core::mem::transmute_copy::<[u8; 131072], [core::mem::maybe_uninit::MaybeUninit<u8>; 131072]>
56 <usize as core::slice::index::SliceIndex<[core::mem::maybe_uninit::MaybeUninit<u8>]>>::get_unchecked
56 <core::option::Option<usize>>::map::<u8, <core::array::iter::IntoIter<u8, 131072> as core::iter::traits::iterator::Iterator>::next::{closure#0}>
56 <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[core::mem::maybe_uninit::MaybeUninit<u8>]>>::get_unchecked_mut
56 <core::ops::range::Range<usize> as core::iter::range::RangeIteratorImpl>::spec_next
40 core::ptr::slice_from_raw_parts_mut::<u8>
40 core::ptr::slice_from_raw_parts_mut::<core::mem::maybe_uninit::MaybeUninit<u8>>
40 <core::option::Option<u8>>::map::<u8, &mut example::main::{closure#0}::{closure#0}>
40 <core::array::iter::IntoIter<u8, 131072>>::as_mut_slice
40 <core::array::iter::IntoIter<u8, 131072> as core::iter::traits::iterator::Iterator>::next
40 <[core::mem::maybe_uninit::MaybeUninit<u8>]>::get_unchecked_mut::<core::ops::range::Range<usize>>
32 core::ptr::read::<usize>
32 core::ptr::metadata::from_raw_parts_mut::<[u8]>
32 core::ptr::metadata::from_raw_parts_mut::<[core::mem::maybe_uninit::MaybeUninit<u8>]>
32 <usize as core::slice::index::SliceIndex<[core::mem::maybe_uninit::MaybeUninit<u8>]>>::get_unchecked_mut
24 core::ptr::read::<u8>
24 <core::ops::range::Range<usize> as core::iter::traits::iterator::Iterator>::next
24 <core::mem::maybe_uninit::MaybeUninit<[u8; 131072]>>::zeroed
24 <core::iter::adapters::map::Map<core::array::iter::IntoIter<u8, 131072>, example::main::{closure#0}::{closure#0}> as core::iter::traits::iterator::Iterator>::next
24 <core::array::iter::IntoIter<u8, 131072> as core::iter::traits::iterator::Iterator>::next::{closure#0}
24 <[core::mem::maybe_uninit::MaybeUninit<u8>]>::get_unchecked_mut::<usize>
24 <[core::mem::maybe_uninit::MaybeUninit<u8>]>::get_unchecked::<usize>
16 core::mem::forget::<core::array::collect_into_array::Guard<u8, 131072>>
16 <usize as core::iter::range::Step>::forward_unchecked
8 example::main::{closure#0}::{closure#0}
8 example::main
8 core::ptr::write::<usize>
8 core::ptr::drop_in_place::<core::iter::adapters::map::Map<core::array::iter::IntoIter<u8, 131072>, example::main::{closure#0}::{closure#0}>>
8 core::ptr::drop_in_place::<core::array::iter::IntoIter<u8, 131072>>
8 core::ptr::drop_in_place::<core::array::collect_into_array::Guard<u8, 131072>>
8 core::intrinsics::write_bytes::<[u8; 131072]>
8 <core::array::iter::IntoIter<u8, 131072> as core::ops::drop::Drop>::drop
8 <core::array::collect_into_array::Guard<u8, 131072> as core::ops::drop::Drop>::drop
8 <*const u8>::read
8 <*const [u8; 131072]>::read
8 <&mut example::main::{closure#0}::{closure#0} as core::ops::function::FnOnce<(u8,)>>::call_once
0 core::hint::unreachable_unchecked
0 <usize as core::cmp::PartialOrd>::lt
0 <usize as core::clone::Clone>::clone
0 <[core::mem::maybe_uninit::MaybeUninit<u8>]>::as_mut_ptr
0 <*const [core::mem::maybe_uninit::MaybeUninit<u8>]>::as_ptr
Someone analyzed a similar problem on reddit: https://www.reddit.com/r/rust/comments/oeqqf7/unexpected_high_stack_usage/
Alas, the high stack usage when using large types by value, including arrays, is going to be a commonly recurring theme. There's a lot of reliance on optimization here to remove intermediate locals and to inline functions so that copies when passing arguments aren't necessary.
The implementation of array::map
seams fairly complex. Maybe a more trivial solution is optimized better?
Something like that:
pub fn map<F, U>(self, mut f: F) -> [U; N]
where
F: FnMut(T) -> U,
{
let mut result: [U; N] = unsafe { std::mem::zeroed() };
for (i, x) in self.into_iter().enumerate() {
result[i] = f(x);
}
result
}
(using the 2021 edition into_iter
)
The issue seems to be resolved on Nigthly.
not necessarily, the stack usage went from >16x the array size to >8x the array size, which imho is still excessive.
You are using debug mode. Can you create an example which would work in release mode? Ideally it should be a godbolt link with enabled optimizations which clearly demonstrates unnecessary stack usage.
@wrenger
The complexity comes from panic safety, i.e. the code needs to properly clean initialized elements if f
panics and U
implements Drop
.
here's 3x stack usage in release mode: https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=399ee5cf1e40d4c1be0181971fdd1b25
ideally it would be 2x if neither the source nor destination array of the map
were removed.
compiler explorer demo of 3x stack usage: https://rust.godbolt.org/z/zdYh9vxMa
array size of 100kB:
example::func:
push r15
push r14
push r12
push rbx
mov eax, 300024
call __rust_probestack
sub rsp, rax // <-- reserve 300k of stack space
...
I think this link can be a slightly better demonstration: https://rust.godbolt.org/z/Wd7Ee8fzG
We can clearly see that the map-based code for some reason does the following: 1) Copy input array from reference to stack. 2) Perform mapping from the copied array to another stack region. 3) Copy the resulted data to another stack region again and pass reference to it to the extern function.
There are no side effects which prevent compiler from properly optimizing this code.
Somewhat relevant issue: #88930
I have an example here where stack space for map
is always 2*size_of::<[T; N]>() + size_of::<[U; N]>()
.
make_terrain
and make_terrain_zeroed
both use map
, and both exhibit this behaviour (make_terrain_zeroed
was checking whether the problem was in map
or MaybeUninit::assume_init
's move not being elided properly).make_terrain_nomap
uses a for i in 0..N
loop, and also exhibits the extra stack space being used (I think the elision of assume_init
's move isn't happening there, separate issue)make_terrain_nomap_ref
replaces assume_init
with assume_init_ref
and achieves the desired generic stack usage of size_of::<[T; N]>() + size_of::<[U; N]>()
make_terrain_inplace
is a specialization I'd like to see happen for U: !Drop, T: !Drop, size_of::<T>() == size_of::<U>(), align_of::<T>() == align_of::<U>()
whenever specialization is ready (or at least U: Copy, T: Copy
for now)
I don't know if this is the right place for this, but I'd like to report that
array::map
uses the stack excessively and operating on larger arrays result in unexpected stack overflows.Minimal example: https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=2d96e8f66b3432a8d1ecf0ca4331f85a
In the example, at some point more than 16 times more stack is used than the original array size.
As far as I've looked into the actual code, the only thing that stood out to me is use of
core::mem::transmute_copy
instead ofcore::mem::transmute
incore::array::IntoIter::new
, but I doubt it's the reason for this.Originally posted by @PonasKovas in https://github.com/rust-lang/rust/issues/75243#issuecomment-874732134