Open orlp opened 4 months ago
And in case anyone wonders, the exact same bad code is generated if one uses arrayvec::ArrayVec rather than my ArrayBuilder:
pub fn next_array<I, T, const N: usize>(it: &mut I) -> Option<[T; N]>
where
    I: Iterator<Item = T>,
{
    let mut builder = arrayvec::ArrayVec::new();
    for _ in 0..N {
        builder.push(it.next()?);
    }
    builder.into_inner().ok()
}
You could look at what next_chunk does in std; it uses MaybeUninit internally, and Copied has some specialization on top.
Though itertools' job should actually be easier here, since you're just returning an Option instead of a Result with a remainder.
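To make the MaybeUninit suggestion concrete, here is a minimal sketch of how such a function could be written by hand. It is not std's next_chunk implementation; the name next_array_uninit and the Guard helper are hypothetical, and whether this shape actually produces better code than the ArrayVec version is an open question that would need checking in a compiler explorer.

```rust
use std::mem::MaybeUninit;

/// Hypothetical sketch: pull the next N items into an array,
/// or return None if the iterator runs out first.
pub fn next_array_uninit<I, T, const N: usize>(it: &mut I) -> Option<[T; N]>
where
    I: Iterator<Item = T>,
{
    // Drops the already-written prefix if we leave early
    // (either `next()` returned None or it panicked).
    struct Guard<T, const N: usize> {
        buf: [MaybeUninit<T>; N],
        len: usize,
    }

    impl<T, const N: usize> Drop for Guard<T, N> {
        fn drop(&mut self) {
            for slot in &mut self.buf[..self.len] {
                // SAFETY: the first `len` slots were written and not moved out.
                unsafe { slot.assume_init_drop() };
            }
        }
    }

    let mut guard = Guard::<T, N> {
        // SAFETY: an array of MaybeUninit requires no initialization.
        buf: unsafe { MaybeUninit::<[MaybeUninit<T>; N]>::uninit().assume_init() },
        len: 0,
    };

    while guard.len < N {
        let item = it.next()?; // early return runs Guard::drop
        guard.buf[guard.len].write(item);
        guard.len += 1;
    }

    // SAFETY: all N slots are initialized, and [MaybeUninit<T>; N]
    // has the same layout as [T; N].
    let arr = unsafe { std::ptr::read(guard.buf.as_ptr() as *const [T; N]) };
    // The values now live in `arr`; skip the guard's destructor.
    std::mem::forget(guard);
    Some(arr)
}
```

The guard is there only for correctness with non-Copy item types: without it, an early return would leak the items already taken from the iterator.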
I reduced it to the following code, but this is another issue:
#[no_mangle]
pub fn src(sl: &[u32]) -> Option<u32> {
    let mut r = 0;
    let it = &mut sl.iter();
    for _ in 0..2 {
        r += it.next()?;
    }
    Some(r)
}

#[no_mangle]
pub fn tgt(sl: &[u32]) -> Option<u32> {
    let mut r = 0;
    let it = &mut sl.iter();
    r += it.next()?;
    r += it.next()?;
    Some(r)
}
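The point of the reduction is that the looped and the manually unrolled versions are semantically identical, so one would hope for identical machine code. A small sketch to confirm the equivalence at the source level follows; the function names sum2_loop, sum2_unrolled and versions_agree are hypothetical, and the bodies mirror the logic of src and tgt with #[no_mangle] dropped.

```rust
/// Same logic as `src`: two `next()` calls inside a loop.
pub fn sum2_loop(sl: &[u32]) -> Option<u32> {
    let mut r = 0u32;
    let it = &mut sl.iter();
    for _ in 0..2 {
        r += it.next()?;
    }
    Some(r)
}

/// Same logic as `tgt`: the loop unrolled by hand.
pub fn sum2_unrolled(sl: &[u32]) -> Option<u32> {
    let mut r = 0u32;
    let it = &mut sl.iter();
    r += it.next()?;
    r += it.next()?;
    Some(r)
}

/// Returns true when both versions agree on every given input.
pub fn versions_agree(inputs: &[&[u32]]) -> bool {
    inputs.iter().all(|sl| sum2_loop(sl) == sum2_unrolled(sl))
}
```

This only checks observable behavior; any difference in the generated assembly still has to be inspected separately.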
While working on Itertools::collect_array for itertools I wanted to compare its efficiency against try_into. You can see my comparison here: https://rust.godbolt.org/z/qPW3K8aTx.
I was rather shocked: both look good for N = 4, but for N = 16 we see the following nice implementation for try_into:
But the following abomination for collect_array:
While it not folding the consecutive address movs into efficient SIMD movs is disappointing, I would argue that there's probably a bug somewhere, since it compares rdx THIRTEEN TIMES IN A ROW to ultimately just check if it is < 16.
This doesn't appear to be a recent regression: the same happens in 1.60 through nightly, and it happens on both x86 and ARM.
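For readers who want something concrete to paste into a compiler explorer, here is a rough sketch of the two styles being compared. These are not the functions behind the link above: the names first_n_try_into and first_n_iter are hypothetical, the element type is fixed to u32 for simplicity, and the iterator version fills a zeroed array rather than using an array builder.

```rust
use std::convert::TryInto;

/// Slice-based baseline: check the length once, then convert.
pub fn first_n_try_into<const N: usize>(sl: &[u32]) -> Option<[u32; N]> {
    sl.get(..N)?.try_into().ok()
}

/// Iterator-based variant: pull N items one at a time,
/// bailing out on the first None.
pub fn first_n_iter<const N: usize>(sl: &[u32]) -> Option<[u32; N]> {
    let mut it = sl.iter().copied();
    let mut out = [0u32; N];
    for slot in out.iter_mut() {
        *slot = it.next()?;
    }
    Some(out)
}
```

Instantiating both with N = 4 and N = 16 and comparing the output at opt-level 3 is one way to see whether the repeated length comparisons described above show up for a given compiler version.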