Open scottmcm opened 1 month ago
Emm, it seems like this pattern is rare. Do you encounter this in any real-world rust applications?
Maybe start by getting sub nuw for this? Like
Yeah, we can handle this pattern in ValueTracking like https://github.com/llvm/llvm-project/pull/85555.
Do you encounter this in any real-world rust applications?
Ah, here we go, I remade what I'd been trying to do: https://rust.godbolt.org/z/TczW1Ka1d
I was trying to get same-length mutable slices to the front and the back of a rust slice, so I could walk them together and mutate them both. (And yes I was intentionally allowing the skipping of the middle element for an odd-length slice.) Doing that in safe rust code means going through split_at_mut
, and thus I ended up at
pub fn front_and_back_halves(x: &mut [i32]) -> (&mut [i32], &mut [i32]) {
let n = x.len();
let (a, b) = x.split_at_mut(n - n/2);
(&mut a[..n/2], b)
}
expecting that I wouldn't need unsafe
code because LLVM would handle it, but it turned out that the a[..n/2]
bounds check doesn't optimize out:
define void @front_and_back_halves(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([32 x i8]) align 8 dereferenceable(32) %_0, ptr noalias noundef nonnull align 4 %x.0, i64 noundef %x.1) unnamed_addr #0 {
start:
%_41 = lshr i64 %x.1, 1
%_3 = sub i64 %x.1, %_41
%_7.i = icmp ugt i64 %_41, %_3
br i1 %_7.i, label %bb3.i, label %"_ZN106_$LT$core..ops..range..Range$LT$usize$GT$$u20$as$u20$core..slice..index..SliceIndex$LT$$u5b$T$u5d$$GT$$GT$9index_mut17h7c9c67143f1b9d5cE.exit"
bb3.i: ; preds = %start
tail call void @_ZN4core5slice5index24slice_end_index_len_fail17h07f57a356f69bc93E(i64 noundef %_41, i64 noundef %_3, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_43779ed79010680daba275a097b13dc0) #2, !noalias !3
unreachable
"_ZN106_$LT$core..ops..range..Range$LT$usize$GT$$u20$as$u20$core..slice..index..SliceIndex$LT$$u5b$T$u5d$$GT$$GT$9index_mut17h7c9c67143f1b9d5cE.exit": ; preds = %start
%0 = getelementptr inbounds i32, ptr %x.0, i64 %_3
store ptr %x.0, ptr %_0, align 8
%1 = getelementptr inbounds i8, ptr %_0, i64 8
store i64 %_41, ptr %1, align 8
%2 = getelementptr inbounds i8, ptr %_0, i64 16
store ptr %0, ptr %2, align 8
%3 = getelementptr inbounds i8, ptr %_0, i64 24
store i64 %_41, ptr %3, align 8
ret void
}
And it's not enough to just make the n - n/2
be sub nuw
: https://rust.godbolt.org/z/PT3hnbe8o
So I agree that humans probably don't write this comparison much, but it easily happens in bounds checks.
(Similarly, split_at_mut(n/2)
would also give n/2
- and n - n/2
-length slices.)
Out of curiosity, I also tried https://rust.godbolt.org/z/4o366q4s1
let (a, b) = x.split_at_mut(n/2);
(a, &mut b[n%2..])
But that seems much worse because despite still having a bounds check it also doesn't realize that the lengths of the two returned slices are the same. (As evidenced by writing out two different SSA values as the lengths of the returned slices. In the earlier example note that both lengths are store i64 %_41
because LLVM at least knows they're the same length, even if the bounds check didn't optimize out.)
How about the x / 2 <= (x + 1) / 2
case? When it come to "half but rounded up", I would write the expression as this: ((x + 1) / 2)
@Explorer09 x.len()
in rust can be usize::MAX
(albeit only for slices of ZSTs), which would make doing it as (x + 1)/2
wrap.
That inspired me to try .split_at_mut(n/2 + n%2);
too, but nope (https://rust.godbolt.org/z/Y96EzdcGh) that didn't optimize out either.
EDIT: Oh, but it looks like trunk can simplify that one https://llvm.godbolt.org/z/s5hPGbbnP -- amusingly, it simplifies it to n - n/2
🙃
@scottmcm Good point. Thanks for in the info about the SIZE_MAX
where (x + 1) / 2
won't work (because of overflow).
I stand corrected.
Actually, I was thinking about your message more and thought that I should try doing that (x + 1) / 2
approach in a larger type to see if it would work that way. After all, if I zext
to a larger type where I can add nuw
, it ought to be more obvious that x/2 <= (x+1)/2
-- definitely more obvious than n/2 <= n - n/2
.
Sadly, no luck. While it would be legal (https://alive2.llvm.org/ce/z/cU7chM) to optimize the check away, it doesn't even though it figures out both add nuw
and trunc nuw
: https://llvm.godbolt.org/z/zMKW5PP7x
@scottmcm
I think another pattern that might me worth mentioning is this: (x / 2) + (x % 2)
(assuming x is an unsigned integer), or the bitwise equivalent (x >> 1) + (x & 1)
@Explorer09 yup, see https://github.com/llvm/llvm-project/issues/91527#issuecomment-2106704296 for that one :)
Emm, it seems like this pattern is rare.
Confirmed.
Today, this doesn't optimize away https://llvm.godbolt.org/z/jPjo6M858
but that could just be
return true
, as proven in https://alive2.llvm.org/ce/z/dxHhft.Indeed, it's always
true
for any denominator greater than zero. Proof: https://alive2.llvm.org/ce/z/CmkN_mThis is important for "half but it needs to be rounded up" cases.
x/2 <= x
is already understood to be true, but when you need the bigger half instead, the optimizer is more confused. (I lost the original non-minimized example, sorry, but it was something like splitting an n-element slice inton/2
andn-n/2
slices and ending up with the bounds checking not optimizing out.)Maybe start by getting
sub nuw
for this? LikeProof for that narrower part: https://alive2.llvm.org/ce/z/xoCDe5