scottmcm opened this issue 6 years ago:
In writing #50156 I found out that LLVM supports range metadata on call instructions. So you can have `%x.2 = call i32 @NonZeroU32_get(i32 %x.1), !range !33` and LLVM will use the range metadata to reason about the return value. If I add the range metadata manually to the IR for the above test case, the optimization indeed fires. So we can probably fix some of this (maybe it falls apart in more complex cases) by emitting range metadata for return values, which we currently don't. cc @eddyb
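For context, here is a minimal sketch of the kind of test case being discussed (a reconstruction, since the OP's exact code isn't reproduced in this thread): the comparison should fold to a constant because of the `NonZeroU32` invariant, but `get()` returns a plain `u32`, so the invariant has to be carried by metadata.

```rust
use std::num::NonZeroU32;

// Reconstruction of the kind of code under discussion (hypothetical; the
// OP's exact test case is not shown here). Ideally this folds to `true`,
// because a NonZeroU32 can never hold 0.
pub fn foo(x: NonZeroU32) -> bool {
    x.get() != 0
}
```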
Filed #50157 for adding metadata to calls, but this issue is more general: once the `get()` gets inlined (or if it never gets called to begin with, e.g. because the `u32` is accessed by type punning), we'd lose the information from the metadata on the call. The code in the OP is optimized because instcombine turns the compare into `true` before inlining happens, but in more complex cases #50156 might also help (though that only helps if the `NonZero*` is passed in as an argument; it doesn't help with locals or if `foo` is inlined).
Scratch that, range metadata on return values doesn't help with this issue, since `get` returns a plain old `u32`. (Thanks @nox!)
With debuginfo, we emit an alloca for `%x` that does have range metadata when loaded from. But it appears that this is still not sufficiently propagated through optimization. https://rust.godbolt.org/z/sYjptQ
define internal i32 @_ZN4core3num10NonZeroU323get17h3d02812f86f2e907E(i32) unnamed_addr #0 !dbg !4 {
%self = alloca i32, align 4
store i32 %0, i32* %self, align 4
call void @llvm.dbg.declare(metadata i32* %self, metadata !19, metadata !DIExpression()), !dbg !21
%1 = load i32, i32* %self, align 4, !dbg !22
ret i32 %1, !dbg !23
}
define zeroext i1 @_ZN7example3foo17h7a9a6aa0879e3714E(i32) unnamed_addr #1 !dbg !24 {
%x = alloca i32, align 4
store i32 %0, i32* %x, align 4
call void @llvm.dbg.declare(metadata i32* %x, metadata !29, metadata !DIExpression()), !dbg !30
%1 = load i32, i32* %x, align 4, !dbg !31, !range !32
%2 = call i32 @_ZN4core3num10NonZeroU323get17h3d02812f86f2e907E(i32 %1), !dbg !31
br label %bb1, !dbg !31
bb1: ; preds = %start
%3 = icmp ne i32 %2, 0, !dbg !31
ret i1 %3, !dbg !33
}
[...]
!32 = !{i32 1, i32 0}
optimized:
define zeroext i1 @_ZN7example3foo17h7a9a6aa0879e3714E(i32) unnamed_addr #0 !dbg !4 {
call void @llvm.dbg.value(metadata i32 %0, metadata !21, metadata !DIExpression()), !dbg !22
%1 = icmp ne i32 %0, 0, !dbg !23
ret i1 %1, !dbg !24
}
Copying another example of an unnecessary check from https://github.com/rust-lang/rust/issues/54868:
use std::num::NonZeroU64;
pub fn foo(x: u64, y: NonZeroU64) -> u64 {
x / y.get()
}
https://play.rust-lang.org/?version=nightly&mode=release&gist=432cdd4e395ad81d031858ce7b5e3980
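As a side note, a hedged sketch of a library-level workaround for this particular example (assuming std's `impl Div<NonZeroU64> for u64`): dividing by the `NonZeroU64` directly lets the standard library perform an unchecked division, so no range metadata is needed to drop the zero check.

```rust
use std::num::NonZeroU64;

// Sketch relying on std's `impl Div<NonZeroU64> for u64`: the divisor's
// type already guarantees it is non-zero, so the impl can divide without
// a runtime check.
pub fn foo_direct(x: u64, y: NonZeroU64) -> u64 {
    x / y
}
```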
Would it be okay to special-case `NonZero*::get()` in the compiler to add the range metadata to the call?
Maybe we can start adding range metadata on calls and change `NonZero::get` to do `id(self).0`? Will LLVM preserve that information?
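A rough sketch of the `id(self).0` idea with a stand-in type (`MyNonZero` and `id` are illustrative names, not the actual `core::num` code):

```rust
// Stand-in for core's NonZeroU32; the real change would live in core::num.
#[derive(Copy, Clone)]
pub struct MyNonZero(u32);

// The call to `id` is where range metadata on the argument and/or return
// value could be attached, per the suggestion above.
fn id(x: MyNonZero) -> MyNonZero {
    x
}

impl MyNonZero {
    pub fn get(self) -> u32 {
        id(self).0
    }
}
```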
The `id(self)` part will have range metadata, but it won't be visible to callers of `NonZero::get` -- at least until after inlining, but inlining should/will remove the `id` call too.
@rkruppe I was hoping inlining would put the range metadata on the calls of `NonZero::get`, but I guess there's no place to keep it around in the meanwhile.
Why can't LLVM just come up with a design for "value restrictions" (e.g. `range`, `nonnull`, `align`, `dereferenceable`) that are reused between `load`s, `call`s, and function signature arguments/returns? Is it really that hard, compared to the mess it has now?
The three possibilities I see are:
- `NonZeroN::get()` calls on our side also including range metadata. This will also get lost after inlining, but it will be usable for early optimization.
> @eddyb: I was hoping inlining would put the range metadata on the calls of `NonZero::get`, but I guess there's no place to keep it around in the meanwhile.
The NonZero::get calls should/will also get inlined, so after inlining runs there are no calls left to put metadata on.
> Why can't LLVM just come up with a design for "value restrictions" (e.g. `range`, `nonnull`, `align`, `dereferenceable`) that are reused between `load`s, `call`s and function signature argument/returns?
Metadata and attributes mostly cover all of these locations (AFAIK the only combination missing is metadata on parameters and returns); what's really lacking is keeping this information around when the calls and loads that carry these annotations are eliminated (by SROA and inlining). Why that is difficult is a big topic and I'm not even sure which factors dominate, but for example we both know the downsides of putting lots of explicit instructions asserting these facts into the instruction stream (e.g. via calls to `llvm.assume`).
> Metadata and attributes mostly cover all of these locations
IMO a lot of the difficulty in getting everything working is the lack of a unified system and representation, which means you have to handle 2-3 different systems manually.
I don't really follow. The work of emitting all the relevant attributes and metadata that exist today has already been done and we still have this issue, and when metadata on parameters is added, we'll still have serious issues (e.g. when a `NonZero` is constructed locally -- optimizing checks there requires flow-sensitive analysis, which LLVM isn't always very good at, and frankly it's a hard problem).
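To illustrate the "constructed locally" case mentioned above (an assumed example, not from the thread): there is no annotated parameter or load to read the invariant from, so removing the final check requires flow-sensitive reasoning.

```rust
use std::num::NonZeroU32;

// Locally-constructed NonZero: the non-zero fact has to be tracked through
// `NonZeroU32::new` rather than read off a parameter or load annotation.
pub fn local(x: u32) -> bool {
    match NonZeroU32::new(x) {
        Some(nz) => nz.get() != 0, // ideally folds to `true`
        None => false,
    }
}
```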
You may want to experiment with the llvm.assume intrinsic: https://llvm.org/docs/LangRef.html#llvm-assume-intrinsic
Sorry if I'm being too naive, but isn't adding a `core::intrinsics::assume()` call within the `get()` function implementation enough to get it right?
`assume` tends to slow down LLVM a lot and even (ironically) block optimizations. I wonder if that got better recently.
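For reference, a nightly-only sketch of the `core::intrinsics::assume()` approach asked about above (requires the unstable `core_intrinsics` feature; `MyNonZeroU32` here is a stand-in type, not the real `core::num` implementation):

```rust
#![feature(core_intrinsics)]

#[derive(Copy, Clone)]
pub struct MyNonZeroU32(u32);

impl MyNonZeroU32 {
    pub fn get(self) -> u32 {
        // Lowers to llvm.assume, telling the optimizer the value is
        // non-zero (with the caveats about assume mentioned above).
        unsafe { core::intrinsics::assume(self.0 != 0) };
        self.0
    }
}
```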
`unreachable_unchecked` should also work. Testing it with my own `MyNonZeroU32` removed the unnecessary check in the `x.get() != 0` example, but comparing two `Option<MyNonZeroU32>` still generates inefficient and weird assembly: https://godbolt.org/z/3wrGFz
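A sketch of what the `unreachable_unchecked` experiment described above might look like (my reconstruction; the commenter's exact `MyNonZeroU32` code isn't shown):

```rust
use std::hint::unreachable_unchecked;

#[derive(Copy, Clone)]
pub struct MyNonZeroU32(u32);

impl MyNonZeroU32 {
    pub fn get(self) -> u32 {
        if self.0 == 0 {
            // SAFETY: constructors must uphold the non-zero invariant,
            // so this branch can never be taken.
            unsafe { unreachable_unchecked() }
        }
        self.0
    }
}

// Per the comment above, the hint lets this fold to `true`.
pub fn check(x: MyNonZeroU32) -> bool {
    x.get() != 0
}
```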
With some experimentation I found that one can currently work around this by introducing an indirection, which is consistent with the scalar bounds only being used when actually performing a load:
use std::num::NonZeroU32;

pub fn non_zero(mut x: NonZeroU32) -> bool {
non_zero_ind(&x)
}
fn non_zero_ind(x: &NonZeroU32) -> bool {
(*x).get() != 0
}
playground::non_zero:
mov al, 1
ret
Somewhat confusingly, this seems to depend on the order of passes. I also tried using a similar hack for removing the branch from `leading_zeros`, and it only works when the helper method is public.
use std::num::NonZeroU32;

pub fn non_zero(mut x: NonZeroU32) -> u32 {
non_zero_ind(&x)
}
pub fn non_zero_ind(x: &NonZeroU32) -> u32 {
(*x).get().leading_zeros()
}
playground::non_zero:
bsr eax, edi
xor eax, 31
ret
playground::non_zero_ind:
bsr eax, dword ptr [rdi]
xor eax, 31
ret
Made an LLVM issue for adding niche info to function parameters: https://github.com/llvm/llvm-project/issues/76628
> Made an LLVM issue for adding niche info to function parameters. llvm/llvm-project#76628
Looks like this was done in https://github.com/llvm/llvm-project/pull/83171
> Made an LLVM issue for adding niche info to function parameters. llvm/llvm-project#76628
>
> Looks like this was done in llvm/llvm-project#83171
And now it's being reverted due to a memory leak 😅
...but it's back in https://github.com/llvm/llvm-project/pull/84617
Repro: https://play.rust-lang.org/?gist=0981601dda9c5800e353e31b22682bb5&version=nightly&mode=release
Actual:
Expected:
cc https://github.com/rust-lang/rust/issues/49137