scottmcm opened this issue 6 years ago:
In writing #50156 I found out that LLVM supports range metadata on call instructions. So you can have `%x.2 = call i32 @NonZeroU32_get(i32 %x.1), !range !33` and LLVM will use the range metadata to reason about the return value. If I add the range metadata manually to the IR for the above test case, the optimization indeed fires. So we can probably fix some of this (maybe it falls apart in more complex cases) by emitting range metadata for return values, which we currently don't. cc @eddyb
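For context, here is a minimal sketch of the kind of test case being discussed (a reconstruction, since the OP's exact code isn't reproduced in this thread): the comparison should fold to a constant because of the `NonZeroU32` invariant, but `get()` returns a plain `u32`, so the invariant has to be carried by metadata.

```rust
use std::num::NonZeroU32;

// Reconstruction of the kind of code under discussion (hypothetical; the
// OP's exact test case is not shown here). Ideally this folds to `true`,
// because a NonZeroU32 can never hold 0.
pub fn foo(x: NonZeroU32) -> bool {
    x.get() != 0
}
```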
Filed #50157 for adding metadata to calls, but this issue is more general: once the `get()` gets inlined (or if it never gets called to begin with, e.g. because the `u32` is accessed by type punning), we'd lose the information from the metadata on the call. The code in the OP is optimized because instcombine turns the compare into `true` before inlining happens, but in more complex cases #50156 might also help (though that only helps if the `NonZero*` is passed in as an argument; it doesn't help with locals or if `foo` is inlined).
Scratch that, range metadata on return values doesn't help with this issue, since `get` returns a plain old `u32`. (Thanks @nox!)
With debuginfo, we emit an alloca for `%x` that does have range metadata when loaded from. But it appears that this is still not sufficiently propagated through optimization. https://rust.godbolt.org/z/sYjptQ
define internal i32 @_ZN4core3num10NonZeroU323get17h3d02812f86f2e907E(i32) unnamed_addr #0 !dbg !4 {
%self = alloca i32, align 4
store i32 %0, i32* %self, align 4
call void @llvm.dbg.declare(metadata i32* %self, metadata !19, metadata !DIExpression()), !dbg !21
%1 = load i32, i32* %self, align 4, !dbg !22
ret i32 %1, !dbg !23
}
define zeroext i1 @_ZN7example3foo17h7a9a6aa0879e3714E(i32) unnamed_addr #1 !dbg !24 {
%x = alloca i32, align 4
store i32 %0, i32* %x, align 4
call void @llvm.dbg.declare(metadata i32* %x, metadata !29, metadata !DIExpression()), !dbg !30
%1 = load i32, i32* %x, align 4, !dbg !31, !range !32
%2 = call i32 @_ZN4core3num10NonZeroU323get17h3d02812f86f2e907E(i32 %1), !dbg !31
br label %bb1, !dbg !31
bb1: ; preds = %start
%3 = icmp ne i32 %2, 0, !dbg !31
ret i1 %3, !dbg !33
}
[...]
!32 = !{i32 1, i32 0}
optimized:
define zeroext i1 @_ZN7example3foo17h7a9a6aa0879e3714E(i32) unnamed_addr #0 !dbg !4 {
call void @llvm.dbg.value(metadata i32 %0, metadata !21, metadata !DIExpression()), !dbg !22
%1 = icmp ne i32 %0, 0, !dbg !23
ret i1 %1, !dbg !24
}
Copying another example of an unnecessary check from https://github.com/rust-lang/rust/issues/54868:
use std::num::NonZeroU64;
pub fn foo(x: u64, y: NonZeroU64) -> u64 {
x / y.get()
}
https://play.rust-lang.org/?version=nightly&mode=release&gist=432cdd4e395ad81d031858ce7b5e3980
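As a side note, a hedged sketch of a library-level workaround for this particular example (assuming std's `impl Div<NonZeroU64> for u64`): dividing by the `NonZeroU64` directly lets the standard library perform an unchecked division, so no range metadata is needed to drop the zero check.

```rust
use std::num::NonZeroU64;

// Sketch relying on std's `impl Div<NonZeroU64> for u64`: the divisor's
// type already guarantees it is non-zero, so the impl can divide without
// a runtime check.
pub fn foo_direct(x: u64, y: NonZeroU64) -> u64 {
    x / y
}
```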
Would it be okay to special-case `NonZero*::get()` in the compiler to add the range metadata to the call?
Maybe we can start adding range metadata on calls and change `NonZero::get` to do `id(self).0`? Will LLVM preserve that information?
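A rough sketch of the `id(self).0` idea with a stand-in type (`MyNonZero` and `id` are illustrative names, not the actual `core::num` code):

```rust
// Stand-in for core's NonZeroU32; the real change would live in core::num.
#[derive(Copy, Clone)]
pub struct MyNonZero(u32);

// The call to `id` is where range metadata on the argument and/or return
// value could be attached, per the suggestion above.
fn id(x: MyNonZero) -> MyNonZero {
    x
}

impl MyNonZero {
    pub fn get(self) -> u32 {
        id(self).0
    }
}
```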
The `id(self)` part will have range metadata, but it won't be visible to callers of `NonZero::get` -- at least until after inlining, but inlining should/will remove the `id` call too.
@rkruppe I was hoping inlining would put the range metadata on the calls of `NonZero::get`, but I guess there's no place to keep it around in the meanwhile.
Why can't LLVM just come up with a design for "value restrictions" (e.g. `range`, `nonnull`, `align`, `dereferenceable`) that are reused between `load`s, `call`s, and function signature arguments/returns? Is it really that hard, compared to the mess it has now?
The three possibilities I see are:
- `NonZeroN::get()` calls on our side also including range metadata. This will also get lost after inlining, but it will be usable for early optimization.
> @eddyb: I was hoping inlining would put the range metadata on the calls of `NonZero::get`, but I guess there's no place to keep it around in the meanwhile.
The NonZero::get calls should/will also get inlined, so after inlining runs there are no calls left to put metadata on.
> Why can't LLVM just come up with a design for "value restrictions" (e.g. `range`, `nonnull`, `align`, `dereferenceable`) that are reused between `load`s, `call`s and function signature argument/returns?
Metadata and attributes mostly cover all of these locations (AFAIK the only combination missing is metadata on parameters and returns); what's really lacking is keeping this information around when the calls and loads that carry these annotations are eliminated (by SROA and inlining). Why that is difficult is a big topic and I'm not even sure which factors dominate, but for example we both know the downsides of putting lots of explicit instructions asserting these facts into the instruction stream (e.g. via calls to `llvm.assume`).
> Metadata and attributes mostly cover all of these locations
IMO a lot of the difficulty in getting everything working is the lack of a unified system and representation, which means you have to handle 2-3 different systems manually.
I don't really follow. The work of emitting all the relevant attributes and metadata that exist today has already been done and we still have this issue, and when metadata on parameters is added, we'll still have serious issues (e.g. when a `NonZero` is constructed locally -- optimizing checks there requires flow-sensitive analysis, which LLVM isn't always very good at, and frankly it's a hard problem).
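To illustrate the "constructed locally" case mentioned above (an assumed example, not from the thread): there is no annotated parameter or load to read the invariant from, so removing the final check requires flow-sensitive reasoning.

```rust
use std::num::NonZeroU32;

// Locally-constructed NonZero: the non-zero fact has to be tracked through
// `NonZeroU32::new` rather than read off a parameter or load annotation.
pub fn local(x: u32) -> bool {
    match NonZeroU32::new(x) {
        Some(nz) => nz.get() != 0, // ideally folds to `true`
        None => false,
    }
}
```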
You may want to experiment with the llvm.assume intrinsic: https://llvm.org/docs/LangRef.html#llvm-assume-intrinsic
Sorry if I'm being too naive, but isn't adding a `core::intrinsics::assume()` call within the `get()` function implementation enough to get it right?
`assume` tends to slow down LLVM a lot and even (ironically) block optimizations. I wonder if that got better recently.
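For reference, a nightly-only sketch of the `core::intrinsics::assume()` approach asked about above (requires the unstable `core_intrinsics` feature; `MyNonZeroU32` here is a stand-in type, not the real `core::num` implementation):

```rust
#![feature(core_intrinsics)]

#[derive(Copy, Clone)]
pub struct MyNonZeroU32(u32);

impl MyNonZeroU32 {
    pub fn get(self) -> u32 {
        // Lowers to llvm.assume, telling the optimizer the value is
        // non-zero (with the caveats about assume mentioned above).
        unsafe { core::intrinsics::assume(self.0 != 0) };
        self.0
    }
}
```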
`unreachable_unchecked` should also work. Testing it with my own `MyNonZeroU32` removed the unnecessary check in the `x.get() != 0` example, but comparing two `Option<MyNonZeroU32>` still generates inefficient and weird assembly: https://godbolt.org/z/3wrGFz
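A sketch of what the `unreachable_unchecked` experiment described above might look like (my reconstruction; the commenter's exact `MyNonZeroU32` code isn't shown):

```rust
use std::hint::unreachable_unchecked;

#[derive(Copy, Clone)]
pub struct MyNonZeroU32(u32);

impl MyNonZeroU32 {
    pub fn get(self) -> u32 {
        if self.0 == 0 {
            // SAFETY: constructors must uphold the non-zero invariant,
            // so this branch can never be taken.
            unsafe { unreachable_unchecked() }
        }
        self.0
    }
}

// Per the comment above, the hint lets this fold to `true`.
pub fn check(x: MyNonZeroU32) -> bool {
    x.get() != 0
}
```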
With some experimentation I found that one can currently work around this by introducing an indirection, which is consistent with the scalar bounds only being used when actually performing a load:
use std::num::NonZeroU32;

pub fn non_zero(mut x: NonZeroU32) -> bool {
non_zero_ind(&x)
}
fn non_zero_ind(x: &NonZeroU32) -> bool {
(*x).get() != 0
}
playground::non_zero:
mov al, 1
ret
Somewhat confusingly, this seems to depend on the order of passes. I also tried using a similar hack for removing the branch from `leading_zeros`, and it only works when the helper method is public.
use std::num::NonZeroU32;

pub fn non_zero(mut x: NonZeroU32) -> u32 {
non_zero_ind(&x)
}
pub fn non_zero_ind(x: &NonZeroU32) -> u32 {
(*x).get().leading_zeros()
}
playground::non_zero:
bsr eax, edi
xor eax, 31
ret
playground::non_zero_ind:
bsr eax, dword ptr [rdi]
xor eax, 31
ret
Made an LLVM issue for adding niche info to function parameters: https://github.com/llvm/llvm-project/issues/76628
> Made an LLVM issue for adding niche info to function parameters. llvm/llvm-project#76628
Looks like this was done in https://github.com/llvm/llvm-project/pull/83171
> Made an LLVM issue for adding niche info to function parameters. llvm/llvm-project#76628
>
> Looks like this was done in llvm/llvm-project#83171
And now it's being reverted due to a memory leak 😅
...but it's back in https://github.com/llvm/llvm-project/pull/84617
Repro: https://play.rust-lang.org/?gist=0981601dda9c5800e353e31b22682bb5&version=nightly&mode=release
Actual:
Expected:
cc https://github.com/rust-lang/rust/issues/49137