rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
98.81k stars 12.77k forks source link

3-way comparison is branchier after 1.71 #125338

Open mqudsi opened 6 months ago

mqudsi commented 6 months ago

Code

The following code contains two impls of the same (or what should be the same) logic. Prior to 1.71.0, the first impl (using Option::map() instead of match) would generate llvm ir that could be optimized to use cmov instructions, but the second impl (using match instead) would use jumps instead.

1.71.0+ generate the same (branchy) llvm ir for both implementations. The regression goes away if compiling with -Zinline-mir=no, but the match implementation remains branchy.

#[derive(PartialEq)]
pub enum WSL {
    Any,
    V1,
    V2,
}

#[inline(never)]
pub fn is_wsl(wsl: Option<WSL>, v: WSL) -> bool {
    wsl.map(|wsl| v == WSL::Any || wsl == v).unwrap_or(false)
}

#[inline(never)]
pub fn is_wsl2(wsl: Option<WSL>, v: WSL) -> bool {
    match (wsl, v) {
        (Some(_), WSL::Any) => true,
        (Some(x), y) => x == y,
        _ => false
    }
}

Godbolt link

Optimized IR:

define noundef zeroext i1 @_ZN7example6is_wsl17h6965abaffb382d78E(i8 noundef %wsl, i8 noundef %0) unnamed_addr #0 {
start:
  %1 = icmp ne i8 %wsl, 3
  %_3.i.i = icmp eq i8 %0, 0
  %_4.i.i = icmp eq i8 %0, %wsl
  %.0.i.i = or i1 %_3.i.i, %_4.i.i
  %.0 = and i1 %1, %.0.i.i
  ret i1 %.0
}

compiling to following x86_64:

example::is_wsl::h6965abaffb382d78:
        cmp     dil, 3
        setne   cl
        test    sil, sil
        sete    dl
        cmp     sil, dil
        sete    al
        or      al, dl
        and     al, cl
        ret

vs unoptimized ir and asm:

define noundef zeroext i1 @_ZN7example7is_wsl217h164d1336a42bfb8dE(i8 noundef %wsl, i8 noundef %v) unnamed_addr #0 {
start:
  %.not = icmp eq i8 %wsl, 3
  br i1 %.not, label %bb5, label %bb2

bb2:                                              ; preds = %start
  %0 = icmp eq i8 %v, 0
  %1 = icmp eq i8 %wsl, %v
  %spec.select = or i1 %0, %1
  br label %bb5

bb5:                                              ; preds = %bb2, %start
  %.0 = phi i1 [ false, %start ], [ %spec.select, %bb2 ]
  ret i1 %.0
}
example::is_wsl2::h164d1336a42bfb8d:
        cmp     dil, 3
        jne     .LBB1_2
        xor     eax, eax
        ret
.LBB1_2:
        test    sil, sil
        sete    cl
        cmp     dil, sil
        sete    al
        or      al, cl
        ret

Version it worked on

It most recently worked on: 1.70.0

Version with regression

rustc --version --verbose:

1.71.0-stable

@rustbot modify labels: +regression-from-stable-to-stable -regression-untriaged +A-codegen +C-bug +I-slow +T-compiler

cc https://github.com/rust-lang/rust/pull/91743

DianQK commented 6 months ago

Alive2: https://alive2.llvm.org/ce/z/VqNWdC

Related code: https://github.com/llvm/llvm-project/blob/3fa6b3bbdb0f9de18def8596d1f7fcec2ef77b5e/llvm/lib/Transforms/Utils/SimplifyCFG.cpp#L2992-L3012

There are many limitations to hoisting branch instructions. I am not sure if it is better to change these limitations.

mqudsi commented 6 months ago

It's actually a more fundamental issue than the case in the original post; even stripping down the code to just a binary enum still puts a branch in there. Godbolt,l:'5',n:'1',o:'Rust+source+%231',t:'0')),header:(),k:33.333333333333336,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:compiler,i:(compiler:nightly,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'1',intel:'0',libraryCode:'0',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:2,lang:rust,libs:!(),options:'-C+opt-level%3D3+-C+debuginfo%3D0+-C+target-cpu%3Dx86-64+',overrides:!((name:edition,value:'2021')),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+rustc+nightly+(Editor+%231)',t:'0')),k:33.333333333333336,l:'4',m:100,n:'0',o:'',s:0,t:'0'),(g:!((h:ir,i:('-fno-discard-value-names':'0',compilerName:'rustc+nightly',demangle-symbols:'0',editorid:1,filter-attributes:'0',filter-comments:'0',filter-debug-info:'0',filter-instruction-metadata:'0',fontScale:14,fontUsePx:'0',j:2,selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),treeid:0),l:'5',n:'0',o:'LLVM+IR+Viewer+rustc+nightly+(Editor+%231,+Compiler+%232)',t:'0')),k:33.33333333333333,l:'4',n:'0',o:'',s:0,t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4)

#[derive(PartialEq)]
pub enum WSL {
    Any,
    V1,
}

#[inline(never)]
pub fn is_wsl(wsl: Option<WSL>, v: WSL) -> bool {
    wsl.map(|wsl| wsl == v).unwrap_or(false)
}
example::is_wsl::h2f539bf0253b949c:
        cmp     dil, 2
        jne     .LBB0_2
        xor     eax, eax
        ret
.LBB0_2:
        test    dil, 1
        sete    al
        xor     al, sil
        ret

I'm going to leave the repro in the first post unmodified as the ideal fix would apply to that more difficult case as well, as this is just a contrived example.

apiraino commented 6 months ago

WG-prioritization assigning priority (Zulip discussion).

@rustbot label -I-prioritize +P-medium

mqudsi commented 6 months ago

While this was tagged as being fallout from #91743, it's a much later regression. bisecting the nightlies gives me these possible commits as the source of the regression:

I'm guessing it's #110705

saethlin commented 6 months ago

While this was tagged as being fallout from #91743, it's a much later regression. bisecting the nightlies gives me these possible commits as the source of the regression:

You can bisect MIR inliner issues to any number of MIR inliner heuristic tweaks depending on the details of your reproducer. The problem here is an interaction with LLVM and MIR inlining in general, and blaming any particular heuristic tweak is a mistake.

mqudsi commented 6 months ago

Thanks for the explanation, @saethlin.