rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

RISC-V Codegen Problem with type information #114508

Open coastalwhite opened 1 year ago

coastalwhite commented 1 year ago

Codegen for riscv64gc-unknown-linux-gnu sends wrong type information to LLVM.

I compiled the code below on godbolt.org (rustc 1.71.0) with the following flags:

--target=riscv64gc-unknown-linux-gnu -C target-feature=+zbb -C opt-level=3

I tried this code:

pub fn max(a: u32, b: u32) -> u32 {
    u32::max(a, b)
}

It outputs:

example::max:
        sext.w  a1, a1
        sext.w  a0, a0
        maxu    a0, a0, a1
        ret

The sign extensions of a0 and a1 should not be needed. In fact, when we inspect the LLVM IR, it seems to treat the numbers as i32s instead of u32s. Is this a problem with Rust's codegen output?

LLVM IR:

; example::max
define noundef i32 @example::max(i32 noundef %a, i32 noundef %b) unnamed_addr personality ptr @rust_eh_personality {
start:
  %.0.sroa.speculated.i = tail call i32 @llvm.umax.i32(i32 %a, i32 %b)
  ret i32 %.0.sroa.speculated.i
}

declare noundef signext i32 @rust_eh_personality(i32 noundef signext, i32 noundef signext, i64 noundef, ptr noundef, ptr noundef) unnamed_addr #1

declare i32 @llvm.umax.i32(i32, i32) #2

Properly optimized code would be:

example::max:
        maxu    a0, a0, a1
        ret

Urgau commented 1 year ago

when we inspect the LLVM IR, it seems to assume that the numbers are i32s instead of u32s. Is this a problem with Rust's codegen output?

LLVM integer types do not distinguish between signed and unsigned integers. It is the operation (i.e. the intrinsic) that makes the distinction, and llvm.umax.* is the unsigned max variant:

Return the larger of %a and %b comparing the values as unsigned integers
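
To illustrate the point about signless integer types, here is a minimal Rust sketch. The helper names umax and smax are hypothetical and only mirror the naming of the LLVM intrinsics. Both functions take the same 32-bit patterns, and only the operation decides whether they are compared as signed or unsigned:

```rust
/// Unsigned maximum: the two 32-bit patterns are compared as u32.
/// (Hypothetical helper, named after llvm.umax for illustration.)
fn umax(a: u32, b: u32) -> u32 {
    a.max(b)
}

/// Signed maximum over the same 32-bit patterns, reinterpreted as i32.
/// (Hypothetical helper, named after llvm.smax for illustration.)
fn smax(a: u32, b: u32) -> u32 {
    (a as i32).max(b as i32) as u32
}

fn main() {
    let a: u32 = 0xFFFF_FFFF; // u32::MAX, or -1 when read as i32
    let b: u32 = 1;

    // Same input bits, different results depending on the operation.
    println!("umax = {:#x}", umax(a, b));
    println!("smax = {:#x}", smax(a, b));
}
```

The type of the inputs is the same in both cases, which is why an i32 in the IR says nothing about signedness by itself.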

It seems to me that this is a problem with the RISC-V LLVM backend.

@rustbot labels: +A-LLVM +A-codegen +O-riscv +T-compiler -needs-triage

rakicaleksandar1999 commented 1 month ago

According to the documentation, registers on RV64 are 64 bits wide. The arguments of the @llvm.umax.i32 intrinsic are 32 bits wide, so the upper 32 bits of a0 and a1 are effectively unknown. The maxu instruction probably operates on all 64 bits of its registers, so I would say the upper 32 bits need to be given appropriate values before the comparison.
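
The reasoning above can be sketched in Rust as a rough model. The functions sext_w and maxu are hypothetical stand-ins for the instructions of the same name, and the register contents are made-up values. The sketch assumes the upper 32 bits of an argument register may hold arbitrary data when no extension is guaranteed:

```rust
/// Model of `sext.w`: sign-extend the low 32 bits of a 64-bit register.
/// (Hypothetical helper, not a real API.)
fn sext_w(reg: u64) -> u64 {
    reg as u32 as i32 as i64 as u64
}

/// Model of `maxu` on RV64: unsigned maximum over all 64 bits.
/// (Hypothetical helper, not a real API.)
fn maxu(rs1: u64, rs2: u64) -> u64 {
    rs1.max(rs2)
}

fn main() {
    // The low 32 bits hold the arguments 5 and 7.
    // The upper bits of a0 hold unrelated data (made-up value).
    let a0: u64 = 0xDEAD_BEEF_0000_0005;
    let a1: u64 = 0x0000_0000_0000_0007;

    // Comparing the raw registers lets the upper bits decide the result.
    let without_ext = maxu(a0, a1) as u32;
    // Normalizing both registers first compares only the 32-bit values.
    let with_ext = maxu(sext_w(a0), sext_w(a1)) as u32;

    println!("without sext.w: {without_ext}, with sext.w: {with_ext}");
}
```

Sign extension keeps the unsigned ordering of the 32-bit values intact: values with the top bit set all map above values with the top bit clear, so the low 32 bits of the 64-bit result still equal the 32-bit unsigned maximum.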

nikic commented 1 month ago

The problem here is that we don't set signext/zeroext attributes on the arguments with the Rust ABI. signext is set when using extern "C" (as it is an ABI requirement there). It probably makes sense to set them for the Rust ABI as well, for this target.
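
One way to look at the difference described above is to compile the same function body under both ABIs and compare the emitted IR and assembly for the target in the report. This is only a sketch: the names max_rust and max_c are made up for illustration, and the comments restate the claim from the comment above and do not record verified compiler output.

```rust
/// Rust ABI: per the comment above, no signext/zeroext attribute
/// is attached to `a` and `b` here.
pub fn max_rust(a: u32, b: u32) -> u32 {
    u32::max(a, b)
}

/// C ABI: per the comment above, the arguments are marked `signext`
/// on this target because the ABI requires it.
pub extern "C" fn max_c(a: u32, b: u32) -> u32 {
    u32::max(a, b)
}

fn main() {
    // Both versions compute the same value; only the calling
    // convention (and therefore the argument attributes) differs.
    assert_eq!(max_rust(3, 9), max_c(3, 9));
    println!("{}", max_c(3, 9));
}
```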