rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org

Different codegen between literal and constant on `bitwise-copy` struct #128168

Closed CrazyboyQCD closed 4 months ago

CrazyboyQCD commented 4 months ago

The compiler generates different code for a struct literal and for an equivalent constant when the struct is bitwise-copyable, larger than a quadword, and initialized from non-zero memory.

From zero memory (same codegen): Godbolt link.

From non-zero memory (different codegen): Godbolt link.

From non-zero memory with one more field (different codegen, and more mov instructions for the literal): Godbolt link.

tgross35 commented 4 months ago

Relevant bit of code:

#[derive(Clone, Copy)]
#[repr(C)]
pub struct A { v1: u8, v2: u8, v3: u8, v4: u8, v5: u8, v6: u8, v7: u8, v8: u8, v9: u8, v10: u8 }

#[no_mangle]
pub const fn new_literal() -> A {
    A { v1: 1, v2: 0, v3: 0, v4: 0, v5: 0, v6: 0, v7: 0, v8: 0, v9: 0, v10: 0 }
}

#[no_mangle]
pub const fn new_const() -> A {
    const T: A = A { v1: 1, v2: 0, v3: 0, v4: 0, v5: 0, v6: 0, v7: 0, v8: 0, v9: 0, v10: 0 };
    T
}

Generated LLVM IR:

@0 = private unnamed_addr constant <{ [10 x i8] }> <{ [10 x i8] c"\01\00\00\00\00\00\00\00\00\00" }>, align 1

define void @new_literal(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([10 x i8]) align 1 dereferenceable(10) %_0) unnamed_addr {
start:
  store i8 1, ptr %_0, align 1
  %0 = getelementptr inbounds i8, ptr %_0, i64 1
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(9) %0, i8 0, i64 9, i1 false)
  ret void
}

define void @new_const(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([10 x i8]) align 1 dereferenceable(10) %_0) unnamed_addr {
start:
  tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(10) %_0, ptr noundef nonnull align 1 dereferenceable(10) @0, i64 10, i1 false)
  ret void
}

We can see exactly what is happening here: @new_literal constructs the value in place by storing a 1 at the return value's base address (%_0) and then memsetting the rest to zero (the gep computes a pointer, %0, one byte past the start of the return value, and the memset zeroes the remaining nine bytes from that address).

new_const is just doing a memcpy from a static (@0) to the return value (%_0).

So new_const did the calculation in advance, while new_literal does it on the fly. This is expected: marking a function const does not mean it is evaluated at compile time whenever possible, only that it can be evaluated at compile time. Evaluating eagerly might be feasible to some degree, but it isn't done because trying to evaluate everything that could be const (which is a lot) would slow compilation down considerably.
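To illustrate the point about const fn, here is a minimal sketch. The function `square` and the constant `NINE` are hypothetical names, not taken from the issue. The same const fn can be called from a const context, where compile-time evaluation is required, or with a value known only at runtime, where it is compiled like any other function.

```rust
/// Hypothetical example: `const fn` only *permits* compile-time evaluation.
const fn square(x: u32) -> u32 {
    x * x
}

// Const context: this call has to be evaluated at compile time.
const NINE: u32 = square(3);

fn main() {
    // The argument count is not known until the program runs, so this is
    // an ordinary function call, subject to the usual optimizations.
    let n = std::env::args().count() as u32;
    let at_runtime = square(n);

    println!("{NINE} {at_runtime}");
}
```

Whether the optimizer folds a runtime-context call with constant arguments is a separate question from const evaluation, which is why the two functions in the issue can end up with different IR.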

If you want to ensure something is evaluated at compile time, assigning it to a const or static is the correct way to do it. Alternatively, since Rust 1.79 you can use inline const blocks: const { /* calculations */ }.
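A minimal sketch of the two options, assuming Rust 1.79 or later for the inline const block. The struct `B` (a ten-byte stand-in for `A` with its fields collapsed into an array) and the functions `make`, `via_const_item`, and `via_const_block` are made up for illustration.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct B {
    bytes: [u8; 10],
}

/// Hypothetical constructor; being `const fn` alone does not force
/// compile-time evaluation at every call site.
pub const fn make() -> B {
    B { bytes: [1, 0, 0, 0, 0, 0, 0, 0, 0, 0] }
}

/// Option 1: bind the result to a `const` item, which is evaluated at compile time.
pub fn via_const_item() -> B {
    const T: B = make();
    T
}

/// Option 2: an inline `const` block, also evaluated at compile time.
pub fn via_const_block() -> B {
    const { make() }
}

fn main() {
    // Both forms yield the same value; only when it is computed is pinned down.
    assert_eq!(via_const_item(), via_const_block());
    println!("{:?}", via_const_item());
}
```

Both forms guarantee when the value is computed, not which instructions are emitted; the resulting machine code still depends on the optimizer and the target.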

CrazyboyQCD commented 4 months ago

Good to know. I think this should be documented somewhere, since many users assume the two behave the same.

tgross35 commented 4 months ago

I agree because I had to learn that recently too. If you have any ideas where the documentation could be improved here, PRs to the reference would be great.

tgross35 commented 4 months ago

I'm going to close this since I don't think there is anything unexpected here, but please feel free to follow up with documentation improvements if you have any suggestions.