Closed CrazyboyQCD closed 4 months ago
Relevant bit of code:
#[derive(Clone, Copy)]
#[repr(C)]
pub struct A { v1: u8, v2: u8, v3: u8, v4: u8, v5: u8, v6: u8, v7: u8, v8: u8, v9: u8, v10: u8 }
#[no_mangle]
pub const fn new_literal() -> A {
A { v1: 1, v2: 0, v3: 0, v4: 0, v5: 0, v6: 0, v7: 0, v8: 0, v9: 0, v10: 0 }
}
#[no_mangle]
pub const fn new_const() -> A {
const T: A = A { v1: 1, v2: 0, v3: 0, v4: 0, v5: 0, v6: 0, v7: 0, v8: 0, v9: 0, v10: 0 };
T
}
@0 = private unnamed_addr constant <{ [10 x i8] }> <{ [10 x i8] c"\01\00\00\00\00\00\00\00\00\00" }>, align 1
define void @new_literal(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([10 x i8]) align 1 dereferenceable(10) %_0) unnamed_addr {
start:
store i8 1, ptr %_0, align 1
%0 = getelementptr inbounds i8, ptr %_0, i64 1
tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(9) %0, i8 0, i64 9, i1 false)
ret void
}
define void @new_const(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([10 x i8]) align 1 dereferenceable(10) %_0) unnamed_addr {
start:
tail call void @llvm.memcpy.p0.p0.i64(ptr noundef nonnull align 1 dereferenceable(10) %_0, ptr noundef nonnull align 1 dereferenceable(10) @0, i64 10, i1 false)
ret void
}
We can see exactly what is happening here: @new_literal is constructing the item in place by storing a 1 at the return value's base address (%_0), and then memsetting the rest to zero (the gep computes a pointer named %0 at offset 1 within the return value, then the memset is called on that address). @new_const is just doing a memcpy from a static (@0) to the return value (%_0).
So new_const did the calculation in advance, while new_literal is doing it on the fly. This is expected: marking a function const does not mean it is always evaluated at compile time when possible, only that it can be evaluated at compile time. Evaluating eagerly might be feasible to some degree, but it isn't done because trying to evaluate everything that could be const (which is a lot) would slow compile times considerably.
If you want to ensure something is evaluated at compile time, assigning it to a const or static is the correct way to do it. Alternatively, in recent Rust versions (const blocks were stabilized in 1.79) you can use a const block: const { /* calculations */ }.
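To make the three options above concrete, here is a minimal sketch. The helper make_a and the names FROM_CONST, FROM_STATIC, from_const_block and maybe_runtime are hypothetical and only for illustration; the struct mirrors A from the issue, with Debug and PartialEq added so the values can be compared. The const block form assumes a toolchain of Rust 1.79 or newer.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C)]
pub struct A { v1: u8, v2: u8, v3: u8, v4: u8, v5: u8, v6: u8, v7: u8, v8: u8, v9: u8, v10: u8 }

// Hypothetical helper: a const fn *may* run at compile time,
// but only does so when called from a const context.
pub const fn make_a() -> A {
    A { v1: 1, v2: 0, v3: 0, v4: 0, v5: 0, v6: 0, v7: 0, v8: 0, v9: 0, v10: 0 }
}

// Const context: the initializer of a const item is evaluated at compile time.
pub const FROM_CONST: A = make_a();

// Const context: the initializer of a static is evaluated at compile time too.
pub static FROM_STATIC: A = make_a();

// Const context: a const block (assumes Rust 1.79+) forces compile-time evaluation
// of the expression inside it.
pub fn from_const_block() -> A {
    const { make_a() }
}

// Not a const context: this is an ordinary call. The optimizer may still
// fold it, but the language does not guarantee compile-time evaluation.
pub fn maybe_runtime() -> A {
    make_a()
}
```

All four produce the same value; what differs is whether compile-time evaluation is guaranteed, and therefore what code the backend is handed to optimize.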
Good to know. I think this should be documented somewhere, since many users assume they behave the same.
I agree because I had to learn that recently too. If you have any ideas where the documentation could be improved here, PRs to the reference would be great.
I'm going to close this since I don't think there is anything unexpected here, but please feel free to follow up with documentation improvements if you have any suggestions.
The compiler generates different code for a literal and a constant when returning a bitwise-copy struct that is larger than a quadword and is created from non-zero memory.
From zero memory (same): Godbolt link.
From non-zero memory (different): Godbolt link.
From non-zero memory and with one more field (different, and more mov instructions for the literal): Godbolt link.