rust-lang / rust


Non-literal constant objects are not optimized as well as literal constant objects #118557

Open EFanZh opened 9 months ago

EFanZh commented 9 months ago

This problem originated from https://github.com/rust-lang/log/pull/599.

Sometimes, multiple function calls can have the same constant arguments:

pub fn test(f: fn(&[u32; 10])) {
    f(&[7; 10]);
    f(&[7; 10]);
    f(&[7; 10]);
    f(&[7; 10]);
}

Rust recognizes that these arguments have the same value, so it creates a single constant and passes its address to each call:

example::test:
        push    r14
        push    rbx
        push    rax
        mov     r14, rdi
        lea     rbx, [rip + .L__unnamed_1]
        mov     rdi, rbx
        call    r14
        mov     rdi, rbx
        call    r14
        mov     rdi, rbx
        call    r14
        mov     rdi, rbx
        mov     rax, r14
        add     rsp, 8
        pop     rbx
        pop     r14
        jmp     rax

.L__unnamed_1:
        .asciz  "\007\000\000\000\007\000\000\000\007\000\000\000\007\000\000\000\007\000\000\000\007\000\000\000\007\000\000\000\007\000\000\000\007\000\000\000\007\000\000"

But sometimes, these constant arguments have to be computed by some additional functions:

pub fn test(f: fn(&[u32; 10])) {
    f(&[std::convert::identity(7); 10]);
    f(&[std::convert::identity(7); 10]);
    f(&[std::convert::identity(7); 10]);
    f(&[std::convert::identity(7); 10]);
}

In that case, the compiler does not optimize these constant objects as well as in the first example. It rebuilds the array on the stack before every call, so additional copy operations are generated:

.LCPI0_0:
        .long   7
        .long   7
        .long   7
        .long   7
example::test:
        push    r14
        push    rbx
        sub     rsp, 40
        mov     rbx, rdi
        movaps  xmm0, xmmword ptr [rip + .LCPI0_0]
        movaps  xmmword ptr [rsp], xmm0
        movaps  xmmword ptr [rsp + 16], xmm0
        movabs  r14, 30064771079
        mov     qword ptr [rsp + 32], r14
        mov     rdi, rsp
        call    rbx
        movaps  xmm0, xmmword ptr [rip + .LCPI0_0]
        movaps  xmmword ptr [rsp], xmm0
        movaps  xmmword ptr [rsp + 16], xmm0
        mov     qword ptr [rsp + 32], r14
        mov     rdi, rsp
        call    rbx
        movaps  xmm0, xmmword ptr [rip + .LCPI0_0]
        movaps  xmmword ptr [rsp], xmm0
        movaps  xmmword ptr [rsp + 16], xmm0
        mov     qword ptr [rsp + 32], r14
        mov     rdi, rsp
        call    rbx
        movaps  xmm0, xmmword ptr [rip + .LCPI0_0]
        movaps  xmmword ptr [rsp], xmm0
        movaps  xmmword ptr [rsp + 16], xmm0
        mov     qword ptr [rsp + 32], r14
        mov     rdi, rsp
        call    rbx
        add     rsp, 40
        pop     rbx
        pop     r14
        ret

You can see the comparison here: https://godbolt.org/z/frj9a8TG6.
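
One way to see the difference at the language level is rvalue static promotion. The sketch below is only an illustration, and the function name `promoted` is hypothetical. A reference to the literal `[7; 10]` can be returned with a `'static` lifetime because the expression is promoted to static memory, which is likely why the first listing refers to a single `.L__unnamed_1` object. The variant that calls `std::convert::identity` is left commented out because function calls are not promoted in ordinary function bodies, so it is rejected as a reference to a temporary.

```rust
// Hypothetical helper: the constant expression `[7; 10]` is promoted to
// static memory, so a reference to it may have the `'static` lifetime.
pub fn promoted() -> &'static [u32; 10] {
    &[7; 10]
}

// Uncommenting this is expected to fail to compile: the array is an ordinary
// temporary here, because the call to `identity` prevents promotion.
// pub fn not_promoted() -> &'static [u32; 10] {
//     &[std::convert::identity(7); 10]
// }

fn main() {
    let array = promoted();
    println!("{:?}", array);
}
```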

Additionally, using a const value as a proxy helps:

pub fn test(f: fn(&[u32; 10])) {
    const SEVEN: u32 = std::convert::identity(7);

    f(&[SEVEN; 10]);
    f(&[SEVEN; 10]);
    f(&[SEVEN; 10]);
    f(&[SEVEN; 10]);
}

But some functions can’t be used to compute a const value, such as std::panic::Location::caller, so the method above does not always work.
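
As a rough illustration of the `Location::caller` case, the sketch below uses a hypothetical helper named `here`. Each call site gets its own location, returned as a `&'static Location<'static>`, so the value is fixed per call site even though the code obtains it through an ordinary function call.

```rust
use std::panic::Location;

// Hypothetical helper: `#[track_caller]` makes `Location::caller()` report
// the place where `here()` was called, not this function body.
#[track_caller]
pub fn here() -> &'static Location<'static> {
    Location::caller()
}

fn main() {
    let first = here();
    let second = here();
    // Two distinct call sites yield two distinct locations.
    println!("{}:{}:{}", first.file(), first.line(), first.column());
    println!("{}:{}:{}", second.file(), second.line(), second.column());
}
```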

the8472 commented 9 months ago

Array repeat expressions are not guaranteed to be consts, e.g. this is valid:

use core::sync::atomic::{AtomicU32, Ordering};
static CNT: AtomicU32 = AtomicU32::new(0);

fn foo() -> u32 {
    CNT.fetch_add(1, Ordering::Relaxed)
}

fn main() {
    let _a = &[foo(); 32];
}
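
To make the point concrete, here is a small sketch with hypothetical names (`next_count`, `two_arrays`). A repeat expression evaluates its operand once and copies the result into every element, so each array is uniform, but two separate repeat expressions observe different counter values. A compiler therefore cannot assume that two textually identical repeat expressions produce the same array.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

static CNT: AtomicU32 = AtomicU32::new(0);

// Hypothetical helper: returns a different value on each call.
fn next_count() -> u32 {
    CNT.fetch_add(1, Ordering::Relaxed)
}

// Builds two arrays from textually identical repeat expressions.
pub fn two_arrays() -> ([u32; 4], [u32; 4]) {
    // The operand is evaluated once per repeat expression, then copied.
    let a = [next_count(); 4];
    let b = [next_count(); 4];
    (a, b)
}

fn main() {
    let (a, b) = two_arrays();
    println!("{:?} {:?}", a, b);
}
```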

You can either use a separate const, as in your SEVEN example, or (on nightly) use inline consts:

#![feature(inline_const)]

pub fn test(f: fn(&[u32; 10])) {
    f(&[const { std::convert::identity(7) }; 10]);
    f(&[const { std::convert::identity(7) }; 10]);
    f(&[const { std::convert::identity(7) }; 10]);
    f(&[const { std::convert::identity(7) }; 10]);
}

EFanZh commented 9 months ago

> Array repeat expressions are not guaranteed to be consts

If a value can’t be determined at compile time, it is understandable that the compiler can’t perform this optimization. The problem is that when the value can in fact be determined at compile time, there is an optimization opportunity that the compiler fails to exploit.

> you can use inline consts

Even if inline consts were stabilized, there are expressions that can’t be enclosed in const blocks but still produce compile-time constant values (like std::panic::Location::caller).