rust-lang / rust

Empowering everyone to build reliable and efficient software.
https://www.rust-lang.org
Other
96.81k stars 12.5k forks source link

missed(?) optimization with a const array of same item #107208

Open iximeow opened 1 year ago

iximeow commented 1 year ago

some example code:

const LUT: [u8; 2] = [
    1,
    1,
];

pub fn decode(i: u8) -> u8 {
    if i < 2 {
        LUT[i as usize]
    } else {
        2
    }
}

compiles to (via 1.66.1, or 1.68.0-nightly from 2023-01-21):

example::decode:
        mov     al, 2
        cmp     dil, 1
        ja      .LBB0_2
        movzx   eax, dil
        lea     rcx, [rip + .L__unnamed_1]
        movzx   eax, byte ptr [rax + rcx]
.LBB0_2:
        ret

.L__unnamed_1:
        .zero   2,1

rustc faithfully produces an array of 0x01, 0x01 in the resulting program and loads from it. it seems to me rustc ought to be able to tell that all entries in the array are identical, use that value, and avoid loading from the array in the first place.

i'm not entirely sure if this should be an issue against rust-lang/rust or bug against LLVM, but i tried looking through I-slow for similar reports. so i think starting here is the right call :D thanks!

edit: it does seem remarkable that the load is eliminated if there is only one item in the array. that looks to be handled somewhere before generating LLVM IR, so i think this is the right place.

i happened to notice this with LUT being an array of function pointers, included for completeness ```rust const LUT: [fn() -> u8; 2] = [ || { 1 }, || { 1 }, ]; pub fn decode(i: u8) -> u8 { if i < 2 { LUT[i as usize]() } else { 2 } } ``` compiles to (via 1.66.1, or 1.68.0-nightly from 2023-01-21): ``` core::ops::function::FnOnce::call_once: mov al, 1 ret playground::decode: cmp dil, 1 ja .LBB1_1 movzx eax, dil lea rcx, [rip + .L__unnamed_1] jmp qword ptr [rcx + 8*rax] .LBB1_1: mov al, 2 ret .L__unnamed_1: .quad core::ops::function::FnOnce::call_once .quad core::ops::function::FnOnce::call_once ``` similarly, the array is indexed regardless of the fact that the array's items are all identical. this, in turn, is a minimized form of the original code with dozens of entries in the array. in that case, `LUT` is an associated item on a trait where one implementation happens to result in all function pointers being the same nearly-no-op function.
alex commented 1 year ago

Minimal C reproducer produces identical code: https://godbolt.org/z/E97jeGshr

erikdesjardins commented 1 year ago

i'm not entirely sure if this should be an issue against rust-lang/rust or bug against LLVM, but i tried looking through I-slow for similar reports. so i think starting here is the right call :D thanks!

Indeed, even if it's an LLVM bug, we often keep issues open in this repo to track the Rust reproducer and make sure the upstream change actually fixes it.

edit: it does seem remarkable that the load is eliminated if there is only one item in the array. that looks to be handled somewhere before generating LLVM IR, so i think this is the right place.

If you mean something like https://godbolt.org/z/hn8WMd7Gx, the optimization is simpler in that case because i < 1 means the load is always from a constant index (LUT[0]).

It's harder when the load is from a variable index. LLVM should already handle the case where all bits are the same (though this wouldn't help your reproducer) in ConstantFoldLoadFromUniformValue. Strangely, this doesn't seem to work with an all-zero array: https://godbolt.org/z/hT8Gv6Gof.

iximeow commented 1 year ago

If you mean something like https://godbolt.org/z/hn8WMd7Gx,

yep, that's the kind of thing i was thinking.

means the load is always from a constant index (LUT[0]).

ah! that makes much more sense; i hadn't thought about the index being a constant in that case.

iximeow commented 1 year ago

i've just realized:

Indeed, even if it's an LLVM bug,

particularly given the C reproducer (thanks alex!), i should go open an issue against llvm/llvm-project about it then, and link it here for tracking purposes yeah?

workingjubilee commented 1 year ago

Yes please! Someone will likely get around to it Eventually but, y'know,

khei4 commented 1 year ago

Upstream patch: https://reviews.llvm.org/D144445

nikic commented 1 year ago

(Assigning to myself to check back after the LLVM 17 upgrade).

nikic commented 1 year ago

This works now if LUT is a static: https://godbolt.org/z/Efxcn9K5s

But not if it's a const: https://godbolt.org/z/rf7eKdfGd

This appears to be a regression from @scottmcm's changes to replace memcpy with vector load+store in some cases. As a side effect of that change, const arrays now get materialized as explicit vector stores at every use, instead of referencing a global. That seems potentially pretty bad.

erikdesjardins commented 1 month ago

For the case with function pointers:

...when the function pointers are all the same declared function, this was fixed in 1.73: https://godbolt.org/z/WMefYsah4

...when the function pointers are from closures, it still isn't resolved: https://godbolt.org/z/5baEW4ro8 (it seems like this is because the two closure functions get merged too late)

scottmcm commented 1 month ago

This appears to be a regression from @scottmcm's changes to replace memcpy with vector load+store in some cases.

This was completely undone in #123185 for 1.79, so it should no longer be a vector op issue, at least.