Open iximeow opened 1 year ago
Minimal C reproducer produces identical code: https://godbolt.org/z/E97jeGshr
i'm not entirely sure if this should be an issue against rust-lang/rust or bug against LLVM, but i tried looking through I-slow for similar reports. so i think starting here is the right call :D thanks!
Indeed, even if it's an LLVM bug, we often keep issues open in this repo to track the Rust reproducer and make sure the upstream change actually fixes it.
edit: it does seem remarkable that the load is eliminated if there is only one item in the array. that looks to be handled somewhere before generating LLVM IR, so i think this is the right place.
If you mean something like https://godbolt.org/z/hn8WMd7Gx, the optimization is simpler in that case because i < 1
means the load is always from a constant index (LUT[0]
).
It's harder when the load is from a variable index. LLVM should already handle the case where all bits are the same (though this wouldn't help your reproducer) in ConstantFoldLoadFromUniformValue
. Strangely, this doesn't seem to work with an all-zero array: https://godbolt.org/z/hT8Gv6Gof.
If you mean something like https://godbolt.org/z/hn8WMd7Gx,
yep, that's the kind of thing i was thinking.
means the load is always from a constant index (
LUT[0]
).
ah! that makes much more sense; i hadn't thought about the index being a constant in that case.
i've just realized:
Indeed, even if it's an LLVM bug,
particularly given the C reproducer (thanks alex!), i should go open an issue against llvm/llvm-project
about it then, and link it here for tracking purposes yeah?
Yes please! Someone will likely get around to it Eventually but, y'know,
Upstream patch: https://reviews.llvm.org/D144445
(Assigning to myself to check back after the LLVM 17 upgrade).
This works now if LUT
is a static
: https://godbolt.org/z/Efxcn9K5s
But not if it's a const
: https://godbolt.org/z/rf7eKdfGd
This appears to be a regression from @scottmcm's changes to replace memcpy with vector load+store in some cases. As a side effect of that change, const arrays now get materialized as explicit vector stores at every use, instead of referencing a global. That seems potentially pretty bad.
For the case with function pointers:
...when the function pointers are all the same declared function, this was fixed in 1.73: https://godbolt.org/z/WMefYsah4
...when the function pointers are from closures, it still isn't resolved: https://godbolt.org/z/5baEW4ro8 (it seems like this is because the two closure functions get merged too late)
This appears to be a regression from @scottmcm's changes to replace memcpy with vector load+store in some cases.
This was completely undone in #123185 for 1.79, so it should no longer be a vector op issue, at least.
some example code:
compiles to (via 1.66.1, or 1.68.0-nightly from 2023-01-21):
rustc faithfully produces an array of
0x01, 0x01
in the resulting program and loads from it. it seems to me rustc ought to be able to tell that all entries in the array are identical, use that value, and avoid loading from the array in the first place.i'm not entirely sure if this should be an issue against
rust-lang/rust
or bug against LLVM, but i tried looking throughI-slow
for similar reports. so i think starting here is the right call :D thanks!edit: it does seem remarkable that the load is eliminated if there is only one item in the array. that looks to be handled somewhere before generating LLVM IR, so i think this is the right place.
i happened to notice this with LUT being an array of function pointers, included for completeness
```rust const LUT: [fn() -> u8; 2] = [ || { 1 }, || { 1 }, ]; pub fn decode(i: u8) -> u8 { if i < 2 { LUT[i as usize]() } else { 2 } } ``` compiles to (via 1.66.1, or 1.68.0-nightly from 2023-01-21): ``` core::ops::function::FnOnce::call_once: mov al, 1 ret playground::decode: cmp dil, 1 ja .LBB1_1 movzx eax, dil lea rcx, [rip + .L__unnamed_1] jmp qword ptr [rcx + 8*rax] .LBB1_1: mov al, 2 ret .L__unnamed_1: .quad core::ops::function::FnOnce::call_once .quad core::ops::function::FnOnce::call_once ``` similarly, the array is indexed regardless of the fact that the array's items are all identical. this, in turn, is a minimized form of the original code with dozens of entries in the array. in that case, `LUT` is an associated item on a trait where one implementation happens to result in all function pointers being the same nearly-no-op function.