Open cuviper opened 1 day ago
Part of #121571 added UB checks on indexing, and AIUI these survive to LLVM IR due to core's `#![rustc_preserve_ub_checks]`, though we expect it to still be optimized away. I'm guessing that `x86-64-v3` enables some transformation that disrupts the overall optimization, which may be buggy or just unfortunate.
Some of that was undone in #126299, but the changes in `index_range.rs` remain. In a quick test reverting those that are definitely superfluous, I do get back to full optimization here. I'll prepare a PR for that, but there's still probably an LLVM improvement to be had too.
> and AIUI these survive to LLVM IR due to core's `#![rustc_preserve_ub_checks]`

They shouldn't survive. That attribute is supposed to get them past MIR optimizations, but when we lower to LLVM IR with debug assertions disabled there should be at most a zombie `br bbX`.
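To make the idea of a check that exists only when debug assertions are enabled concrete, here is a minimal sketch. It is only an analogy: `precondition_check` and `subslice_unchecked` are hypothetical names, and `cfg!(debug_assertions)` is used as a stand-in. As I understand it, the real checks in `core` are decided when the downstream crate is compiled rather than when the standard library is built, which is why the attribute has to carry them past MIR optimizations.

```rust
/// Hypothetical stand-in for a library precondition check (assumption: the
/// real checks in `core` use internal machinery, not `cfg!`).
fn precondition_check(start: usize, end: usize, len: usize) {
    // With debug assertions disabled this condition is a constant `false`,
    // so the whole body is dead code that the optimizer can remove.
    if cfg!(debug_assertions) {
        assert!(start <= end && end <= len, "range out of bounds");
    }
}

/// Hypothetical unchecked sub-slicing helper.
///
/// # Safety
/// The caller must guarantee `start <= end <= slice.len()`.
unsafe fn subslice_unchecked(slice: &[u8], start: usize, end: usize) -> &[u8] {
    precondition_check(start, end, slice.len());
    // SAFETY: guaranteed by the caller, see above.
    unsafe { slice.get_unchecked(start..end) }
}

fn main() {
    let data = [1u8, 2, 3, 4, 5];
    // SAFETY: 1 <= 3 <= data.len()
    let s = unsafe { subslice_unchecked(&data, 1, 3) };
    println!("{:?}", s);
}
```

In a release build the check should leave nothing behind, which is the behavior expected of the library's UB checks as well.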
I added the A-LLVM label because I suspect that the desirable sequence of LLVM behavior here was relying on MIR inlining. It wouldn't be the first time, and impeding the MIR inliner is the primary thing that these assertions do when disabled.
This produces that bad IR:
RUSTFLAGS="-Ctarget-cpu=x86-64-v3 --emit=llvm-ir" cargo +nightly b --release -Zbuild-std --target=x86_64-unknown-linux-gnu
And this produces the good IR (the normal `inline-mir-hint-threshold` is 100):
RUSTFLAGS="-Zinline-mir-hint-threshold=1000 -Ctarget-cpu=x86-64-v3 --emit=llvm-ir" cargo +nightly b --release -Zbuild-std --target=x86_64-unknown-linux-gnu
> They shouldn't survive. That attribute is supposed to get them past MIR optimizations, but when we lower to LLVM IR with debug assertions disabled there should be at most a zombie `br bbX`.

Well, it looks like a lot more here with `-Cno-prepopulate-passes`: https://rust.godbolt.org/z/n8reE6nzG
But yes, I would still hope that LLVM could chew through it, since it does with other CPUs. AFAICS our IR does not change from the target-cpu, apart from the expected function attributes.
The IR size difference for `<core::ops::index_range::IndexRange as core::slice::index::SliceIndex<[T]>>::get_unchecked_mut` is due to the use of the unchecked math functions in both `library/core/src/slice/index.rs` and `library/core/src/ops/index_range.rs`.
The IR for that link looks a lot better in nightly; I wonder if that's due to https://github.com/rust-lang/rust/pull/129283. I think https://github.com/rust-lang/rust/pull/126299 is also helping, even if you just go to 1.81.
Also the fact that there's a UB check in that IR at all is a bug. I'm looking into it.
I've locally added enough post-mono MIR optimizations to grind down the `-Cno-prepopulate-passes` LLVM IR for the `<IndexRange as SliceIndex>::get_unchecked` to:
; <core::ops::index_range::IndexRange as core::slice::index::SliceIndex<[T]>>::get_unchecked_mut
; Function Attrs: inlinehint nonlazybind uwtable
define internal { ptr, i64 } @"_ZN104_$LT$core..ops..index_range..IndexRange$u20$as$u20$core..slice..index..SliceIndex$LT$$u5b$T$u5d$$GT$$GT$17get_unchecked_mut17hfdf5440029c89a21E"(i64 noundef %0, i64 noundef %1, ptr noundef %slice.0, i64 noundef %slice.1) unnamed_addr #0 {
start:
%self = alloca [16 x i8], align 8
store i64 %0, ptr %self, align 8
%2 = getelementptr inbounds i8, ptr %self, i64 8
store i64 %1, ptr %2, align 8
%offset = load i64, ptr %self, align 8, !noundef !3
%3 = getelementptr inbounds i8, ptr %self, i64 8
%self1 = load i64, ptr %3, align 8, !noundef !3
%len = sub nuw i64 %self1, %offset
%ptr = getelementptr inbounds i64, ptr %slice.0, i64 %offset
%4 = insertvalue { ptr, i64 } poison, ptr %ptr, 0
%5 = insertvalue { ptr, i64 } %4, i64 %len, 1
ret { ptr, i64 } %5
}
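For readers less used to LLVM IR, the function above can be read back into Rust roughly as follows. This is a sketch of what the IR computes, not the actual library source: the name `get_unchecked_mut_sketch` is hypothetical, and the element type `i64` is taken from the `getelementptr inbounds i64` in the listing.

```rust
use std::slice;

/// Hypothetical Rust rendering of the IR above.
///
/// # Safety
/// The caller must guarantee `start <= end <= slice.len()`.
unsafe fn get_unchecked_mut_sketch(start: usize, end: usize, slice: &mut [i64]) -> &mut [i64] {
    // `%len = sub nuw i64 %self1, %offset`: a subtraction that is assumed
    // not to wrap (`unchecked_sub` is stable since Rust 1.79).
    let len = unsafe { end.unchecked_sub(start) };
    // `%ptr = getelementptr inbounds i64, ptr %slice.0, i64 %offset`
    let ptr = unsafe { slice.as_mut_ptr().add(start) };
    // The two `insertvalue` instructions build the (pointer, length) pair.
    unsafe { slice::from_raw_parts_mut(ptr, len) }
}

fn main() {
    let mut data = [10i64, 20, 30, 40, 50];
    // SAFETY: 1 <= 4 <= data.len()
    let sub = unsafe { get_unchecked_mut_sketch(1, 4, &mut data) };
    sub[0] = 99;
    println!("{:?}", data);
}
```

The only work left is one subtraction and one pointer offset; the `alloca`, stores, and loads at the top of the IR just spill and reload the two range fields, which LLVM's own passes would be expected to remove.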
As far as I can tell, this is at least as good input to LLVM as we provided on 1.79, but we still have a missed optimization with `-Ctarget-cpu=x86-64-v3`.
I've also tried your branch. I cannot even find this function in the LLVM IR, because in your branch it always gets inlined in MIR, which is basically what I was speculating originally about LLVM relying on MIR inlining.
So I think there's a deeper problem here with LLVM on this `target-cpu`, and this change to the standard library is papering over the reproducer we have for it.
Code
I tried this test with `-Ctarget-cpu=x86-64-v3` (which we have on by default in the upcoming RHEL 10): tests/codegen/issues/issue-101082.rs
I expected to see this happen: `FileCheck` pass
Instead, this happened:
As of `rustc 1.83.0-nightly (52fd99839 2024-10-10)`, that LLVM IR is:
Reducing to `x86-64-v2` does get the expected output:
Version it worked on
It most recently worked on: Rust 1.79.0
Version with regression
`rustc --version --verbose`:
Note that the original issue #101082 was fixed by an LLVM upgrade. That version didn't change between 1.79.0 and 1.80.0, but there were some additional cherry-picks: https://github.com/rust-lang/llvm-project/compare/rustc-1.79.0...rustc-1.80.0
However, `cargo-bisect-rustc` narrowed down to something else.
Bisection
searched nightlies: from nightly-2024-04-28 to nightly-2024-10-11
regressed nightly: nightly-2024-05-26
searched commit range: https://github.com/rust-lang/rust/compare/36153f1a4e3162f0a143c7b3e468ecb3beb0008e...1ba35e9bb44d416fc2ebf897855454258b650b01
regressed commit: https://github.com/rust-lang/rust/commit/48f00110d0dae38b3046a9ac05d20ea321fd6637 (#121571)
bisected with cargo-bisect-rustc v0.6.9
Host triple: x86_64-unknown-linux-gnu
Reproduce with:
```bash
cargo bisect-rustc --script test.sh --start 1.79.0
```
```sh
#!/bin/sh
rustc --emit=llvm-ir -O -Ctarget-cpu=x86-64-v3 -o- issue-101082.rs | FileCheck issue-101082.rs
```
@rustbot modify labels: +A-codegen +regression-from-stable-to-stable -regression-untriaged