Closed cbeuw closed 1 year ago
I don't think your reduction is correct; it looks like it involves accessing zero-byte allocations.
Generally, the first tool I reach for to reduce miscompiles is opt-bisect-limit (https://llvm.org/docs/OptBisect.html).
I removed all the zero-byte alloca
s: https://godbolt.org/z/jEbPc1P94
Looks like there is an ABI mismatch. The arguments are pushed via pushq at 8 byte offsets and then read via movl at 4 byte offsets.
Here's a reduction:
define void @caller() nounwind {
call void @callee(ptr null, ptr null, ptr null, ptr null, ptr null, ptr null, <7 x i32> <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>)
ret void
}
define void @callee(ptr %p0, ptr %p1, ptr %p2, ptr %p3, ptr %p4, ptr %p5, <7 x i32> %arg) nounwind {
start:
%alloca = alloca [7 x i32], align 4
store <7 x i32> %arg, ptr %alloca, align 4
%extract0 = extractelement <7 x i32> %arg, i64 0
call void @use(i32 %extract0)
%extract1 = extractelement <7 x i32> %arg, i64 1
call void @use(i32 %extract1)
%extract2 = extractelement <7 x i32> %arg, i64 2
call void @use(i32 %extract2)
%extract3 = extractelement <7 x i32> %arg, i64 3
call void @use(i32 %extract3)
%extract4 = extractelement <7 x i32> %arg, i64 4
call void @use(i32 %extract4)
%extract5 = extractelement <7 x i32> %arg, i64 5
call void @use(i32 %extract5)
%extract6 = extractelement <7 x i32> %arg, i64 6
call void @use(i32 %extract6)
%extract7 = extractelement <7 x i32> %arg, i64 7
call void @use(i32 %extract7)
ret void
}
declare void @use(i32)
The caller does:
pushq $42
pushq $42
pushq $42
pushq $42
pushq $42
pushq $42
pushq $42
callq callee@PLT
The callee does:
movl 112(%rsp), %ebx
movl 104(%rsp), %ebp
movl 96(%rsp), %r14d
movl 76(%rsp), %r15d
movl 72(%rsp), %r12d
movl 64(%rsp), %edi
movl 68(%rsp), %r13d
If we drop the store, then the offsets are correct (don't mind the different base):
movl 144(%rsp), %ebx
movl 136(%rsp), %ebp
movl 128(%rsp), %r14d
movl 120(%rsp), %r15d
movl 112(%rsp), %r12d
movl 104(%rsp), %r13d
movl 96(%rsp), %edi
So this is again in some way related to the arg copy elision optimization.
This seems to be related to the code in X86ISelLowering::LowerMemArgument() handling isCopyElisionCandidate(). It checks for ScalarizedAndExtendedVector, but does so by inspecting the size of the LocVT. However, if I'm understanding this right, in this case the LocVT is i32 matching the vector size, but this doesn't match the size of the stack slot, which is 8.
I'm not sure if there's any easy way to access that stack slot size though... CCAssignVal only stores the start offset.
Candidate patch: https://reviews.llvm.org/D154078
@llvm/issue-subscribers-backend-x86
@cbeuw Do you have the original Rust code that lead to this issue? I find it suspicious that we end up with illegal vector types in optimized IR -- unless you did something with repr(simd)
I don't think that's supposed to happen.
@nikic I have the unreduced code in custom MIR: https://godbolt.org/z/7q6q8eK96. But I don't have the reduced one around any more... I'm happy to run the minimisation script again though if needed.
This isn't reproducible from surface Rust, which is why I opened a bug report with LLVM directly. The reproduction required a Move
operand of an array local in a function call, where the same local was previously used. This MIR cannot be built from surface Rust as MIR building creates temporary copies for all Move
operands in Call
. The local that gets moved is assigned to and used exactly once. If you change Move(_16)
to _16
on line 3200 then the bug goes away.
By illegal vector types do you mean the zero-byte [0 x [0 x [0 x i8]]]
s? They weren't from rustc
, they were from llvm-reduce
. The IR from Rust was folded under original IR
in the OP.
"Illegal vector type" here refers to the non-power-of-two vectors, which are not natively supported by the target. They are already part of the input IR, and the most likely culprit for that is https://github.com/rust-lang/rust/pull/111999.
I wonder whether it would make sense to prevent argument promotion for such types, as the legalized argument passing for such vectors can be substantially worse than just passing them indirectly (and it makes it more likely to hit legalization bugs like https://github.com/llvm/llvm-project/issues/63608).
This should print
42 42 42 42 42 42 42
, but prints42 0 42 0 42 42 42
withclang
oropt -O3
https://godbolt.org/z/8v3d7enK8The above was from
llvm-reduce
. I don't know if it broke something so I attached the original IR below. This is compiled from Rust but I've patched out the symbols from Rust std so has no dependency on Rust.original IR
```llvm ; ModuleID = 'repro.46f743e1561fb24e-cgu.0' source_filename = "repro.46f743e1561fb24e-cgu.0" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" %Adt56 = type { { i128, i128, { i16, i128 }, i64, i32, [1 x i32] }, %Adt55 } %Adt55 = type { %Adt54 } %Adt54 = type { { i128, ptr }, { i128, i128, { i16, i128 }, i64, i32, [1 x i32] } } @vtable.0 = private unnamed_addr constant <{ ptr, [16 x i8], ptr, ptr, ptr }> <{ ptr @"_ZN4core3ptr85drop_in_place$LT$std..rt..lang_start$LT$$LP$$RP$$GT$..$u7b$$u7b$closure$u7d$$u7d$$GT$17h0eee5ecdc5932091E", [16 x i8] c"\08\00\00\00\00\00\00\00\08\00\00\00\00\00\00\00", ptr @"_ZN4core3ops8function6FnOnce40call_once$u7b$$u7b$vtable.shim$u7d$$u7d$17hf0de4a394f8e37a1E", ptr @"_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h991b85cf75f57f3aE", ptr @"_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h991b85cf75f57f3aE" }>, align 8 @alloc_a00f8a95864fc305bf508c11187211d8 = private unnamed_addr constant <{ [28 x i8] }> zeroinitializer, align 4 @alloc_4f40612ab7406a7d1f3f0640c8ea0fb4 = private unnamed_addr constant <{ [64 x i8] }> zeroinitializer, align 8 @alloc_ee0548ff1320ae5be168b83ab0b060cd = private unnamed_addr constant <{ [20 x i8] }> <{ [20 x i8] c"a\00\00\00a\00\00\00a\00\00\00a\00\00\00a\00\00\00" }>, align 4 ; std::sys_common::backtrace::__rust_begin_short_backtrace ; Function Attrs: noinline nonlazybind uwtable define internal fastcc void @_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h3230664098c98715E(ptr nocapture noundef nonnull readonly %f) unnamed_addr #0 { start: tail call void %f() tail call void asm sideeffect "", "~{memory}"() #10, !srcloc !3 ret void } ; std::rt::lang_start::{{closure}} ; Function Attrs: inlinehint nonlazybind uwtable define internal noundef i32 @"_ZN3std2rt10lang_start28_$u7b$$u7b$closure$u7d$$u7d$17h991b85cf75f57f3aE"(ptr noalias nocapture noundef readonly align 8 dereferenceable(8) %_1) unnamed_addr #2 { start: %_4 = load ptr, ptr %_1, align 8, !nonnull !4, !noundef !4 ; call std::sys_common::backtrace::__rust_begin_short_backtrace tail call fastcc void @_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h3230664098c98715E(ptr noundef nonnull %_4) ret i32 0 } ; core::ops::function::FnOnce::call_once{{vtable.shim}} ; Function Attrs: inlinehint nonlazybind uwtable define internal noundef i32 @"_ZN4core3ops8function6FnOnce40call_once$u7b$$u7b$vtable.shim$u7d$$u7d$17hf0de4a394f8e37a1E"(ptr nocapture noundef readonly %_1) unnamed_addr #2 personality ptr @rust_eh_personality { start: %0 = load ptr, ptr %_1, align 8, !nonnull !4, !noundef !4 ; call std::sys_common::backtrace::__rust_begin_short_backtrace tail call fastcc void @_ZN3std10sys_common9backtrace28__rust_begin_short_backtrace17h3230664098c98715E(ptr noundef nonnull %0), !noalias !5 ret i32 0 } ; core::ptr::drop_in_place