Open patrick-rivos opened 9 months ago
I think this is some kind of bug in our gep lowering.
This:
%0 = tail call <4 x i32> @llvm.masked.gather.v4i32.v4p0(<4 x ptr> <ptr getelementptr inbounds ([8 x [6 x i32]], ptr @b, i64 0, i64 2), ptr getelementptr inbounds ([8 x [6 x i32]], ptr @b, i64 0, i64 3), ptr getelementptr inbounds ([8 x [6 x i32]], ptr @b, i64 0, i64 4), ptr getelementptr inbounds ([8 x [6 x i32]], ptr @b, i64 0, i64 5)>, i32 4, <4 x i1> <i1 true, i1 true, i1 true, i1 true>, <4 x i32> poison), !tbaa !7
Is becoming this after initial SDAG construction:
t3: v4i1 = BUILD_VECTOR Constant:i1<-1>, Constant:i1<-1>, Constant:i1<-1>, Constant:i1<-1>
t7: i64 = add nuw GlobalAddress:i64<ptr @b> 0, Constant:i64<48>
t9: i64 = add nuw GlobalAddress:i64<ptr @b> 0, Constant:i64<72>
t11: i64 = add nuw GlobalAddress:i64<ptr @b> 0, Constant:i64<96>
t13: i64 = add nuw GlobalAddress:i64<ptr @b> 0, Constant:i64<120>
t14: v4i64 = BUILD_VECTOR t7, t9, t11, t13
t16: v4i32,ch = masked_gather<(load unknown-size, align 4, !tbaa !7), signed unscaled offset> t0, undef:v4i32, t3, Constant:i64<0>
Unless I'm missing something here, those offsets are wrong. ptr getelementptr inbounds ([8 x [6 x i32]], ptr @b, i64 0, i64 2)
should be @b + 8 bytes not @b + 48 bytes. It almost seems like we're scaling by the inner array size (4 x 6 = 24) here.
I'm a bit distrustful of my own claim here. Am I misreading something? This seems too badly broken to have survived in tree for long?
I think 48 is correct. We have a global variable pointing to an array of arrays. The first 0 index has to step over the global variable. https://llvm.org/docs/GetElementPtr.html#why-is-the-extra-0-index-required The second index indexes into the outer array which has [6 x i32] elements.
I think 48 is correct. We have a global variable pointing to an array of arrays. The first 0 index has to step over the global variable. https://llvm.org/docs/GetElementPtr.html#why-is-the-extra-0-index-required The second index indexes into the outer array which has [6 x i32] elements.
You're right, I was forgetting we had to index "in to" the global. What confused me was that the pointer type being returned was a pointer to the inner array, and there's an implicit [0] from the source code. Another case where pointer types just confuse things. :)
Back to square zero, my analysis above is completely off.
I'm unable to reproduce this with the prebuilt copies of qemu I have around. So I don't know if I'm doing something wrong or there's some qemu bug we have fixed.
I used qemu v8.1.2. I can try again with a more tip-of-tree QEMU.
I tried with both v8.2.1 and tip-of-tree QEMU and could reproduce the failure on both.
I can't reproduce the error, but I've looked at the opt-bisect right. It looks like the only difference in the final assembly is the p2align directive on b
?
Yep that's what I'm seeing too.
Fails with .p2align 2, 0x0
but passes with .p2align 4, 0x0
> /scratch/tc-testing/llvm-feb-5/build/bin/clang exit-code-0-SROAPass.S -o user-config.out
> QEMU_CPU=rv64,vlen=128,v=true,vext_spec=v1.0,Zve32f=true,Zve64f=true /scratch/tc-testing/llvm-feb-5/build/bin/qemu-riscv64 user-config.out
> echo $?
0
> /scratch/tc-testing/llvm-feb-5/build/bin/clang exit-code-1-InferAlignmentPass.S -o user-config.out
> QEMU_CPU=rv64,vlen=128,v=true,vext_spec=v1.0,Zve32f=true,Zve64f=true /scratch/tc-testing/llvm-feb-5/build/bin/qemu-riscv64 user-config.out
> echo $?
1
An interesting datapoint
$ CC -fuse-ld=lld repo-O2-patched.s -o repo-O2-patched
$ qemu-riscv64 repo-O2-patched
$ echo $?
0
$CC -fuse-ld=ld repo-O2-patched.s -o repo-O2-patched
$ qemu-riscv64 repo-O2-patched
$ echo $?
1
repo-O2-patched.s is the .s output of O2 per the above reproducer with the alignment of b reduced to 4.
The LLD in question is a somewhat recently build upstream LLD copy. The LD is some downstream fork of uncertain heritage. It's entirely possible it's a downstream LD bug.
Here's another data point.
repo-O0-ld
0
repo-O0-lld
0
repo-O1-ld
0
repo-O1-lld
0
repo-O2-ld
1
repo-O2-lld
0
repo-O3-ld
1
repo-O3-lld
0
repo.sh is:
for LEVEL in "O0" "O1" "O2" "O3"
do
for LINKER in "ld" "lld"
do
STEM="repo-$LEVEL-$LINKER"
echo $STEM
$CC -fuse-ld=$LINKER -$LEVEL repo.c -o $STEM
$CC -$LEVEL repo.c -o $STEM.s -S
$CC -$LEVEL repo.c -o $STEM.ll -S -emit-llvm
qemu-riscv64 $STEM
echo $?
done
done
exit 0
That paints a pretty clear story that it's only the high alignment on LD which causes the wrong result here. This could be a linker bug (possibly downstream!) or an odd interaction with relocation semantics in some non-obvious (to me) way.
Testcase:
Commands:
Bugpoint/asm: bugpoint-and-asm.zip
opt-bisect-limit points to
InferAlignmentPass
Discovered/tested using a7bc9cb6ffa91ff0ebabc45c0c7263c7c2c3a4de (not bisected) Found using fuzzer.