Several Mips, X86 CodeGen tests FAIL on Solaris/sparcv9

rorth commented 4 years ago


Bugzilla Link	47731
Resolution	FIXED
Resolved on	Nov 09, 2020 06:37
Version	trunk
OS	Solaris
CC	@topperc,@efriedma-quic,@RKSimon,@rotateright

Extended Description

Several Mips and X86 CodeGen tests FAIL on Solaris/sparcv9:

LLVM :: CodeGen/Mips/cconv/vector.ll LLVM :: CodeGen/Mips/msa/basic_operations.ll LLVM :: CodeGen/Mips/msa/i5-b.ll LLVM :: CodeGen/Mips/msa/immediates.ll LLVM :: CodeGen/X86/fp-intrinsics.ll LLVM :: CodeGen/X86/known-signbits-vector.ll LLVM :: CodeGen/X86/vec_shift5.ll LLVM :: CodeGen/X86/vector-shift-ashr-128.ll LLVM :: CodeGen/X86/vector-shift-ashr-256.ll

This only happens in 2-stage Release and RelWithDebInfo builds, 2-stage Debug builds and 1-stage Release builds with gcc are fine.

Sometimes the differences are quite small, e.g. for CodeGen/Mips/cconv/vector.ll:

--- vector.s.amd64 2020-10-05 18:21:13.102487784 +0000 +++ vector.s.sparcv9 2020-10-05 18:21:05.775611488 +0000 @@ -1548,7 +1548,7 @@ sw $ra, 36($sp) # 4-byte Folded Spill .cfi_offset 31, -4 lui $1, 6

ori $1, $1, 7
ori $1, $1, 10 lui $2, 9 ori $2, $2, 10 fill.w $w0, $2

In other cases the differences are quite substantial. I have no idea where even to start investigating this, so I'll need much guidance here.

rorth commented 3 years ago

Agreed - if we now have suitable buildbot test coverage and its currently green, then we can move on. Thanks!

We're not green yet, unfortunately. The two important offenders are PRs 47729 (another miscompilation) and 42493 (not supporting 128-bit long double on 32-bit SPARC).

I'm slowly working on the latter, but have no real grip on the former yet.

RKSimon commented 3 years ago

I guess it's a waste of time to run a full reghunt to identify the patch that actually fixed this.

Agreed - if we now have suitable buildbot test coverage and its currently green, then we can move on. Thanks!

rorth commented 3 years ago

Its unlikely any of my patches affected this - I'm wondering if 7ba3293691beb is the cause.

I had seen that one, too, but concluded that it was more about Windows EH and looked elsewhere. I've now run a full build at top-of-trunk with just that patch revered: the failures are still gone.

I guess it's a waste of time to run a full reghunt to identify the patch that actually fixed this.

RKSimon commented 3 years ago

Its unlikely any of my patches affected this - I'm wondering if 7ba3293691beb is the cause.

rorth commented 3 years ago

It looks like StackSlotColoring has one other possibly relevant flag: you could try "-ssc-dce-limit=0". I think that will remove the diff noted in the comment.

Beyond that, I think you could add a mechanism to limit the number of stack slots that are colored by StackSlotColoring, and bisect using that?

Thinking about it a bit more, I have another theory for what might be going wrong: some code could be using uninitialized data, and we're getting lucky/unlucky. This can be nasty to diagnose... one trick I've used in the past is hacking the assembly to memset the stack to zero before returning.

I didn't get back to this in a while, but recently found that the bug has gone away. Looing at the new Linux/sparc64 buildbot (http://lab.llvm.org:8014/#/builders/134) which unlike my Solaris/sparcv9 one runs 2-stage builds), one can see that the failures went away betweens builds 38 and 40. Given the large number of changes included in those two, it's difficult to tell exactly which patch fixed the issue, but it seems plausible that one of Simon's did.

Given that the failures haven't resurfaced since, I guess we can safely close this PR.

I can't help but feel that this is a total waste of my time: this is a spare-time activity and the result won't even be seen in the Solaris/sparcv9 buildbot which is native-only. I think this is something for some SPARC maintainer to look into.

There is no SPARC maintainer, really; it's just various people keeping it limping in their spare time. So if you don't look, nobody is going to look. (The primary reason we haven't just killed off the backend is that SPARC is widely used in academia.)

I wasn't aware of that, given that the the llvm/CODE_OWNERS.TXT still lists a SPARC maintainer. Fortunately, the GCC situation is different here: we do have an active SPARC maintainer and Oracle contributed several large patches to support newer SPARC CPUs in recent years.

efriedma-quic commented 4 years ago

It looks like StackSlotColoring has one other possibly relevant flag: you could try "-ssc-dce-limit=0". I think that will remove the diff noted in the comment.

Beyond that, I think you could add a mechanism to limit the number of stack slots that are colored by StackSlotColoring, and bisect using that?

Thinking about it a bit more, I have another theory for what might be going wrong: some code could be using uninitialized data, and we're getting lucky/unlucky. This can be nasty to diagnose... one trick I've used in the past is hacking the assembly to memset the stack to zero before returning.

I can't help but feel that this is a total waste of my time: this is a spare-time activity and the result won't even be seen in the Solaris/sparcv9 buildbot which is native-only. I think this is something for some SPARC maintainer to look into.

There is no SPARC maintainer, really; it's just various people keeping it limping in their spare time. So if you don't look, nobody is going to look. (The primary reason we haven't just killed off the backend is that SPARC is widely used in academia.)

I'm trying very hard not to just update the code and get the tests passing again :-)

I don't think trying to work around a SPARC backend miscompile is a good use of anyone's time.

RKSimon commented 4 years ago

I'm trying very hard not to just update the code and get the tests passing again :-)

Eli - any suggestions for the next step?

rorth commented 4 years ago

Rainer - do the additional tests I added to APIntTest pass?

They do.

The only other places I can see the .lshr(32).trunc(32) pattern are:

ARMTargetLowering::LowerConstantFP RISCVTargetLowering::PerformDAGCombine

Are they passing?

They pass as well: the only non-native tests currently FAILing are the ones in this PR.

RKSimon commented 4 years ago

Rainer - do the additional tests I added to APIntTest pass?

The only other places I can see the .lshr(32).trunc(32) pattern are:

ARMTargetLowering::LowerConstantFP RISCVTargetLowering::PerformDAGCombine

Are they passing?

rorth commented 4 years ago

assembly output for miscompiled getConstantVector

rorth commented 4 years ago

Stack slot coloring, really? Hmm. I can't think of any way a target could misbehave to make that cause a miscompile.

Don't hit the messenger ;-) I'm just reporting what I found, and it's consistent for all miscompilations seen.

Maybe try diffing the assembly of one of the functions with/without stackslotcoloring? getConstVector seems like the smallest of those three.

That's what I tried (I'm attaching both versions of the assembly for reference). However, at first blush the changes seem benign (just stack slot renumbering, though I didn't check in detail that there are no overlaps, e.g.). Once change seems strange at first:

@@ -515,9 +515,8 @@ stx %g0, [%fp+1487] .LBB603_46: ! %if.else.i6.i ! in Loop: Header=BB603_10 Depth=1

ld [%fp+1487], %g2 ba .LBB603_48
stx %g2, [%fp+1487]
nop .LBB603_47: ! %if.end7.i.i ! in Loop: Header=BB603_10 Depth=1 add %fp, 1487, %o0 @@ -526,9 +525,9 @@ .LBB603_48: ! %llvm::APInt::lshr(unsigned int) const [clone .exit]

but the whole operation is a no-op AFAICS: first 64-bit 0 is stored in the stack slot, then 32-bit loaded into %g2 and stored (as 64-bit, but still 0, of course) back into the same slot.

I've also tried to run both versions of llc side by side in gdb, but that got me nowhere.

There's also an option "-no-stack-slot-sharing" you could try experimenting with.

Tried that, too, but it didn't make a difference in any of the three cases.

I can't help but feel that this is a total waste of my time: this is a spare-time activity and the result won't even be seen in the Solaris/sparcv9 buildbot which is native-only. I think this is something for some SPARC maintainer to look into.

efriedma-quic commented 4 years ago

Stack slot coloring, really? Hmm. I can't think of any way a target could misbehave to make that cause a miscompile.

Maybe try diffing the assembly of one of the functions with/without stackslotcoloring? getConstVector seems like the smallest of those three. There's also an option "-no-stack-slot-sharing" you could try experimenting with.

rorth commented 4 years ago

Both variants alike fix the same four tests

LLVM :: CodeGen/X86/known-signbits-vector.ll
LLVM :: CodeGen/X86/vec_shift5.ll
LLVM :: CodeGen/X86/vector-shift-ashr-128.ll
LLVM :: CodeGen/X86/vector-shift-ashr-256.ll

without any other effect on test results.

RKSimon commented 4 years ago

And please also try this variant as well:

  Ops.push_back(DAG.getConstant(V.extractBits(32, 0), dl, EltVT));
  Ops.push_back(DAG.getConstant(V.extractBits(32, 32), dl, EltVT));

RKSimon commented 4 years ago

Following Eli's suggestion, I've now identified the passes that cause the miscompilations:

X86ISelLowering.cpp: getConstVector(llvm::ArrayRef, llvm::APInt&, llvm::MVT, llvm::SelectionDAG&, llvm::SDLoc const&)

This makes even more suspicious of the APInt operations.

Please can you try replacing

  Ops.push_back(DAG.getConstant(V.trunc(32), dl, EltVT));
  Ops.push_back(DAG.getConstant(V.lshr(32).trunc(32), dl, EltVT));

with:

  Ops.push_back(DAG.getConstant(V.extractBitsAsZExtValue(32, 0), dl, EltVT));
  Ops.push_back(DAG.getConstant(V.extractBitsAsZExtValue(32, 32), dl, EltVT));

And see if that makes a difference?

rorth commented 4 years ago

Following Eli's suggestion, I've now identified the passes that cause the miscompilations:

MipsSEISelDAGToDAG.cpp:

BISECT: running pass (32328) Stack Slot Coloring on function (_ZN4llvm18MipsSEDAGToDAGISel9trySelectEPNS_6SDNodeE)

llvm::MipsSEDAGToDAGISel::trySelect(llvm::SDNode*)
LegalizeDAG.cpp:

BISECT: running pass (51758) Stack Slot Coloring on function (_ZN12_GLOBAL__N_120SelectionDAGLegalize10LegalizeOpEPN4llvm6SDNodeE)

(anonymous namespace)::SelectionDAGLegalize::LegalizeOp(llvm::SDNode*)
X86ISelLowering.cpp:

BISECT: running pass (301313) Stack Slot Coloring on function (_ZL14getConstVectorN4llvm8ArrayRefINS_5APIntEEERS1_NS_3MVTERNS_12SelectionDAGERKNS_5SDLocE)

getConstVector(llvm::ArrayRef, llvm::APInt&, llvm::MVT, llvm::SelectionDAG&, llvm::SDLoc const&)

RKSimon commented 4 years ago

A common code pattern I'm seeing is

APInt x; // 64 bits unsigned lo32 = x.trunc(32); unsigned hi32 = x.lshr(32).trunc(32);

I've added some extra unit tests for these at rG3cb8347c94a0, its unlikely that these will fail but if you can confirm that'd be great.

rorth commented 4 years ago

SelectionDAGLegalize::OptimizeFloatStore is a highly likely candidate to check here

Thanks, I'll have a look.

Meanwhile I've also identified the objects that cause the remaining failures:

MipsSEISelDAGToDAG.cpp.o in libLLVMMipsCodeGen.a is responsible for the Mips ones.
X86ISelLowering.cpp.o in libLLVMX86CodeGen.a causes the remaining X86 failures.

RKSimon commented 4 years ago

As a first step, I've found that replacing LegalizeDAG.cpp.o in libLLVMSelectionDAG.a with the gcc-compiled version fixes the X86/fp-intrinsics.ll test, but none of the others, neither X86 nor Mips.

SelectionDAGLegalize::OptimizeFloatStore is a highly likely candidate to check here

rorth commented 4 years ago

Created attachment 24021 [details] Diffs to affected llvm/test/CodeGen/X86 .ll files

Thanks - on x86 it looks like its always something wrong on i686 triples for i64/vXi64 constant handling - so maybe a APInt method? I'm just guessing here - and won't be able to help narrow it down straight away.

sparcv9 is bigendian yes? It could be something to do with that I suppose but I'm not sure on that.

It is indeed, but it's not the only bigendian system supported by LLVM.

As a first step, I've found that replacing LegalizeDAG.cpp.o in libLLVMSelectionDAG.a with the gcc-compiled version fixes the X86/fp-intrinsics.ll test, but none of the others, neither X86 nor Mips.

I'll have to repeat the bisect for the other failing tests to get a better picture.

RKSimon commented 4 years ago

Created attachment 24021 [details] Diffs to affected llvm/test/CodeGen/X86 .ll files

Thanks - on x86 it looks like its always something wrong on i686 triples for i64/vXi64 constant handling - so maybe a APInt method? I'm just guessing here - and won't be able to help narrow it down straight away.

sparcv9 is bigendian yes? It could be something to do with that I suppose but I'm not sure on that.

--- a/llvm/test/CodeGen/X86/known-signbits-vector.ll +++ b/llvm/test/CodeGen/X86/known-signbits-vector.ll @@ -68,7 +68,7 @@ define <4 x double> @signbits_ashr_sitof ; X86-NEXT: vpsrlq $34, %xmm0, %xmm2 ; X86-NEXT: vpsrlq $33, %xmm0, %xmm0 ; X86-NEXT: vpblendw {{.#+}} xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7] -; X86-NEXT: vmovdqa {{.#+}} xmm2 = [1073741824,0,536870912,0] +; X86-NEXT: vmovdqa {{.#+}} xmm2 = [1073741824,1073741824,536870912,0] ; X86-NEXT: vpxor %xmm2, %xmm0, %xmm0 ; X86-NEXT: vpsubq %xmm2, %xmm0, %xmm0 ; X86-NEXT: vshufps {{.#+}} xmm0 = xmm0[0,2],xmm1[0,2] @@ -376,7 +376,7 @@ define float @signbits_ashr_sextvecinreg ; X86-NEXT: vpsrlq $60, %xmm0, %xmm2 ; X86-NEXT: vpsrlq $61, %xmm0, %xmm0 ; X86-NEXT: vpblendw {{.#+}} xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7] -; X86-NEXT: vmovdqa {{.#+}} xmm2 = [4,0,8,0] +; X86-NEXT: vmovdqa {{.*#+}} xmm2 = [4,4,8,0] ; X86-NEXT: vpxor %xmm2, %xmm0, %xmm0 ; X86-NEXT: vpsubq %xmm2, %xmm0, %xmm0 ; X86-NEXT: vpand %xmm1, %xmm0, %xmm2

--- a/llvm/test/CodeGen/X86/vector-shift-ashr-256.ll +++ b/llvm/test/CodeGen/X86/vector-shift-ashr-256.ll @@ -1072,7 +1072,7 @@ define <4 x i64> @constant_shift_v4i64(< ; X32-AVX1-NEXT: vpsrlq $7, %xmm0, %xmm2 ; X32-AVX1-NEXT: vpsrlq $1, %xmm0, %xmm0 ; X32-AVX1-NEXT: vpblendw {{.#+}} xmm0 = xmm0[0,1,2,3],xmm2[4,5,6,7] -; X32-AVX1-NEXT: vmovdqa {{.#+}} xmm2 = [0,1073741824,0,16777216] +; X32-AVX1-NEXT: vmovdqa {{.*#+}} xmm2 = [0,0,0,16777216] ; X32-AVX1-NEXT: vpxor %xmm2, %xmm0, %xmm0 ; X32-AVX1-NEXT: vpsubq %xmm2, %xmm0, %xmm0 ; X32-AVX1-NEXT: vinsertf128 $1, %xmm1, %ymm0, %ymm0

rorth commented 4 years ago

Diffs to affected llvm/test/CodeGen/X86 .ll files

rorth commented 4 years ago

Please can you list the specific test functions within each CodeGen/X86* test file that breaks and I'll take a look.

That would have been easy e.g. for fp-intrinsics.ll (f19), but ...

Or you could use the update script:

llvm-project\llvm\utils\update_llc_test_checks.py

to regenerate those files and attach the diffs.

... this is way easier and the diffs are much smaller than those of the assembler output.

RKSimon commented 4 years ago

Please can you list the specific test functions within each CodeGen/X86* test file that breaks and I'll take a look.

Or you could use the update script:

llvm-project\llvm\utils\update_llc_test_checks.py

to regenerate those files and attach the diffs.

efriedma-quic commented 4 years ago

Standard tips for debugging miscompiles: bisect object files to find the specific object file containing the bug, use https://llvm.org/docs/OptBisect.html to narrow down from there.

(The fact that the program being miscompiled is the compiler itself doesn't change the general process.)

llvm / llvm-project

Several Mips, X86 CodeGen tests FAIL on Solaris/sparcv9 #47075

Extended Description