Open llvmbot opened 7 years ago
For reference, r283190 caused Bug #31127, so if anyone backports this commit they will also need to backport r287765.
I just ran git bisect and r283190 was the commit that fixed these tests. It's not clear if this commit actually fixed the test or just hid the bug by changing the scheduling order. However, it seems plausible that these failures were caused by a bug sign-extending part words.
I'm attaching the good/bad assembly and the LLVM IR for reference.
Just to clarify:
glsl-kwin-blur-1 and glsl-kwin-blur-2 (still) FAIL with VSX code gen enabled and PASS with VSX code gen DISabled.
I built LLVM 3.9 with help from Tom, installed it, and linked a new build of Mesa against it.
Of the four Piglit tests I originally identified as failing with VSX code gen enabled but passing with VSX code gen DISabled:
Test: spec@arb_vertex_attrib_64bit@execution@vs_in@vs-input-float_mat3x4-position-double_dmat4x3 Command: bin/shader_runner /root/piglit/generated_tests/spec/arb_vertex_attrib_64bit/execution/vs_in/vs-input-float_mat3x4-position-double_dmat4x3.shader_test -auto -fbo
Test: spec@arb_vertex_attrib_64bit@execution@vs_in@vs-input-position-double_dmat3-float_mat2_array3 Command: bin/shader_runner /root/piglit/generated_tests/spec/arb_vertex_attrib_64bit/execution/vs_in/vs-input-position-double_dmat3-float_mat2_array3.shader_test -auto -fbo
Test: shaders@glsl-kwin-blur-1 Command: bin/glsl-kwin-blur-1 -auto -fbo
Test: shaders@glsl-kwin-blur-2 Command: bin/glsl-kwin-blur-2 -auto -fbo
the first two now PASS with VSX code gen enabled.
The remaining two, glsl-kwin-blur-1 and glsl-kwin-blur-2, still fail, so the next thing I'll try will be to use the controls Tom described in Comment 8.
I just realized that the patch meant to fix AADB got stuck in review many moons ago and was effectively abandoned. https://reviews.llvm.org/D20949
I think that if these test cases pass with AADB turned off, we might want to revive this patch.
Actually, that's probably not the patch I was thinking of. That one just adds more opportunities for AADB. But I still think it'd be interesting to see if disabling it fixes the problem.
Also, we realized recently that there was a problem in one of the passes not handling aliasing correctly which ended up reordering loads/stores in the scheduler incorrectly. That was fixed in: https://reviews.llvm.org/rL309651 https://reviews.llvm.org/rL309849
So hopefully as Tom points out, this may not reproduce on trunk.
I just realized that the patch meant to fix AADB got stuck in review many moons ago and was effectively abandoned. https://reviews.llvm.org/D20949
I think that if these test cases pass with AADB turned off, we might want to revive this patch.
Created attachment 19118 [details] Mesa patch to disable MachineScheduler for ppc64le
I was able to get the tests working by disabling the MachineScheduler on ppc64le.
I'm still not sure what the root causes is, I'm going to try to see if this is still an issue with trunk.
Hmm, machine scheduler... I wonder if this is an Anti-dep breaker bug that we saw a while ago where it doesn't handle subregisters correctly. Do the tests pass if you disable AADB? Although I don't see a way to disable it without modifying: PPCGenSubtargetInfo::AntiDepBreakMode PPCSubtarget::getAntiDepBreakMode()
to return TargetSubtargetInfo::ANTIDEP_NONE. Mind you I didn't take a super close look at the AADB code.
Mesa patch to disable MachineScheduler for ppc64le I was able to get the tests working by disabling the MachineScheduler on ppc64le.
I'm still not sure what the root causes is, I'm going to try to see if this is still an issue with trunk.
Debugging patch for PPC You could try disabling a few other features via MAttrs: -direct-move -power8-vector
Make sure to leave +vsx enabled when you try these. You won't need to patch LLVM to test these.
In addition, you can apply this patch which adds the following debugging flags to disable some optimizations:
-disable-ppc-lower-vector-shuffle-vsx -disable-ppc-vsx-swaps
These need to be set using LLVMParseCommandLineOptions()
I've isolated the particular shader program as Tom suggested; the vertex shader (draw_llvm_vs_variant0) is, in fact, the culprit.
Can you isolate which shader has the bug? Maybe by disabling vsx for one shader at a time?
Debug output from Mesa, including LLVM IR Here is the output from a run of Piglit's shader_runner program with the following debug environmental controls in place:
LIBGL_DEBUG=verbose MESA_DEBUG=verbose MESA_GLSL=dump log uniform useprog errors GALLIVM_DEBUG=tgsi ir asm dumpbc ST_DEBUG=tgsi
The program is
Can you post the LLVM IR for the smallest of the failing tests?
From the Mesa side, the vertex shader program gets run by code that looks like this in
Other Piglit tests that pass with VSX code generation disabled:
Test: spec@arb_vertex_attrib_64bit@execution@vs_in@vs-input-position-double_dmat3-float_mat2_array3 Command: bin/shader_runner /root/piglit/generated_tests/spec/arb_vertex_attrib_64bit/execution/vs_in/vs-input-position-double_dmat3-float_mat2_array3.shader_test -auto -fbo
Test: shaders@glsl-kwin-blur-1 Command: bin/glsl-kwin-blur-1 -auto -fbo
Test: shaders@glsl-kwin-blur-2 Command: bin/glsl-kwin-blur-2 -auto -fbo
Extended Description
Overview:
On PPC64LE, some of the Piglit tests fail because erroneous VSX code is generated for the vertex shader programs; the same tests pass again if VSX code generation is disabled (i.e. if just Altivec is used).
Steps to reproduce:
ON YOUR PPC64LE TEST SYSTEM: 0) dnf install mesa-dri-drivers glx-utils redhat-rpm-config 1) Download Mesa 17.1.2 (recent release): git clone -b mesa-17.1.2 git://anongit.freedesktop.org/mesa/mesa 2) dnf builddep mesa 3) Build Mesa: in Mesa directory: a) RECOMMEND export CFLAGS="-g -O0" ; eqxport CXXFLAGS="-g -O0" b) prefix=<where you want to install Mesa, e.g. $HOME/local/lib> c) ./autogen.sh --prefix=$prefix \ --with-dri-drivers= \ --with-gallium-drivers=swrast \ --enable-gallium-llvm \ --enable-glx-tls \ --enable-debug d) apply the patch to lp_bld_misc.cpp described below e) make f) make install g) make sure your LD_LIBRARY_PATH points to where 'make install' installed, i.e. to $prefix 4) Back in home directory, download Piglit: git clone git://anongit.freedesktop.org/piglit 5) Install Piglit build dependencies: dnf builddep piglit; IF problems here, manually install
ON YOUR PC running your browser, where you should also have Piglit installed: 9) Create Piglit result directory; recommend $HOME/piglit-results 10) Copy the q.ppc64le.mesa-17.1.2 directory over via SCP or other means 11) Create browser-viewable summary: a) in Piglit directory... b) export RESULTDIR=$HOME/piglit-results c) export RESULTNAME=q.ppc64le.mesa-17.1.2 d) export SUMMARYNAME=sq.ppc64le.mesa-17.1.2 e) ./piglit summary html $RESULTDIR/$SUMMARYNAME $RESULTDIR/$RESULTNAME
12) Open the summary in a browser tab: a) ^O to open a file; b) select .../piglit-results/sq.ppc64le.mesa-17.1.2/index.html c) narrow the view by selecting "problems" d) find the two tests under arb_vertex_attrib_64bit; click on "fail" in the rightmost column e) Note multiple color discrepancies: Expected 0 255 0 255, Observed 0 0 255 0
Results: See 12(e) immediately above
Expected Results: No color discrepancy messages; "pass"
Additional info:
Installed on my PPC64LE system: llvm.ppc64le 3.9.1-2.fc25 @updates
llvm-devel.ppc64le 3.9.1-2.fc25 @updates
llvm-libs.ppc64le 3.9.1-2.fc25 @updates
Commands to run the failing tests (in Piglit dir):
bin/shader_runner \ $HOME/piglit/generated_tests/spec/arb_vertex_attrib_64bit/execution/vs_in/vs-input-float_mat3x4-position-double_dmat4x3.shader_test -auto -fbo
bin/shader_runner \ $HOME/piglit/generated_tests/spec/arb_vertex_attrib_64bit/execution/vs_in/vs-input-position-double_dmat3-float_mat2_array3.shader_test -auto -fbo
Enabling/disabling Mesa debug output:
export LIBGL_DEBUG=verbose export MESA_DEBUG=verbose export MESA_GLSL="" export MESA_GLSL="dump log uniform useprog errors" export GALLIVM_DEBUG="tgsi ir asm dumpbc" export ST_DEBUG=tgsi
These are very useful when you're debugging Mesa; you will see the shader program source at several levels, including LLVM IR. BUT they must all be blank when you're running Piglit, otherwise the Piglit run will take a VERY long time and the 'summary' step will not work.
THE TESTS CAN BE MADE TO PASS BY DISABLING VSX code generation; I added the following patch to lp_bld_misc.cpp to gain finer control over +VSX/-VSX code generation:
--- a/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp +++ b/src/gallium/auxiliary/gallivm/lp_bld_misc.cpp @@ -624,7 +624,10 @@ lp_build_create_jit_compiler_for_module(LLVMExecutionEngineRef *OutJIT,
endif
endif