Open llvmbot opened 7 years ago
We can still reproduce this bug with LLVM 12.0 using the LLPC pipeline compiler for AMD GPUs. Our profiles show the same pathological behavior in tryEvict.
Instead of using the fast regalloc (which is super bad for code quality), you could give the basic allocator a try.
Comparing release builds, here are the compile-time differences with and without Eric's patch:
ir_fs138_variant0.bc
The compile-time increase comes from greedy register allocation. The patch changes the instruction scheduling (as it is meant to), which unfortunately means that in this particular case we produce a schedule that is particularly bad for the greedy RA. Here are a few technical details as I understand them from my investigation:
With this information in mind, I think we might have to consider this a limitation and close this PR. Considering Mesa is a JIT, it may be worthwhile investigating the possibility of switching to the fast register allocator (i.e., the -regalloc=fast option). I assume that will produce less optimal register allocation, but it is presumably faster than the near-optimal greedy register allocator. Let me know what you think about this.
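To make the comparison concrete, llc accepts a -regalloc option for selecting the allocator (greedy is the default; basic and fast are the alternatives discussed here). A minimal timing sketch, assuming llc is on PATH and the attached ir_fs138_variant0.bc is in the current directory:

```shell
# Time each register allocator on the problematic fragment shader bytecode.
# Output goes to /dev/null so only compile time is measured.
for ra in greedy basic fast; do
  echo "== -regalloc=$ra =="
  time llc ir_fs138_variant0.bc -mcpu=pwr8 -mattr=+altivec,+vsx \
       -regalloc="$ra" -o /dev/null
done
```

This would show directly how much of the regression is recovered by trading allocation quality for speed.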
My patch is merely a scheduler-description change; at worst it's highlighting a performance problem somewhere else, sadly.
I tested reverting Eric's patch (the one Ben reported) on the 5.0 branch. Reverting just that patch reduces the Release-build llc compile time for ir_fs138_variant0.bc by about 10% (the other two bytecode files show even smaller differences), which means Eric's patch alone causes about a 10% compile-time degradation on the 5.0 branch. However, we did see about a 3x-4x compile-time difference for ir_fs138_variant0.bc when I reverted everything from Eric's patch onward (from about 0.5 seconds before to about 2.0 seconds after). I will continue looking at this issue. Meanwhile, please use a Release-build llc for compilation in the future, since it is far faster than a Debug-build llc (about 25x-45x faster).
I see I did not specify my exact build procedure; apologies!
Here it is:
In my LLVM directory, /tmp/llvm-bisect (i.e., on RAMdisk):
% cmake -G "Unix Makefiles" -DLLVM_BUILD_LLVM_DYLIB=ON -DCMAKE_INSTALL_PREFIX=/tmp/local /tmp/llvm-bisect
% make -j 144
I.e., I built with "gcc (GCC) 7.2.1 20170915 (Red Hat 7.2.1-2)", the system compiler; we are somewhat constrained to use GCC when building Mesa, LLVM, etc.
But I DID do Debug builds (i.e. let the build type default to Debug), so maybe that has something to do with the differences in our experiences.
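Since the build type turned out to matter so much here, a sketch of an explicitly-Release configuration using the same flags and paths from the build procedure above (CMake's CMAKE_BUILD_TYPE otherwise defaults to Debug in this setup, which is the 25x-45x slower configuration mentioned elsewhere in the thread):

```shell
# Same in-source build as above, but forcing a Release build type
# so timing comparisons are apples-to-apples.
cmake -G "Unix Makefiles" \
      -DCMAKE_BUILD_TYPE=Release \
      -DLLVM_BUILD_LLVM_DYLIB=ON \
      -DCMAKE_INSTALL_PREFIX=/tmp/local \
      /tmp/llvm-bisect
make -j 144
```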
Hi Ben, can you verify that you are not comparing a Release-build llc against a Debug-build llc? We know that the Debug-build llc is significantly slower than the Release build. According to my tests, the Release-build llc compile time for the fragment shader bytecode is always around 3 seconds with or without Eric's patch, while the Debug-build llc compile time is almost 2 minutes with or without it. Can you run the test again for both the Release and Debug builds and post your detailed results here if you still believe there is a compile-time degradation? Thanks a lot!
I can reproduce this degradation. I'm not sure how you ran your experiment, Tony, but I get consistent run times of around 0.5s before the first patch and 2.0s after it. We will continue investigating.
Hi Ben, I tried compiling all three bytecode files from your attachment on our PPC64LE dev machine, with and without the first bad commit you mentioned (0ef3663fb81c9cd73f646728463a6105b5d9b88a), using the options from your comment (-mcpu=pwr8 -mattr=+altivec,+vsx). There is no significant compile-time difference for any of the three bytecode files; I ran each configuration 10 times with and without that patch. Can you retry this against the latest trunk of clang/llvm and see whether you can still reproduce? Note that I was only reverting the problematic patch from Eric Christopher that you mentioned. If you can provide me with the git hashes for the other three projects (clang, compiler-rt, and test-suite) from when you found the bad llvm commit (they should have timestamps similar to the 0ef3663fb81c9cd73f646728463a6105b5d9b88a patch), I can revert all the projects to around the time of the bad llvm commit and test again to see whether I can reproduce it. Thank you very much!
The following is one of my test results (there is no visible difference between runs):
time `llc fragmentShader.bc -mcpu=pwr8 -mattr=+altivec,+vsx`
real 0m3.501s
user 0m3.491s
sys 0m0.008s
P.S. Note that I kept my LLVM build in /tmp, i.e. on RAM disk, so the only disk I/O involved was reading the bytecode file and writing the assembly language output.
Bytecode used for bisect operation

Hi Nemanja,
Sorry, I did not keep the compile time information for each of the individual bisect steps. HOWEVER, I CAN tell you that, before the problem commit, the compile time for the shader code was routinely in the 6-7 second range, while after the problem commit, the compile time was in the 37-45 second range.
BTW I've attached the specific bytecode file I used for the bisect operation, ir_fs138_variant0.bc.
Hi Ben, do you happen to have the compile times for the same shader code with each of the mentioned revisions? It would be good to see which one results in the largest jump. Then we can investigate why this results in such a large compile-time increase.
I did a bisect operation as requested by Nemanja, and here is the result (please pardon my use of git instead of SVN):
# first bad commit: [0ef3663fb81c9cd73f646728463a6105b5d9b88a] vec perm can go down either pipeline on P8. No observable changes, spotted while looking at the scheduling description.
This certainly looks suspicious, in light of the fact that the change is in lib/Target/PowerPC/PPCScheduleP8.td.
Here is the text of the commit in the context of the surrounding commits:
commit b89cc7e5e30432b6093664a44ee2e2af9a42f3b6
Author: Nirav Dave <niravd@google.com>
Date: Sun Feb 26 01:27:32 2017 +0000
Revert "In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled."
This reverts commit r296252 until 256-bit operations are more efficiently generated in X86.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296279 91177308-0d34-0410-b5e6-96231b3b80d8
commit 0ef3663fb81c9cd73f646728463a6105b5d9b88a
Author: Eric Christopher <echristo@gmail.com>
Date: Sun Feb 26 00:11:58 2017 +0000
vec perm can go down either pipeline on P8.
No observable changes, spotted while looking at the scheduling description.
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296277 91177308-0d34-0410-b5e6-96231b3b80d8
commit 3a603f41297cad31be9ce54e1c8c076c76c60ddf
Author: Sanjoy Das <sanjoy@playingwithpointers.com>
Date: Sat Feb 25 22:25:48 2017 +0000
Fix signed-unsigned comparison warning
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@296274 91177308-0d34-0410-b5e6-96231b3b80d8
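A bisect like this can also be automated with git bisect run. The sketch below is hypothetical: the 10-second cutoff is an invented threshold sitting between the ~6-7s good and ~37-45s bad compile times reported elsewhere in the thread, the good/bad commit placeholders must be filled in, and the paths assume the in-source /tmp/llvm-bisect build described above:

```shell
# Automate the bisect: each step rebuilds llc, times one compile of the
# problem shader, and classifies the commit by a timing threshold.
git bisect start <bad-commit> <good-commit>
git bisect run sh -c '
  make -j 144 llc || exit 125           # exit 125 skips commits that fail to build
  start=$(date +%s)
  ./bin/llc ir_fs138_variant0.bc -mcpu=pwr8 -mattr=+altivec,+vsx -o /dev/null
  end=$(date +%s)
  [ $((end - start)) -lt 10 ]           # under threshold => good commit
'
```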
Build time seems to be in RAGreedy (fragment shader):
llvm::MachineFunctionPass::runOnFunction 99.24 %
- `anonymous namespace'::RAGreedy::runOnMachineFunction 93.59 % 0.00 %
- llvm::RegAllocBase::allocatePhysRegs 93.52 % 0.00 %
- `anonymous namespace'::RAGreedy::selectOrSplit 92.44 % 0.00 %
- `anonymous namespace'::RAGreedy::selectOrSplitImpl 92.20 % 0.00 %
- `anonymous namespace'::RAGreedy::tryEvict 86.68 % 0.02 %
- `anonymous namespace'::RAGreedy::canEvictInterference 86.27 % 0.06 %
- `anonymous namespace'::RAGreedy::canReassign 80.64 % 0.35 %
- llvm::LiveIntervalUnion::Query::checkInterference 61.62 % 0.31 %
- llvm::LiveIntervalUnion::Query::collectInterferingVRegs 61.30 % 1.27 %
- llvm::IntervalMap<llvm::SlotIndex,llvm::LiveInterval * __ptr64,8,llvm::IntervalMapInfo<llvm::SlotIndex> >::const_iterator::find 19.26 % 0.36 %
+ llvm::IntervalMapImpl::LeafNode<llvm::SlotIndex,llvm::LiveInterval * __ptr64,8,llvm::IntervalMapInfo<llvm::SlotIndex> >::findFrom 7.71 % 0.20 %
+ llvm::IntervalMap<llvm::SlotIndex,llvm::LiveInterval * __ptr64,8,llvm::IntervalMapInfo<llvm::SlotIndex> >::const_iterator::treeFind 5.70 % 0.05 %
+ llvm::IntervalMap<llvm::SlotIndex,llvm::LiveInterval * __ptr64,8,llvm::IntervalMapInfo<llvm::SlotIndex> >::const_iterator::setRoot 3.64 % 0.15 %
+ llvm::IntervalMap<llvm::SlotIndex,llvm::LiveInterval * __ptr64,8,llvm::IntervalMapInfo<llvm::SlotIndex> >::rootLeaf 0.99 % 0.28 %
+ llvm::IntervalMap<llvm::SlotIndex,llvm::LiveInterval * __ptr64,8,llvm::IntervalMapInfo<llvm::SlotIndex> >::const_iterator::branched 0.83 % 0.46 %
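A lighter-weight way to get this kind of breakdown, without attaching a full profiler, is llc's built-in pass timers. A sketch using the same options as elsewhere in the thread; if the profile above holds, the greedy register allocator should dominate the report:

```shell
# Print per-pass wall/user time after compilation; the "Greedy Register
# Allocator" row is the one implicated by the profile in this thread.
llc ir_fs138_variant0.bc -mcpu=pwr8 -mattr=+altivec,+vsx \
    -time-passes -o /dev/null
```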
What were your triple/cpu settings?
Tom Stellard suggested I also supply the -mcpu and -mattr options.
Here they are:
% llc -mcpu=pwr8 -mattr=+altivec,+vsx
What were your triple/cpu settings?
% llc --version
LLVM (http://llvm.org/):
  LLVM version 6.0.0svn
  DEBUG build with assertions.
  Default target: powerpc64le-unknown-linux-gnu
  Host CPU: pwr8
What were your triple/cpu settings?
Extended Description
The Piglit (OpenGL test suite) ext_transform_feedback-max-varyings test utilizes somewhat unusual shader programs (both vertex and fragment shaders). The llc compiler prior to 4.0 compiled these programs in not-unacceptable times of 0.078 seconds for a representative vertex shader and 2.6-4.5 seconds for a representative fragment shader.
The V4.0 and later llc takes a MUCH longer time to compile the same code: 1.66 seconds for the vertex shader (a factor of 20 times slower!) and 1 minute 55 seconds for the fragment shader (a factor of 25-45 times slower!).
I will attach sample vertex shader code (ir_draw_llvm_vs_variant0.bc) and fragment shader code (ir_fs914_variant0.bc). The target architecture is PPC64LE.