Hi,
May I ask why running opt first prevents llc from triggering the bug? Which passes are executed by opt but not by llc?
Thanks.
Thank you very much.
The commit that fixed the compile time regression is
commit 30a921f62a8444a478e456d99022ea847f48336c
Author: Nirav Dave <niravd@google.com>
Date: Tue Mar 14 00:34:14 2017 +0000
In visitSTORE, always use FindBetterChain, rather than only when UseAA is enabled.
Recommiting with compiler time improvements
Recommitting after fixup of 32-bit aliasing sign offset bug in DAGCombiner.
* Simplify Consecutive Merge Store Candidate Search
Now that address aliasing is much less conservative, push through
simplified store merging search and chain alias analysis which only
checks for parallel stores through the chain subgraph. This is cleaner
as the separation of non-interfering loads/stores from the
store-merging logic.
When merging stores search up the chain through a single load, and
finds all possible stores by looking down from through a load and a
TokenFactor to all stores visited.
This improves the quality of the output SelectionDAG and the output
Codegen (save perhaps for some ARM cases where we correctly constructs
wider loads, but then promotes them to float operations which appear
but requires more expensive constant generation).
Some minor peephole optimizations to deal with improved SubDAG shapes (listed below)
[...]
Hi,
Thank you very much for finding the root cause of the issue.
We'll give the latest svn revision a try tomorrow.
So, to clarify, I'm now able to reproduce on every machine I try. The trick is not to run opt first, but to run llc on the bitcode as is.
So this doesn't happen only on a Zen host, and it doesn't happen only when optimizing for Ryzen; it's a general problem.
Examples:
[davide@localhost bin]$ time ./llc blah.ll -mtriple=x86_64-unknown -mcpu=core2
real    0m4.327s
user    0m4.267s
sys     0m0.060s

[davide@localhost bin]$ time ./llc blah.ll -mtriple=x86_64-unknown -mcpu=znver1
real    0m7.321s
user    0m7.239s
sys     0m0.081s

[davide@localhost bin]$ time ./llc blah.ll -mtriple=x86_64-unknown -mcpu=btver1
real    0m8.947s
user    0m8.918s
sys     0m0.029s
We'll take a look, but please take the time to elaborate and be more precise when reporting bugs in the future.
It seems the time went down from 40 seconds to < 10 seconds between 4.0 and today. I recommend trying ToT as a workaround.
So, to clarify:
Apparently Simon is able to reproduce this one.
The problem seems to be in llc and not the JIT: 46% of self time is spent in SUnit::addPred and 40% in SUnit::ComputeHeight.
We'll investigate.
This still has no info on how to reproduce. Please reopen when you have a standalone testcase. Thanks!
Please provide a standalone repro, otherwise it's impossible to reproduce.
The issue does not happen in llc or opt, but only when we JIT the code using llvm::ExecutionEngine::getPointerToFunction().
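For reference, here is a minimal sketch of the code path we use, assuming MCJIT through EngineBuilder; the main() wrapper, argument handling and error handling are illustrative, not our exact production code:

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"
#include <memory>
#include <string>

int main(int argc, char **argv) {
  // argv[1] = path to the .ll file, argv[2] = name of the function to JIT (illustrative).
  if (argc < 3)
    return 1;

  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();

  llvm::LLVMContext context;
  llvm::SMDiagnostic err;
  std::unique_ptr<llvm::Module> module = llvm::parseIRFile(argv[1], err, context);
  if (!module)
    return 1;

  llvm::Function *func = module->getFunction(argv[2]);
  std::string errStr;
  std::unique_ptr<llvm::ExecutionEngine> engine(
      llvm::EngineBuilder(std::move(module))
          .setEngineKind(llvm::EngineKind::JIT)
          .setErrorStr(&errStr)
          .create());
  if (!engine || !func)
    return 1;

  // This is the call that is slow for us: native code generation happens
  // when the function pointer is requested.
  void *fp = engine->getPointerToFunction(func);
  return fp ? 0 : 1;
}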
I tried on several machines and I'm not able to reproduce. Also, your bug report doesn't seem to contain enough information to reproduce the problem (opt is fast, llc is fast, hard to guess where the cycles are spent). Feel free to reopen when you have more information. Cheers.
This doesn't reproduce on trunk for me (I tried on a Ryzen). I suspect a problem in your setup. Also, please try trunk before reporting issues.
On the problematic computer, even if we force the CPU target to "core2", it still takes a lot of time.
Is it possible that the optimizer ignores the "core2" setting at some point and falls back to Ryzen-specific optimizations?
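To be concrete, by "force the CPU target" I mean passing the CPU name to the engine builder, roughly as below; the helper name is hypothetical and our real setup differs slightly:

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/IR/Module.h"
#include <memory>
#include <string>

// Hypothetical helper: build the JIT engine with an explicit CPU instead of
// the autodetected host CPU (e.g. "core2" instead of "znver1").
std::unique_ptr<llvm::ExecutionEngine>
makeEngineForCPU(std::unique_ptr<llvm::Module> module, const char *cpu,
                 std::string &errStr) {
  return std::unique_ptr<llvm::ExecutionEngine>(
      llvm::EngineBuilder(std::move(module))
          .setEngineKind(llvm::EngineKind::JIT)
          .setMCPU(cpu) // "core2" still shows the slowdown on the Ryzen box
          .setErrorStr(&errStr)
          .create());
}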
I should add that on Linux, with an Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, it takes 151ms.
Extended Description
Hi,
I'm a Bitwig Studio developer (www.bitwig.com), and we use LLVM to JIT some digital signal processing algorithms. Our software is used by thousands of customers on Windows, Mac and Linux.
We just updated from LLVM 3.9.1 to LLVM 4.0 and found that it takes a lot of time to JIT the attached LLVM IR (48701ms) on an AMD Ryzen R7 1700X, while it is not noticeable on other architectures.
The command "opt -mcpu=znver1 /home/abique/downloads/claes-cache-entry.ll -o tutu.bc -O3" is instant, so the slowness is in target lowering or in the JIT phase.
By the way, do you have a workaround for this issue until the fix is released?
Many thanks.
Regards, Alexandre