Open llvmbot opened 11 years ago
Same ranking in a non-debug build:
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 0.1241 seconds (0.1230 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.0489 ( 41.1%)   0.0000 (  0.4%)   0.0489 ( 39.4%)   0.0469 ( 38.1%)  SROA
   0.0154 ( 12.9%)   0.0001 (  1.4%)   0.0155 ( 12.5%)   0.0153 ( 12.4%)  Function Integration/Inlining
   0.0088 (  7.4%)   0.0000 (  0.8%)   0.0088 (  7.1%)   0.0086 (  7.0%)  Combine redundant instructions
   0.0086 (  7.2%)   0.0000 (  0.2%)   0.0086 (  6.9%)   0.0078 (  6.3%)  Module Verifier
   0.0041 (  3.5%)   0.0000 (  0.1%)   0.0041 (  3.3%)   0.0035 (  2.9%)  Dead Store Elimination
   0.0023 (  2.0%)   0.0000 (  0.0%)   0.0023 (  1.9%)   0.0024 (  1.9%)  Interprocedural Sparse Conditional Constant Propagation
   0.0008 (  0.6%)   0.0000 (  0.1%)   0.0008 (  0.6%)   0.0022 (  1.8%)  Deduce function attributes
   0.0021 (  1.8%)   0.0001 (  1.2%)   0.0022 (  1.7%)   0.0021 (  1.7%)  Tail Call Elimination
   0.0018 (  1.6%)   0.0000 (  0.0%)   0.0018 (  1.5%)   0.0016 (  1.3%)  Remove unused exception handling info
   0.0014 (  1.2%)   0.0000 (  0.1%)   0.0014 (  1.1%)   0.0013 (  1.0%)  Value Propagation
   0.0007 (  0.6%)   0.0000 (  0.2%)   0.0007 (  0.6%)   0.0012 (  1.0%)  Simplify the CFG
   0.0012 (  1.0%)   0.0000 (  0.0%)   0.0012 (  1.0%)   0.0011 (  0.9%)  Value Propagation
Not sure it's worth worrying too much, but I'll try to investigate a little bit more.
I took a look at this bug because I found it amusing. First of all, the opaque pointer work made the attached IR invalid. I converted it using a script found on llvm-commits:
$ cat load.py
import fileinput
import sys
import re

pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")

for line in sys.stdin:
    sys.stdout.write(re.sub(pat, r"\1, \2\3*\4", line))
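For anyone wondering what the script actually changes, here is a minimal sketch that applies the same substitution to a few sample lines. The pattern is written out with its backslash and asterisk escapes, which is my assumption about the intact form of the script; the `convert` helper and the sample IR lines are made up for illustration and are not taken from the attached file.

```python
import re

# Pattern from load.py, written with explicit escapes (assumed intact form).
pat = re.compile(r"((?:=|:|^)\s*load (?:atomic )?(?:volatile )?(.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|getelementptr|addrspacecast|bitcast|inttoptr|\[\[[a-zA-Z]|\{\{).*$)")


def convert(line):
    """Rewrite an old-style load so the loaded type is spelled out explicitly."""
    return re.sub(pat, r"\1, \2\3*\4", line)


# Hypothetical sample lines, not from the attached .ll file.
samples = [
    "  %v = load i32* %p, align 4",
    "  %w = load volatile i8 addrspace(1)* %q",
    "  store i32 0, i32* %p",
]
converted = [convert(s) for s in samples]
for old, new in zip(samples, converted):
    print(old, "->", new)
```

The load lines gain an explicit result type ahead of the pointer operand, and lines that are not loads pass through unchanged.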
With that in place, I ran it through opt again, and this is what I got:
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 1.0269 seconds (1.0272 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.5289 ( 52.3%)   0.0012 (  7.8%)   0.5300 ( 51.6%)   0.5300 ( 51.6%)  SROA
   0.1025 ( 10.1%)   0.0004 (  3.0%)   0.1029 ( 10.0%)   0.1032 ( 10.0%)  Function Integration/Inlining
   0.0838 (  8.3%)   0.0002 (  1.2%)   0.0840 (  8.2%)   0.0840 (  8.2%)  Module Verifier
   0.0475 (  4.7%)   0.0001 (  0.9%)   0.0476 (  4.6%)   0.0476 (  4.6%)  Combine redundant instructions
   0.0227 (  2.2%)   0.0001 (  0.6%)   0.0227 (  2.2%)   0.0227 (  2.2%)  Interprocedural Sparse Conditional Constant Propagation
   0.0153 (  1.5%)   0.0001 (  0.9%)   0.0155 (  1.5%)   0.0155 (  1.5%)  Reassociate expressions
   0.0150 (  1.5%)   0.0001 (  0.8%)   0.0151 (  1.5%)   0.0151 (  1.5%)  Dead Store Elimination
   0.0133 (  1.3%)   0.0002 (  1.6%)   0.0135 (  1.3%)   0.0135 (  1.3%)  Simplify the CFG
   0.0087 (  0.9%)   0.0001 (  0.7%)   0.0088 (  0.9%)   0.0088 (  0.9%)  Tail Call Elimination
   0.0082 (  0.8%)   0.0001 (  0.8%)   0.0084 (  0.8%)   0.0084 (  0.8%)  Deduce function attributes
   0.0076 (  0.8%)   0.0001 (  0.9%)   0.0078 (  0.8%)   0.0077 (  0.8%)  Sparse Conditional Constant Propagation
   0.0058 (  0.6%)   0.0001 (  1.0%)   0.0059 (  0.6%)   0.0059 (  0.6%)  Value Propagation
   0.0056 (  0.6%)   0.0001 (  0.6%)   0.0057 (  0.6%)   0.0057 (  0.6%)  Value Propagation
   0.0052 (  0.5%)   0.0003 (  1.9%)   0.0054 (  0.5%)   0.0054 (  0.5%)  Early CSE
FunctionAttrs doesn't even show up in the top 5 (< 1%), but 52% of the time is spent in SROA. I think this PR deserves a new title.
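To compare pass rankings between reports without eyeballing the columns, something like the following sketch could work. It assumes each table row is four "seconds ( percent%)" pairs followed by the pass name, as in the reports in this thread; the `parse_rows` helper and the regular expressions are my own and are not part of opt. The three sample rows are copied from the report above.

```python
import re

# One row of the report: four "seconds ( pct%)" pairs followed by the pass name.
ROW = re.compile(r"^\s*((?:\d+\.\d+\s*\(\s*\d+\.\d+%\)\s*){4})(.+?)\s*$")
PAIR = re.compile(r"(\d+\.\d+)\s*\(\s*(\d+\.\d+)%\)")


def parse_rows(report):
    """Return (pass name, wall seconds, wall percent) for each table row."""
    rows = []
    for line in report.splitlines():
        m = ROW.match(line)
        if m:
            # The last of the four pairs is the wall-time column.
            wall_sec, wall_pct = PAIR.findall(m.group(1))[-1]
            rows.append((m.group(2), float(wall_sec), float(wall_pct)))
    return rows


report = """\
   0.5289 ( 52.3%)   0.0012 (  7.8%)   0.5300 ( 51.6%)   0.5300 ( 51.6%)  SROA
   0.1025 ( 10.1%)   0.0004 (  3.0%)   0.1029 ( 10.0%)   0.1032 ( 10.0%)  Function Integration/Inlining
   0.0082 (  0.8%)   0.0001 (  0.8%)   0.0084 (  0.8%)   0.0084 (  0.8%)  Deduce function attributes
"""
rows = parse_rows(report)
ranked = sorted(rows, key=lambda r: r[2], reverse=True)
for name, sec, pct in ranked:
    print(f"{pct:5.1f}%  {sec:.4f}s  {name}")
```

Header and separator lines do not match the row pattern, so a full report can be passed in as-is.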
Extended Description
This is broken out from bug 13263. On the attached .ll file, we see this:
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 4.9600 seconds (4.9723 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   3.8400 ( 78.0%)   0.0100 ( 25.0%)   3.8500 ( 77.6%)   3.8668 ( 77.8%)  Deduce function attributes
   0.6800 ( 13.8%)   0.0000 (  0.0%)   0.6800 ( 13.7%)   0.6421 ( 12.9%)  SROA
   0.1800 (  3.7%)   0.0300 ( 75.0%)   0.2100 (  4.2%)   0.1741 (  3.5%)  Function Integration/Inlining
   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0000 (  0.0%)   0.0570 (  1.1%)  Module Verifier
[...]
The slowdown is happening inside AddNoCaptureAttrs, and the capture tracking analysis happens to be trivial so that isn't the problem (and SCCNodes.size() == 1 so that also isn't the problem). I'm placing my bet on inefficient AttributeSet manipulation.