Closed lattner closed 14 years ago
The -raise pass was remvoed with these patches:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043817.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043818.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043819.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043820.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043821.html
Resolved.
ok, go ahead and nuke it.
the 255.vortex differences aren't worth worrying about, we're good there.
For 400.perlbench:
Unfortunately, my machine doesn't have enough memory to do a diff of two 2.7GB LLVM assembly files.
447.dealII differences Again, there's a lot of differences in this one. Many of them seem to be the relocation of gep instructions.
456.hmmer differences There are numerous differences on this test case. More analysis will be needed.
433.milc differences For 433.milc the only difference is that -raise moved a GEP instruction down one instruction in its basic block. Not sure how that can produce a 1.9 second difference in execution time.
255.vortex difference on stripped files This file contains the llvm assembly differences for 255.vortex.llvm.bc after opt -strip has been run on them.
For 255.vortex, here's what I found:
The bytecode size difference is due to the -raise pass removing or altering the names of values. A diff of the llvm assembly corresponding to the .llvm.bc files from both cases produced 6000 lines of diffs. But, the only way they differed was in the names of values. I ran opt -strip on these files and re-ran the diff. It was vastly shorter. I'll attach it.
The diff produces in #1 showed that -raise will fold a GEP instruction into a store instruction using a GEP constant expression. Without -raise this doesn't happen. There is also some changes in the # of uses probably related to the GEP folding into the store.
Its not clear why -instcombine wouldn't do the same transform as #2.
We should generate comparisons on Darwin/PPC as well just to make sure that -raise doesn't assist any backend opts on that platform. I don't see why it would, but we should check.
Looking at the External test comparison, it doesn't look like -raise does much.
However a few things could be investigated:
255.vortex - Why does raise eliminate 733 bytes of bytecode? 433.milc - What does raise do to make llc 1.9 seconds faster? 456.hmmer - What does raise do to make llc 1.54 seconds fasters? 447.dealII - Why is "gccas" (opt) 2.93 seconds faster with -raise? 400.perlbench - What does raise do to make jit 1.04 seconds faster, especially with 1133 more bytecode?
External Performance Comparison (Spreadsheet) A comparison of the SPEC tests with and without -raise.
[reid@bashful llvm-test-1]$ for f in MultiSource/Benchmarks/Olden//Output/.llvm.bc ; do
cmp $f ../llvm-test-4/$f done [reid@bashful llvm-test-1]$ echo $? 0
Looks pretty identical to me.
there is 0 difference in bytecode sizes. Perhaps -raise doesn't do anything at all on Olden?
Cool. You could diff them to verify.
This is a good sign -raise isn't doing anything any more. :) SPEC will give more certainty though.
Olden Performance Comparison This looks much more reasonable. The difference in the runs is in the small "noise" range resulting from me using the computer while the test is ongoing. These were done single threaded and I'll do the same as for the Externals.
Interesting note: there is 0 difference in bytecode sizes. Perhaps -raise doesn't do anything at all on Olden?
sounds great, thanks reid!
Yes. I now have two environments that were updated at the same moment. The only difference is that one has -raise and one doesn't. This should be good for testing. I'm going to do Olden with GET_STABLE_NUMBERS=1 because its relatively quick and a good test for stability. If that goes okay, I'll get to External tests this weekend (7+ hours each).
Makes sense. A lot has changed in a few days. I suspect the 'noise' is really measuring these changes :)
-Chris
- What does -raise do on 401.bzip2 that shrinks the bc file 18K? Once understood, we can reimplement this in some other pass.
I ran 401.bzip2 using "head" from today and compared it against the "without -raise" run I did last night. The bytecode sizes are identical. Consequently, I don't think I was comparing apples with apples. I used the results of my nightly run from a couple days ago for the "with -raise" data but my "without -raise" also included the shift patch, which may have affected optimization (positively).
So, guess I get to do this again.
Devang, did you run this comparison on Darwin?
- I have little confidence in the stabilities of these numbers, they seem very noisy.
I eliminated as much interference workload as I could by shutting down email, xchat, etc. and just let it run. I think the issue here is the difference in the builds. I'll run the comparison again and report new numbers. Unfortunately, it'll have to wait until this evening for "quiet machine".
Can you see if the 254.gap/LLC, 254.gap/CBE, 255.vortex/CBE, and 256.bzip2/cbe slowdowns reproduce? If so, we should investigate what is happening there too.
I'll defer this until after I get new numbers. If they still look problematic, I'll investigate then.
Yes, it looks like we are close to it. Things worth investigating:
What does -raise do on 401.bzip2 that shrinks the bc file 18K? Once understood, we can reimplement this in some other pass.
I have little confidence in the stabilities of these numbers, they seem very noisy. Can you see if the 254.gap/LLC, 254.gap/CBE, 255.vortex/CBE, and 256.bzip2/cbe slowdowns reproduce? If so, we should investigate what is happening there too.
-Chris
Chris,
Please let me know if this is sufficient evidence to support removal of the -raise pass.
Reid
Performance Data Spreadsheet This updates some formatting and gets rid of columns that weren't run in both tests and rows that indicate test failures.
Summary Of Performance Results (See the Spreadsheet)
Four SPEC test suites were compared: CINT2000, CFP2000, CINT2006, and CFP2006.
In general it looks positive to remove the -raise pass. The overall results (sum of all tests) are as follows:
+--Compile-+ +----------Run Times--------+
ITEM GCCAS Bytecode LLC JIT GCC CBE LLC JIT Delta 151.05 818298 -9.07 16.2 14.66 20.81 46.40 57.15 Percent 6.49% 1.95% -1.09% 3.09% 2.18% 3.14% 7.12% 4.04%
The "Delta" values shown are the difference in aggregate time between "with -raise" and "without -raise" runs across all test cases. Negative values mean the "without -raise" value was larger. Positive values mean the "without -raise" value was smaller. Because these values represent the subtraction of the "without -raise" from the "with -raise" data sets, positive values favor removal of -raise while negative values favor keeping -raise.
The "Percent" values are computed by dividing the "Delta" value by the "with -raise" sum. This gives an approximation of the magnitude of change introduced by removing the -raise pass.
To interpret this correctly, one would say: "On average, the -raise pass increases bytecode volumes by nearly 2% and LLC runtimes by over 7%".
The only negative impact of removing -raise seems to be a 1.09% increase in the LLC compilation time. This is rather surprising given that there is less work done by llc without -raise.
Note that these results account for only one run of the nightly tests. Additional samples would be welcome.
Performance Data Spreadsheet This attachment contains a spreadsheet in Excel format that contains the raw data and a comparison of the External tests with and without the -raise pass.
I want this gone. :)
I will set up overnight performance test run.
or even just run SPEC2K + SPEC2K6.
-Chris
Nothing prevents it. We just need someone to do an llvm-test run with an without it to see if there are any perf differences.
-Chris
Is there anything preventing this from being eliminated now?
Can we use the stats to determine how much it fires (if at all)?
What pieces does it implement that might still be useful to retain?
When this happens, we can eliminate Type::canLosslesslyBitCastTo and probably some other stuff only used by -raise.
-Chris
Extended Description
The -raise pass was mainly around to eliminate casts from int -> uint etc through some tricky transformations. With the recent signless work, this pass should be mostly useless. Eliminating it is good as it reduces compile time (one fewer pass), eliminates a bunch of ugly code (the raise pass is very nasty), and potentially eliminating the LLVMTransforms library (ExprTypeConvert.cpp, LevelRaise.cpp, and TransformInternals.cpp).
We should do performance analysis (e.g. spec2k and 2k6) to see whether this is completely dead, and if not, reimplement the useful pieces in other passes.
-Chris