llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org
Other
28.57k stars 11.81k forks source link

Investigate elimination of the -raise pass #1444

Closed lattner closed 14 years ago

lattner commented 17 years ago
Bugzilla Link 1072
Resolution FIXED
Resolved on Feb 22, 2010 12:47
Version trunk
OS MacOS X

Extended Description

The -raise pass was mainly around to eliminate casts from int -> uint etc through some tricky transformations. With the recent signless work, this pass should be mostly useless. Eliminating it is good as it reduces compile time (one fewer pass), eliminates a bunch of ugly code (the raise pass is very nasty), and potentially eliminating the LLVMTransforms library (ExprTypeConvert.cpp, LevelRaise.cpp, and TransformInternals.cpp).

We should do performance analysis (e.g. spec2k and 2k6) to see whether this is completely dead, and if not, reimplement the useful pieces in other passes.

-Chris

llvmbot commented 17 years ago

The -raise pass was remvoed with these patches:

http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043817.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043818.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043819.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043820.html http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20070129/043821.html

Resolved.

lattner commented 17 years ago

ok, go ahead and nuke it.

lattner commented 17 years ago

the 255.vortex differences aren't worth worrying about, we're good there.

llvmbot commented 17 years ago

For 400.perlbench:

Unfortunately, my machine doesn't have enough memory to do a diff of two 2.7GB LLVM assembly files.

llvmbot commented 17 years ago

447.dealII differences Again, there's a lot of differences in this one. Many of them seem to be the relocation of gep instructions.

llvmbot commented 17 years ago

456.hmmer differences There are numerous differences on this test case. More analysis will be needed.

llvmbot commented 17 years ago

433.milc differences For 433.milc the only difference is that -raise moved a GEP instruction down one instruction in its basic block. Not sure how that can produce a 1.9 second difference in execution time.

llvmbot commented 17 years ago

255.vortex difference on stripped files This file contains the llvm assembly differences for 255.vortex.llvm.bc after opt -strip has been run on them.

llvmbot commented 17 years ago

For 255.vortex, here's what I found:

  1. The bytecode size difference is due to the -raise pass removing or altering the names of values. A diff of the llvm assembly corresponding to the .llvm.bc files from both cases produced 6000 lines of diffs. But, the only way they differed was in the names of values. I ran opt -strip on these files and re-ran the diff. It was vastly shorter. I'll attach it.

  2. The diff produces in #​1 showed that -raise will fold a GEP instruction into a store instruction using a GEP constant expression. Without -raise this doesn't happen. There is also some changes in the # of uses probably related to the GEP folding into the store.

Its not clear why -instcombine wouldn't do the same transform as #​2.

llvmbot commented 17 years ago

We should generate comparisons on Darwin/PPC as well just to make sure that -raise doesn't assist any backend opts on that platform. I don't see why it would, but we should check.

llvmbot commented 17 years ago

Looking at the External test comparison, it doesn't look like -raise does much.

However a few things could be investigated:

255.vortex - Why does raise eliminate 733 bytes of bytecode? 433.milc - What does raise do to make llc 1.9 seconds faster? 456.hmmer - What does raise do to make llc 1.54 seconds fasters? 447.dealII - Why is "gccas" (opt) 2.93 seconds faster with -raise? 400.perlbench - What does raise do to make jit 1.04 seconds faster, especially with 1133 more bytecode?

llvmbot commented 17 years ago

External Performance Comparison (Spreadsheet) A comparison of the SPEC tests with and without -raise.

llvmbot commented 17 years ago

[reid@bashful llvm-test-1]$ for f in MultiSource/Benchmarks/Olden//Output/.llvm.bc ; do

cmp $f ../llvm-test-4/$f done [reid@bashful llvm-test-1]$ echo $? 0

Looks pretty identical to me.

lattner commented 17 years ago

there is 0 difference in bytecode sizes. Perhaps -raise doesn't do anything at all on Olden?

Cool. You could diff them to verify.

This is a good sign -raise isn't doing anything any more. :) SPEC will give more certainty though.

llvmbot commented 17 years ago

Olden Performance Comparison This looks much more reasonable. The difference in the runs is in the small "noise" range resulting from me using the computer while the test is ongoing. These were done single threaded and I'll do the same as for the Externals.

Interesting note: there is 0 difference in bytecode sizes. Perhaps -raise doesn't do anything at all on Olden?

lattner commented 17 years ago

sounds great, thanks reid!

llvmbot commented 17 years ago

Yes. I now have two environments that were updated at the same moment. The only difference is that one has -raise and one doesn't. This should be good for testing. I'm going to do Olden with GET_STABLE_NUMBERS=1 because its relatively quick and a good test for stability. If that goes okay, I'll get to External tests this weekend (7+ hours each).

lattner commented 17 years ago

Makes sense. A lot has changed in a few days. I suspect the 'noise' is really measuring these changes :)

-Chris

llvmbot commented 17 years ago
  • What does -raise do on 401.bzip2 that shrinks the bc file 18K? Once understood, we can reimplement this in some other pass.

I ran 401.bzip2 using "head" from today and compared it against the "without -raise" run I did last night. The bytecode sizes are identical. Consequently, I don't think I was comparing apples with apples. I used the results of my nightly run from a couple days ago for the "with -raise" data but my "without -raise" also included the shift patch, which may have affected optimization (positively).

So, guess I get to do this again.

Devang, did you run this comparison on Darwin?

  • I have little confidence in the stabilities of these numbers, they seem very noisy.

I eliminated as much interference workload as I could by shutting down email, xchat, etc. and just let it run. I think the issue here is the difference in the builds. I'll run the comparison again and report new numbers. Unfortunately, it'll have to wait until this evening for "quiet machine".

Can you see if the 254.gap/LLC, 254.gap/CBE, 255.vortex/CBE, and 256.bzip2/cbe slowdowns reproduce? If so, we should investigate what is happening there too.

I'll defer this until after I get new numbers. If they still look problematic, I'll investigate then.

lattner commented 17 years ago

Yes, it looks like we are close to it. Things worth investigating:

-Chris

llvmbot commented 17 years ago

Chris,

Please let me know if this is sufficient evidence to support removal of the -raise pass.

Reid

llvmbot commented 17 years ago

Performance Data Spreadsheet This updates some formatting and gets rid of columns that weren't run in both tests and rows that indicate test failures.

llvmbot commented 17 years ago

Summary Of Performance Results (See the Spreadsheet)

Four SPEC test suites were compared: CINT2000, CFP2000, CINT2006, and CFP2006.

In general it looks positive to remove the -raise pass. The overall results (sum of all tests) are as follows:

                        +--Compile-+   +----------Run Times--------+

ITEM GCCAS Bytecode LLC JIT GCC CBE LLC JIT Delta 151.05 818298 -9.07 16.2 14.66 20.81 46.40 57.15 Percent 6.49% 1.95% -1.09% 3.09% 2.18% 3.14% 7.12% 4.04%

The "Delta" values shown are the difference in aggregate time between "with -raise" and "without -raise" runs across all test cases. Negative values mean the "without -raise" value was larger. Positive values mean the "without -raise" value was smaller. Because these values represent the subtraction of the "without -raise" from the "with -raise" data sets, positive values favor removal of -raise while negative values favor keeping -raise.

The "Percent" values are computed by dividing the "Delta" value by the "with -raise" sum. This gives an approximation of the magnitude of change introduced by removing the -raise pass.

To interpret this correctly, one would say: "On average, the -raise pass increases bytecode volumes by nearly 2% and LLC runtimes by over 7%".

The only negative impact of removing -raise seems to be a 1.09% increase in the LLC compilation time. This is rather surprising given that there is less work done by llc without -raise.

Note that these results account for only one run of the nightly tests. Additional samples would be welcome.

llvmbot commented 17 years ago

Performance Data Spreadsheet This attachment contains a spreadsheet in Excel format that contains the raw data and a comparison of the External tests with and without the -raise pass.

llvmbot commented 17 years ago

External Nightly Test Results Without -raise

llvmbot commented 17 years ago

External Nightly Test Results With -raise

llvmbot commented 17 years ago

I want this gone. :)

llvmbot commented 17 years ago

I will set up overnight performance test run.

lattner commented 17 years ago

or even just run SPEC2K + SPEC2K6.

-Chris

lattner commented 17 years ago

Nothing prevents it. We just need someone to do an llvm-test run with an without it to see if there are any perf differences.

-Chris

llvmbot commented 17 years ago

Is there anything preventing this from being eliminated now?

Can we use the stats to determine how much it fires (if at all)?

What pieces does it implement that might still be useful to retain?

lattner commented 17 years ago

When this happens, we can eliminate Type::canLosslesslyBitCastTo and probably some other stuff only used by -raise.

-Chris