llvm / llvm-project


Scalar replacement of aggregates slower in LLVM 3.0+ relative to 2.8 #16016

Status: Open. llvmbot opened this issue 11 years ago.

llvmbot commented 11 years ago
Bugzilla Link 15644
Version 3.2
OS Linux
Reporter LLVM Bugzilla Contributor
CC @fhahn, @yuanfang-chen

Extended Description

This was originally sent to the LLVM mailing list after the LLVM 3.0 release, but it still seems to be an issue in 3.2. The attached file optimizes much faster (about 3x) with LLVM 2.8 than with 3.2. Performance on this test case did not change between 3.0 and 3.2.

time opt -scalarrepl slow_sroa.ll

On my system: 2.8 takes 0.16 s; 3.2 takes 0.5 s.

This is from email correspondence on llvm-dev:


Actually, -scalarrepl-ssa is the slower one.

On 04/05/2012 06:56 PM, Andrew Clinton wrote:

Attached is the test case.

Run:

opt -scalarrepl slow_sroa.ll

Andrew

On 04/05/2012 05:11 PM, Nick Lewycky wrote:

I've patched SROA in a way that may have made it slower. Do you have a testcase we can look at?

Nick

On 4 April 2012 16:19, Andrew Clinton andrew@sidefx.com wrote:

I just upgraded our optimizer to LLVM 3.0 from 2.8 and noticed that the
scalar replacement of aggregates pass takes a lot longer for some code.
Has there been a performance regression in this pass, or does it do more
work?

LLVM 3.0:

  Total Execution Time: 1.0600 seconds (1.0526 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.5100 ( 49.5%)   0.0000 (  0.0%)   0.5100 ( 48.1%)   0.5099 ( 48.4%)  Scalar Replacement of Aggregates (SSAUp)
   0.1900 ( 18.4%)   0.0300 (100.0%)   0.2200 ( 20.8%)   0.2156 ( 20.5%)  Scalar Replacement of Aggregates (DT)
   0.1200 ( 11.7%)   0.0000 (  0.0%)   0.1200 ( 11.3%)   0.1158 ( 11.0%)  VEX Constant Propagation
   0.0200 (  1.9%)   0.0000 (  0.0%)   0.0200 (  1.9%)   0.0196 (  1.9%)  Simplify the CFG
   0.0200 (  1.9%)   0.0000 (  0.0%)   0.0200 (  1.9%)   0.0181 (  1.7%)  Module Verifier
...

LLVM 2.8:

  Total Execution Time: 0.6500 seconds (0.6489 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
   0.1400 ( 21.9%)   0.0000 (  0.0%)   0.1400 ( 21.5%)   0.1379 ( 21.3%)  Scalar Replacement of Aggregates
   0.1200 ( 18.7%)   0.0000 (  0.0%)   0.1200 ( 18.5%)   0.1208 ( 18.6%)  VEX Constant Propagation
   0.1000 ( 15.6%)   0.0000 (  0.0%)   0.1000 ( 15.4%)   0.1050 ( 16.2%)  Scalar Replacement of Aggregates
   0.0400 (  6.3%)   0.0000 (  0.0%)   0.0400 (  6.2%)   0.0481 (  7.4%)  Combine redundant instructions
   0.0200 (  3.1%)   0.0000 (  0.0%)   0.0200 (  3.1%)   0.0235 (  3.6%)  Preliminary module verification
...
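As a cross-check of the "about 3x" figure, the minimal Python sketch below parses the rows quoted in the two reports and sums the wall-clock time of every Scalar Replacement of Aggregates row. It assumes the row layout shown here (four numeric columns, each followed by a parenthesized percentage, then the pass name) and counts only the quoted rows, not the elided "..." ones. The helper names parse_rows and sroa_wall_time are hypothetical, not part of LLVM.

```python
import re

# Rows copied from the two -time-passes reports quoted in this thread.
# Only the quoted rows are included; the elided "..." rows are not.
LLVM_30_ROWS = """\
0.5100 ( 49.5%)   0.0000 (  0.0%)   0.5100 ( 48.1%)   0.5099 ( 48.4%)  Scalar Replacement of Aggregates (SSAUp)
0.1900 ( 18.4%)   0.0300 (100.0%)   0.2200 ( 20.8%)   0.2156 ( 20.5%)  Scalar Replacement of Aggregates (DT)
0.1200 ( 11.7%)   0.0000 (  0.0%)   0.1200 ( 11.3%)   0.1158 ( 11.0%)  VEX Constant Propagation
0.0200 (  1.9%)   0.0000 (  0.0%)   0.0200 (  1.9%)   0.0196 (  1.9%)  Simplify the CFG
0.0200 (  1.9%)   0.0000 (  0.0%)   0.0200 (  1.9%)   0.0181 (  1.7%)  Module Verifier
"""

LLVM_28_ROWS = """\
0.1400 ( 21.9%)   0.0000 (  0.0%)   0.1400 ( 21.5%)   0.1379 ( 21.3%)  Scalar Replacement of Aggregates
0.1200 ( 18.7%)   0.0000 (  0.0%)   0.1200 ( 18.5%)   0.1208 ( 18.6%)  VEX Constant Propagation
0.1000 ( 15.6%)   0.0000 (  0.0%)   0.1000 ( 15.4%)   0.1050 ( 16.2%)  Scalar Replacement of Aggregates
0.0400 (  6.3%)   0.0000 (  0.0%)   0.0400 (  6.2%)   0.0481 (  7.4%)  Combine redundant instructions
0.0200 (  3.1%)   0.0000 (  0.0%)   0.0200 (  3.1%)   0.0235 (  3.6%)  Preliminary module verification
"""

# Matches the "( 49.5%)" / "(100.0%)" percentage annotations.
PERCENT = re.compile(r"\(\s*[\d.]+%\)")


def parse_rows(report: str) -> list[tuple[str, float]]:
    """Return (pass name, wall-clock seconds) for each row of a report."""
    rows = []
    for line in report.strip().splitlines():
        # After dropping percentages: user, system, user+system, wall, name.
        _user, _system, _total, wall, name = PERCENT.sub("", line).split(None, 4)
        rows.append((name.strip(), float(wall)))
    return rows


def sroa_wall_time(report: str) -> float:
    """Sum the wall-clock time of every Scalar Replacement of Aggregates row."""
    return sum(
        wall
        for name, wall in parse_rows(report)
        if name.startswith("Scalar Replacement of Aggregates")
    )


if __name__ == "__main__":
    new = sroa_wall_time(LLVM_30_ROWS)
    old = sroa_wall_time(LLVM_28_ROWS)
    # Prints: SROA wall time: 3.0 = 0.7255 s, 2.8 = 0.2429 s, ratio = 2.99x
    print(f"SROA wall time: 3.0 = {new:.4f} s, 2.8 = {old:.4f} s, ratio = {new / old:.2f}x")
```

The two SROA rows add up to about 0.73 s under 3.0 versus about 0.24 s under 2.8, which agrees with the roughly 3x slowdown seen in the standalone opt timing.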
fhahn commented 5 years ago

Bitcode version of sroa_slow.ll, readable by current master

llvmbot commented 11 years ago

I've noticed (belatedly) that the -scalarrepl pass is the legacy scalar replacement of aggregates pass, and that it has been superseded by -sroa in LLVM 3.2. Running opt with -sroa is fast, though I now need to run a separate -mem2reg pass, which is still slow (0.5 s).

So perhaps this bug could be better categorized as a performance regression in the mem2reg pass, not SROA.

Test with:

time opt -mem2reg slow_sroa.ll
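A single time run is noisy at these sub-second durations. The sketch below is a hypothetical helper script, not part of LLVM, that repeats each opt invocation and keeps the fastest run so -scalarrepl, -sroa and -mem2reg can be compared on the same input. It assumes an opt binary on PATH that accepts the LLVM 3.x single-dash pass flags used in this thread; -scalarrepl has since been removed, and recent releases expect -passes=sroa and -passes=mem2reg instead.

```python
import shutil
import subprocess
import sys
import time


def time_command(cmd: list[str], repeats: int = 5) -> float:
    """Run cmd `repeats` times and return the fastest wall-clock time in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if opt rejects a flag or the input.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return min(samples)


def compare_passes(input_file: str, passes: list[str], repeats: int = 5) -> dict[str, float]:
    """Time `opt <pass> -disable-output input_file` for each pass flag (LLVM 3.x syntax assumed)."""
    return {
        flag: time_command(["opt", flag, "-disable-output", input_file], repeats)
        for flag in passes
    }


if __name__ == "__main__":
    if len(sys.argv) != 2 or shutil.which("opt") is None:
        print("usage: python time_opt.py slow_sroa.ll  (requires opt on PATH)")
    else:
        results = compare_passes(sys.argv[1], ["-scalarrepl", "-sroa", "-mem2reg"])
        for flag, seconds in results.items():
            print(f"{flag:12s} {seconds:.3f} s")
```

Keeping the minimum rather than the mean filters out interference from other processes, which is usually what matters when comparing one pass against another on the same machine.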

llvmbot commented 11 years ago

Test case for SROA performance regression

llvmbot commented 11 years ago

The attachment "slow_sroa.ll" is 2.7 MB and could not be attached. I can email it to any interested developers.

Can you try compressing it with xz?
