llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

>20% code size regression on tramp3d-v4 with new PM #48388

Open ornata opened 3 years ago

ornata commented 3 years ago
Bugzilla Link 49044
Version unspecified
OS All
CC @aeubanks,@RKSimon

Extended Description

1) Using LNT, compile CTMark at -O3 for AArch64 with + without the new PM. (I used -C target-arm64-iphoneos-internal.cmake + --cflags -O3 --cxxflags -O3)
2) Use llvm-test-suite/utils/compare.py to compare size.__text

test-suite...ark/tramp3d-v4/tramp3d-v4.test    613236       763796      24.6%

Although this is -O3, and size generally isn't a concern there, this is a pretty big jump in code size. It might be worth looking into if it is not expected. I think that a lot of this is due to inlining changes.
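As a sanity check on the percentage reported by compare.py, here is a minimal sketch of the arithmetic. The helper name `relative_growth_percent` is made up for illustration; only the two size.__text values come from the report, with the smaller one taken to be the legacy PM build.

```python
def relative_growth_percent(old_size: int, new_size: int) -> float:
    """Relative change from old_size to new_size, in percent."""
    if old_size <= 0:
        raise ValueError("old_size must be positive")
    return (new_size - old_size) / old_size * 100.0


# size.__text for tramp3d-v4, copied from the comparison above.
legacy_pm_text = 613236
new_pm_text = 763796

# Absolute growth in bytes, then relative growth rounded to one decimal.
print(new_pm_text - legacy_pm_text)
print(round(relative_growth_percent(legacy_pm_text, new_pm_text), 1))
```

The absolute growth is 150560 bytes of __text, which matches the reported 24.6%.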

The following functions grew a lot in size:

__ZN5Pooma11newRelationIN3EOS5pg_igE5FieldI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd10MultiPatchI7GridTag6RemoteI5BrickEEESG_SG_S3_IS9_d16ConstantFunctionEEEvRKT_RKT0_RKT1_RKT2_RKT3_ 1080 10036 8956
__Z6assignI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd14MultiPatchViewI7GridTag6RemoteI5BrickELi3EES5_d13ExpressionTagI10BinaryNodeI8OpDivideSD_I10OpSubtractSD_ISE_5FieldIS5_dSB_ESH_ESH_ESG_IS5_d16ConstantFunctionEEE8OpAssignERKSG_IT_T0_T1_ESU_RKSG_IT2_T3_T4_ERKT5_ 632 10436 9804
__Z6assignI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd14MultiPatchViewI7GridTag6RemoteI5BrickELi3EES5_d13ExpressionTagI10BinaryNodeI10OpMultiply6ScalarIdESD_I5OpAdd9ReferenceI5FieldIS5_dSB_EESL_EEE8OpAssignERKSJ_IT_T0_T1_ESV_RKSJ_IT2_T3_T4_ERKT5_ 496 15576 15080
__Z6assignI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd14MultiPatchViewI7GridTag6RemoteI5BrickELi3EES5_d13ExpressionTagI10BinaryNodeI8OpDivide5FieldIS5_dSB_ESG_EE8OpAssignERKSF_IT_T0_T1_ESP_RKSF_IT2_T3_T4_ERKT5_ 488 15608 15120
__Z6assignI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd14MultiPatchViewI7GridTag6RemoteI5BrickELi3EES5_d13ExpressionTagI9UnaryNodeI6FnSqrt10BinaryNodeI8OpDivideSF_I10OpMultiply6ScalarIdE5FieldIS5_dSB_EESL_EEE8OpAssignERKSK_IT_T0_T1_ESW_RKSK_IT2_T3_T4_ERKT5_ 496 15656 15160

(Format: function_name old_pm_size new_pm_size diff)
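Rows in this format can be ranked by growth factor with a few lines of Python. This is only a sketch: the short names (`newRelation`, `assign_1`, ...) are placeholders standing in for the mangled names above, and `parse_size_row` / `growth_factor` are hypothetical helpers, not part of any LLVM tool. The numbers are the ones listed above.

```python
from typing import NamedTuple


class SizeRow(NamedTuple):
    name: str
    old_size: int
    new_size: int
    diff: int


def parse_size_row(line: str) -> SizeRow:
    """Parse 'function_name old_pm_size new_pm_size diff'."""
    # Split from the right so the name is kept as a single field.
    name, old, new, diff = line.rsplit(maxsplit=3)
    row = SizeRow(name, int(old), int(new), int(diff))
    if row.new_size - row.old_size != row.diff:
        raise ValueError(f"inconsistent diff for {name}")
    return row


def growth_factor(row: SizeRow) -> float:
    return row.new_size / row.old_size


# Placeholder names; sizes copied from the list above.
SAMPLE = """\
newRelation 1080 10036 8956
assign_1 632 10436 9804
assign_2 496 15576 15080
assign_3 488 15608 15120
assign_4 496 15656 15160
"""

rows = [parse_size_row(line) for line in SAMPLE.splitlines() if line.strip()]
for row in sorted(rows, key=growth_factor, reverse=True):
    print(f"{row.name}: {growth_factor(row):.1f}x (+{row.diff} bytes)")
```

With these numbers, each listed function grew by somewhere between roughly 9x and 32x.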

Then looking at the remarks:

$ ~/llvm-project/llvm/tools/opt-viewer/opt-stats.py diff0.opt.yaml 
Top 10 remarks by pass:
  asm-printer                    53%
  gvn                            18%
  licm                           12%
  inline                         12%
  slp-vectorizer                  2%
  loop-vectorize                  2%
  prologepilog                    1%
  regalloc                        1%
  loop-delete                     0%
  early-ifcvt                     0%

Top 10 remarks:
  asm-printer/InstructionMix     52%
  gvn/LoadClobbered              18%
  inline/Inlined                  7%
  licm/InstSunk                   6%
  licm/Hoisted                    5%
  inline/TooCostly                4%
  licm/LoadWithLoopInvariantAddressInvalidated  1%
  slp-vectorizer/NotBeneficial    1%
  asm-printer/InstructionCount    1%
  prologepilog/StackSize          1%

ornata commented 3 years ago

llvm-extracted functions which produce a code size increase

I extracted one of the functions with a significant increase using llvm-extract --recursive.

This seems to give a similar code size increase:

$ build/bin/clang -O3 -arch arm64 -flegacy-pass-manager -c /tmp/func.bc -o /tmp/old-pm
$ build/bin/clang -O3 -arch arm64 -fno-legacy-pass-manager -c /tmp/func.bc -o /tmp/new-pm
$ ~/llvm-project/build/bin/llvm-size /tmp/old-pm /tmp/new-pm

__TEXT  __DATA  __OBJC  others  dec     hex
43884   0       0       248527  292411  4763b   /tmp/old-pm
54632   0       0       321642  376274  5bdd2   /tmp/new-pm
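To put a number on "similar", the llvm-size output can be parsed and the __TEXT growth computed. The sketch below assumes the whitespace-separated layout shown above (a header row, then one row per file with the path in the last column); `parse_llvm_size` is a hypothetical helper, not an LLVM API.

```python
LLVM_SIZE_OUTPUT = """\
__TEXT  __DATA  __OBJC  others  dec hex
43884   0   0   248527  292411  4763b   /tmp/old-pm
54632   0   0   321642  376274  5bdd2   /tmp/new-pm
"""


def parse_llvm_size(text: str) -> dict:
    """Map each file path to a {column: int} dict."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    sizes = {}
    for line in lines[1:]:
        fields = line.split()
        # The file path follows the numeric columns named in the header.
        path = fields[len(header)]
        row = {}
        for column, value in zip(header, fields):
            # The 'hex' column is hexadecimal; all others are decimal.
            row[column] = int(value, 16) if column == "hex" else int(value)
        sizes[path] = row
    return sizes


sizes = parse_llvm_size(LLVM_SIZE_OUTPUT)
old_text = sizes["/tmp/old-pm"]["__TEXT"]
new_text = sizes["/tmp/new-pm"]["__TEXT"]
growth = (new_text - old_text) / old_text * 100.0
print(f"__TEXT: {old_text} -> {new_text} ({growth:.1f}% larger)")
```

For the extracted function, __TEXT grows by 10748 bytes, about 24.5%, which is in line with the 24.6% seen for the whole benchmark.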

And if you pass -Rpass-missed=inline -Rpass=inline to clang, it shows significant inlining differences.
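One rough way to quantify the inlining difference is to capture clang's stderr from each build and count the remarks of each kind. The sketch below assumes each remark line carries the flag that enabled it in square brackets, i.e. `[-Rpass=inline]` or `[-Rpass-missed=inline]`; the helper names are hypothetical and the captured logs are passed in as strings.

```python
from collections import Counter


def count_inline_remarks(stderr_text: str) -> Counter:
    """Count successful and missed inlining remarks in clang's stderr."""
    counts = Counter()
    for line in stderr_text.splitlines():
        # Assumption: each remark line is tagged with its enabling flag.
        if "[-Rpass=inline]" in line:
            counts["inlined"] += 1
        elif "[-Rpass-missed=inline]" in line:
            counts["missed"] += 1
    return counts


def compare_inline_remarks(old_pm_stderr: str, new_pm_stderr: str) -> dict:
    """Return per-kind (old, new) remark counts for the two builds."""
    old_counts = count_inline_remarks(old_pm_stderr)
    new_counts = count_inline_remarks(new_pm_stderr)
    return {
        kind: (old_counts[kind], new_counts[kind])
        for kind in ("inlined", "missed")
    }
```

Counting alone does not show which call sites changed, but a large shift in the "inlined" count between the two pass managers is a quick confirmation before diffing the remarks line by line.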

(For reference, the function is _ZN5Pooma11newRelationIN3EOS5pg_igE5FieldI22UniformRectilinearMeshI10MeshTraitsILi3Ed21UniformRectilinearTag12CartesianTagLi3EEEd10MultiPatchI7GridTag6RemoteI5BrickEEESG_SG_S3_IS9_d16ConstantFunctionEEEvRKT_RKT0_RKT1_RKT2_RKT3_)

ornata commented 3 years ago

Compiling with -fno-inline removes the code size issue:

 test-suite...ark/tramp3d-v4/tramp3d-v4.test    592532       592548       0.0%

(It's only 16 B bigger)

Haven't gotten around to measuring perf yet, but yeah, it could be a reasonable tradeoff for -O3.

aeubanks commented 3 years ago

Measuring performance would also be interesting, to see whether the extra inlining helped or not.

aeubanks commented 3 years ago

Yeah, it's likely due to inlining changes. I've also noticed Chrome's code size go up for files compiled with -O2/-O3. Could you try running with -fno-inline-functions? Also, running with -Os should be fairly similar in terms of code size.