llvm / llvm-project

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
http://llvm.org

Reporting of predicted benefits of vectorisation #31039

Open dc81c6b5-3a5b-438e-b826-9e7edb3cf487 opened 7 years ago

dc81c6b5-3a5b-438e-b826-9e7edb3cf487 commented 7 years ago
Bugzilla Link 31691
Version 3.9
OS Linux
CC @lesshaste,@hfinkel,@joker-eph,@RKSimon

Extended Description

When running the Intel Compiler with -qopt-report=4, say, you get told the expected performance gain from vectorisation, amongst other useful information. For example,

LOOP BEGIN at permanent-in-c.c(47,7)
  remark #25045: Fused Loops: ( 47 50 )

  remark #15388: vectorization support: reference v has aligned access   [ permanent-in-c.c(48,2) ]
  remark #15388: vectorization support: reference v has aligned access   [ permanent-in-c.c(48,2) ]
  remark #15389: vectorization support: reference M has unaligned access   [ permanent-in-c.c(48,2) ]
  remark #15388: vectorization support: reference v has aligned access   [ permanent-in-c.c(51,8) ]
  remark #15381: vectorization support: unaligned access used inside loop body
  remark #15305: vectorization support: vector length 4
  remark #15399: vectorization support: unroll factor set to 2
  remark #15309: vectorization support: normalized vectorization overhead 0.600
  remark #15301: FUSED LOOP WAS VECTORIZED
  remark #15442: entire loop may be executed in remainder
  remark #15448: unmasked aligned unit stride loads: 1
  remark #15449: unmasked aligned unit stride stores: 1
  remark #15450: unmasked unaligned unit stride loads: 1
  remark #15475: --- begin vector loop cost summary ---
  remark #15476: scalar loop cost: 49
  remark #15477: vector loop cost: 10.000
  remark #15478: estimated potential speedup: 4.580
  remark #15487: type converts: 3
  remark #15488: --- end vector loop cost summary ---
  remark #25456: Number of Array Refs Scalar Replaced In Loop: 2

LOOP END
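The cost summary at the end of that report (remarks #15476 to #15478) is the part this request is really about. As a rough illustration of how a user can already pull those figures out of a saved Intel report, here is a minimal Python sketch. It assumes the report uses exactly the `remark #NNNNN: <label>: <number>` layout shown above, and `parse_cost_summary` is a hypothetical helper, not part of any tool.

```python
import re

# Matches lines such as "remark #15476: scalar loop cost: 49".
# Assumes the "remark #NNNNN: <label>: <number>" layout of the sample report.
REMARK = re.compile(r"remark #(\d+): ([^:]+): ([0-9.]+)\s*$")

# Remark numbers taken from the sample report's cost summary.
COST_REMARKS = {
    "15476": "scalar_cost",
    "15477": "vector_cost",
    "15478": "speedup",
}


def parse_cost_summary(report: str) -> dict:
    """Collect scalar cost, vector cost and estimated speedup from a loop block."""
    summary = {}
    for line in report.splitlines():
        m = REMARK.search(line.strip())
        if m and m.group(1) in COST_REMARKS:
            summary[COST_REMARKS[m.group(1)]] = float(m.group(3))
    return summary


sample = """
  remark #15476: scalar loop cost: 49
  remark #15477: vector loop cost: 10.000
  remark #15478: estimated potential speedup: 4.580
"""
print(parse_cost_summary(sample))
# prints {'scalar_cost': 49.0, 'vector_cost': 10.0, 'speedup': 4.58}
```

Note that 49 / 10 is 4.9, not the reported 4.580, so the speedup figure presumably folds in overheads the summary does not itemise; the sketch only reads the numbers and makes no attempt to recompute them.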

GCC also gives you similar, if perhaps less useful, information.

[...] vect_model_reduction_cost: inside_cost = 6, prologue_cost = 2, epilogue_cost = 6 .
test2.c:50:9: note: ==> examining statement: j_73 = j_186 + 1;
test2.c:50:9: note: irrelevant.
test2.c:50:9: note: ==> examining statement: if (j_73 < n.2_201)
test2.c:50:9: note: irrelevant.
test2.c:50:9: note: === vect_update_slp_costs_according_to_vf ===
test2.c:50:9: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
test2.c:50:9: note: Cost model analysis:
  Vector inside of loop cost: 50
  Vector prologue cost: 8
  Vector epilogue cost: 52
  Scalar iteration cost: 20
  Scalar outside cost: 4
  Vector outside cost: 60
  prologue iterations: 0
  epilogue iterations: 2
  Calculated minimum iters for profitability: 5
test2.c:50:9: note: Runtime profitability threshold = 4
test2.c:50:9: note: Static estimate profitability threshold = 4
test2.c:50:9: note: epilog loop required

[...]
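The GCC figures are internally consistent, and it helps to see how they combine into a profitability threshold. The sketch below assumes a vectorization factor (VF) of 4, inferred from "epilogue peel iters set to vf/2" together with "epilogue iterations: 2", and assumes a break-even formula modelled on GCC's cost model (`vect_estimate_min_profitable_iters` in GCC's loop vectorizer); the exact code differs between GCC releases. `min_profitable_iters` and `runtime_threshold` are hypothetical helpers, not GCC functions. With the costs from the dump it reproduces both "Calculated minimum iters for profitability: 5" and "Runtime profitability threshold = 4".

```python
def min_profitable_iters(vec_inside, vec_outside, scalar_iter, scalar_outside,
                         vf, peel_prologue, peel_epilogue):
    """Smallest trip count for which the vector loop is estimated to win.

    Break-even condition (modelled on GCC's cost model, not copied from it):
        scalar_iter * n + scalar_outside
            > vec_inside * (n - peel_prologue - peel_epilogue) / vf + vec_outside
    Returns None when the vector body is never cheaper than the scalar body.
    """
    saving_per_vector_iter = scalar_iter * vf - vec_inside
    if saving_per_vector_iter <= 0:
        return None
    numerator = ((vec_outside - scalar_outside) * vf
                 - vec_inside * (peel_prologue + peel_epilogue))
    # int(a / b) truncates toward zero, like C integer division.
    n = int(numerator / saving_per_vector_iter)
    # Bump by one if the truncated estimate is not strictly profitable.
    # This follow-up check omits the peel terms, as the assumed GCC model does.
    if scalar_iter * vf * n <= vec_inside * n + (vec_outside - scalar_outside) * vf:
        n += 1
    return n


def runtime_threshold(min_iters, vf):
    """Trip count at or below which the vector loop is skipped at run time."""
    # The vector loop must run at least once; the guard is "niters <= threshold".
    return max(min_iters, vf) - 1


# Costs and iteration counts copied from the GCC dump; vf = 4 is inferred.
n_min = min_profitable_iters(vec_inside=50, vec_outside=60, scalar_iter=20,
                             scalar_outside=4, vf=4,
                             peel_prologue=0, peel_epilogue=2)
print(n_min, runtime_threshold(n_min, vf=4))  # prints "5 4"
```

The arithmetic is: (60 - 4) * 4 - 50 * 2 = 124, and 124 / (20 * 4 - 50) truncates to 4; the follow-up check (320 <= 424) bumps it to 5, and subtracting one for the "<=" guard gives the runtime threshold of 4.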

It would be great if Clang/LLVM could provide similar information to the user/coder.

llvmbot commented 7 years ago

Great suggestion! We already report something like this for the inliner but not for the vectorizer.

I think I mentioned this on IRC, but we have some other facilities around opt remarks that you might find useful (and provide feedback on ;). See my dev meeting talk (http://llvm.org/devmtg/2016-11/#talk15).

hfinkel commented 7 years ago

Great idea. We should add the predicted-speedups to our vectorizer's optimization remarks.
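As a minimal sketch of the kind of number such a remark could carry, assume the vectorizer already knows a per-iteration scalar cost, the cost of one vector iteration, and the vectorization factor. The steady-state speedup is then the scalar cost of VF iterations divided by the cost of the one vector iteration that replaces them. The function names and the remark wording below are hypothetical, not LLVM's; the inputs reuse the GCC dump's costs purely for illustration.

```python
def steady_state_speedup(scalar_iter_cost, vector_iter_cost, vf):
    """Predicted speedup of the loop body, ignoring prologue/epilogue costs.

    One vector iteration replaces vf scalar iterations.
    """
    return scalar_iter_cost * vf / vector_iter_cost


def speedup_remark(loc, vf, speedup):
    # Hypothetical remark wording, for illustration only.
    return f"{loc}: loop vectorized (width {vf}), estimated speedup {speedup:.2f}"


s = steady_state_speedup(scalar_iter_cost=20, vector_iter_cost=50, vf=4)
print(speedup_remark("test2.c:50:9", 4, s))
# prints "test2.c:50:9: loop vectorized (width 4), estimated speedup 1.60"
```

The steady-state figure is deliberately the simplest one to report; for short trip counts the outside (prologue and epilogue) costs dominate, which is why GCC also prints a profitability threshold alongside its cost model analysis.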