Reporting of predicted benefits of vectorisation

dc81c6b5-3a5b-438e-b826-9e7edb3cf487 commented 7 years ago


Bugzilla Link	31691
Version	3.9
OS	Linux
CC	@lesshaste,@hfinkel,@joker-eph,@RKSimon

Extended Description

When you run the Intel Compiler with, say, -qopt-report=4, it reports the expected performance gain from vectorisation, among other useful information. For example:

LOOP BEGIN at permanent-in-c.c(47,7)
  remark #25045: Fused Loops: ( 47 50 )

  remark #15388: vectorization support: reference v has aligned access   [ permanent-in-c.c(48,2) ]
  remark #15388: vectorization support: reference v has aligned access   [ permanent-in-c.c(48,2) ]
  remark #15389: vectorization support: reference M has unaligned access   [ permanent-in-c.c(48,2) ]
  remark #15388: vectorization support: reference v has aligned access   [ permanent-in-c.c(51,8) ]
  remark #15381: vectorization support: unaligned access used inside loop body
  remark #15305: vectorization support: vector length 4
  remark #15399: vectorization support: unroll factor set to 2
  remark #15309: vectorization support: normalized vectorization overhead 0.600
  remark #15301: FUSED LOOP WAS VECTORIZED
  remark #15442: entire loop may be executed in remainder
  remark #15448: unmasked aligned unit stride loads: 1
  remark #15449: unmasked aligned unit stride stores: 1
  remark #15450: unmasked unaligned unit stride loads: 1
  remark #15475: --- begin vector loop cost summary ---
  remark #15476: scalar loop cost: 49
  remark #15477: vector loop cost: 10.000
  remark #15478: estimated potential speedup: 4.580
  remark #15487: type converts: 3
  remark #15488: --- end vector loop cost summary ---
  remark #25456: Number of Array Refs Scalar Replaced In Loop: 2

LOOP END

gcc also gives you similar, if perhaps less useful, information:

[...]
vect_model_reduction_cost: inside_cost = 6, prologue_cost = 2, epilogue_cost = 6 .
test2.c:50:9: note: ==> examining statement: j_73 = j_186 + 1;
test2.c:50:9: note: irrelevant.
test2.c:50:9: note: ==> examining statement: if (j_73 < n.2_201)
test2.c:50:9: note: irrelevant.
test2.c:50:9: note: === vect_update_slp_costs_according_to_vf ===
test2.c:50:9: note: cost model: epilogue peel iters set to vf/2 because loop iterations are unknown .
test2.c:50:9: note: Cost model analysis:
  Vector inside of loop cost: 50
  Vector prologue cost: 8
  Vector epilogue cost: 52
  Scalar iteration cost: 20
  Scalar outside cost: 4
  Vector outside cost: 60
  prologue iterations: 0
  epilogue iterations: 2
  Calculated minimum iters for profitability: 5
test2.c:50:9: note: Runtime profitability threshold = 4
test2.c:50:9: note: Static estimate profitability threshold = 4
test2.c:50:9: note: epilog loop required

[...]
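
To show how the gcc figures above relate to each other, here is a rough sketch of a break-even calculation of the kind gcc's vectorizer cost model appears to have used around that time. The function name is hypothetical (it is not a gcc API), the exact formula may differ between gcc versions, and the vectorisation factor of 4 is an assumption inferred from "epilogue peel iters set to vf/2" together with "epilogue iterations: 2".

```c
/* Hypothetical helper, not part of gcc: estimates the smallest iteration
 * count at which the vector loop is expected to beat the scalar loop.
 * All costs are the abstract units printed in the dump. */
int min_profitable_iters(int vec_inside, int vec_outside,
                         int scalar_iter, int scalar_outside,
                         int prologue_iters, int epilogue_iters,
                         int vf /* assumed vectorisation factor */)
{
    /* Cost saved by one vector iteration compared with vf scalar ones. */
    int gain = scalar_iter * vf - vec_inside;
    if (gain <= 0)
        return -1; /* vector loop never pays off under this model */

    /* Integer division, as in a compiler's integer cost arithmetic. */
    int iters = ((vec_outside - scalar_outside) * vf
                 - vec_inside * prologue_iters
                 - vec_inside * epilogue_iters) / gain;

    /* Bump by one if the scalar loop is still no more expensive. */
    if (scalar_iter * vf * iters
        <= vec_inside * iters + (vec_outside - scalar_outside) * vf)
        iters++;

    return iters;
}
```

Under these assumptions, min_profitable_iters(50, 60, 20, 4, 0, 2, 4) evaluates to 5, which agrees with the "Calculated minimum iters for profitability: 5" line in the dump.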

It would be great if clang/llvm could provide similar information to the user/coder.

llvmbot commented 7 years ago

Great suggestion! We already report something like this for the inliner but not for the vectorizer.

I think I mentioned this on IRC but we have some other facilities around opt remarks that you might find useful (and provide feedback on ;). See my dev meeting talk (http://llvm.org/devmtg/2016-11/#talk15).
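
For reference, a minimal sketch of how the existing vectorizer remarks can be requested today. The file name saxpy.c and the function are made up for illustration; the -Rpass flags are existing clang options. The remarks say whether the loop was vectorized and with what width and interleave count, but they do not include a predicted speedup.

```c
#include <stddef.h>

/* Illustrative loop only. Build with, for example:
 *
 *   clang -O3 -c saxpy.c \
 *     -Rpass=loop-vectorize \
 *     -Rpass-missed=loop-vectorize \
 *     -Rpass-analysis=loop-vectorize
 *
 * -Rpass reports loops that were vectorized, -Rpass-missed reports loops
 * that were not, and -Rpass-analysis explains why. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    /* restrict tells the compiler x and y do not alias, which makes this
     * loop a straightforward vectorization candidate. */
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

In clang releases that support it, -fsave-optimization-record additionally writes the remarks to a YAML file next to the object file.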

hfinkel commented 7 years ago

Great idea. We should add the predicted-speedups to our vectorizer's optimization remarks.

Reporting of predicted benefits of vectorisation #31039