Block Frequencies for loop seem incorrect


Bugzilla Link	27791
Version	trunk
OS	All
Attachments	source file
CC	@vns-mn,@hfinkel,@rotateright

Extended Description

$ clang++ -O2 -fprofile-instr-generate ProfileTester.cpp -o ProfileTester -fno-vectorize -fno-unroll-loops
$ LLVM_PROFILE_FILE="profiletester-%p.profraw" ./ProfileTester
Number of entries into the if statement: 11
The new x value is: 2.07692
The new c value is: 2.69231
$ llvm-profdata merge -output=profiletester.profdata profiletester-*.profraw
$ clang++ -O2 -fprofile-instr-use=profiletester.profdata ProfileTester.cpp -S -emit-llvm -o ProfileTester-with-pd.ll -fno-vectorize -fno-unroll-loops
$ opt -analyze -block-freq < ProfileTester-with-pd.ll
Printing analysis 'Block Frequency Analysis' for function 'main':
block-frequency-info: main

entry: float = 1.0, int = 8388624
for.cond.cleanup: float = 1.0, int = 8388624
if.then.i145: float = 0.00000095367, int = 8
_ZSt13__check_facetISt5ctypeIcEERKT_PS3_.exit: float = 1.0, int = 8388616
if.then.i: float = 0.8, int = 6710892
if.end.i: float = 0.2, int = 1677723
_ZNKSt5ctypeIcE5widenEc.exit: float = 1.0, int = 8388616
if.then.i148: float = 0.00000095367, int = 8
_ZSt13__check_facetISt5ctypeIcEERKT_PS3_.exit150: float = 1.0, int = 8388607
if.then.i125: float = 0.8, int = 6710886
if.end.i129: float = 0.2, int = 1677721
_ZNKSt5ctypeIcE5widenEc.exit131: float = 1.0, int = 8388607
if.then.i152: float = 0.00000095367, int = 8
_ZSt13__check_facetISt5ctypeIcEERKT_PS3_.exit154: float = 1.0, int = 8388599
if.then.i137: float = 0.8, int = 6710879
if.end.i141: float = 0.2, int = 1677720
_ZNKSt5ctypeIcE5widenEc.exit143: float = 1.0, int = 8388599
for.body: float = 5001.5, int = 41955695043
for.body7: float = 55012.0, int = 461475390211
for.cond.cleanup25: float = 55012.0, int = 461475390211
for.body26: float = 275030036.0, int = 2307123560708599
if.end.loopexit: float = 6.0006, int = 50336765
if.end: float = 5001.5, int = 41955695043

Printing analysis 'Block Frequency Analysis' for function '_GLOBAL__sub_I_ProfileTester.cpp':
block-frequency-info: _GLOBAL__sub_I_ProfileTester.cpp

entry: float = 1.0, int = 8

Please note:

for.body: float = 5001.5, int = 41955695043

This is a top-level loop with a static trip count of 10000, so its loop header should execute 10000 times as frequently as the function entry block. The reported frequency is only about 5000 times that of the entry block, which is off by a factor of 2.

Frontend (FE) based PGO uses Laplace's rule of succession to 'normalize' the branch weights (see the following code).

For the loop backedge in the test case, feeding the 'normalized' weights to BFI has the effect of reducing the loop trip count by half.
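The halving can be checked by hand. The figures below are a sketch under assumptions that the report does not state: a raw count of 10000 on the loop-continue edge, a raw count of 1 on the exit edge, and a scale factor of 1. Adding 1 to each weight then gives a loop header frequency that coincides with the 5001.5 reported for for.body, whereas the raw counts give roughly the expected 10000:

```latex
\begin{align*}
w_{\text{continue}} &= \tfrac{10000}{1} + 1 = 10001, \qquad
w_{\text{exit}} = \tfrac{1}{1} + 1 = 2, \\[4pt]
\text{freq}_{\text{with } +1} &= \frac{w_{\text{continue}} + w_{\text{exit}}}{w_{\text{exit}}}
  = \frac{10003}{2} = 5001.5, \\[4pt]
\text{freq}_{\text{raw counts}} &= \frac{10000 + 1}{1} = 10001 .
\end{align*}
```

Because the exit edge is taken only once, the added 1 doubles its weight, and that is what cuts the estimated trip count in half.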

IR based PGO does not have this rule.

Try

-fprofile-instr-generate -Xclang -fprofile-instrument=llvm to turn on IR based instrumentation.

(The profile-use command line is the same as for FE based PGO, i.e., no additional option is needed to turn it on.)

/// According to Laplace's Rule of Succession, it is better to compute the
/// weight based on the count plus 1, so universally add 1 to the value.
///
/// \pre \c Scale was calculated by \a calculateWeightScale() with a weight no
/// greater than \c Weight.
static uint32_t scaleBranchWeight(uint64_t Weight, uint64_t Scale) {
  assert(Scale && "scale by 0?");
  uint64_t Scaled = Weight / Scale + 1;
  assert(Scaled <= UINT32_MAX && "overflow 32-bits");
  return Scaled;
}
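To see how the added 1 interacts with loop weights, here is a minimal standalone sketch. It copies scaleBranchWeight from above and adds impliedHeaderFreq, a hypothetical helper that is not part of LLVM and only models a loop header's frequency as the inverse of the exit probability. The counts used with it (10000 continue, 1 exit, scale 1) are assumptions about the test case, not values taken from the profile.

```cpp
#include <cassert>
#include <cstdint>

// Copy of the helper quoted above: scale the raw count, then add 1.
static uint32_t scaleBranchWeight(uint64_t Weight, uint64_t Scale) {
  assert(Scale && "scale by 0?");
  uint64_t Scaled = Weight / Scale + 1;
  assert(Scaled <= UINT32_MAX && "overflow 32-bits");
  return static_cast<uint32_t>(Scaled);
}

// Hypothetical helper (not an LLVM API): frequency of a loop header relative
// to the loop's entry, modeled as 1 / P(exit) for the given edge weights.
static double impliedHeaderFreq(uint64_t ContinueWeight, uint64_t ExitWeight) {
  assert(ExitWeight && "loop never exits?");
  return static_cast<double>(ContinueWeight + ExitWeight) /
         static_cast<double>(ExitWeight);
}
```

Under those assumptions, impliedHeaderFreq(scaleBranchWeight(10000, 1), scaleBranchWeight(1, 1)) evaluates to 5001.5, while impliedHeaderFreq(10000, 1) on the raw counts gives 10001. The adjustment matters most when one edge has a very small count. When both counts are large, the added 1 changes the ratio only slightly.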


Block Frequencies for loop seem incorrect #28165
