Open AlexisPerry opened 8 years ago
FE based PGO uses the Laplace rule of succession to 'normalize' the branch weight (see following code).
What happens is that for the loop backedge in the test case, it has the effect of reduce the loop trip count by half when the 'normalized' weight is fed to the BFI.
IR based PGO does not have this rule.
Try
-fprofile-instr-generate -Xclang -fprofile-instrument=llvm to turn on IR based instrumentation.
(the profile use command line is the same as FE based, ie., no additional option is needed to turn it on).
/// According to Laplace's Rule of Succession, it is better to compute the /// weight based on the count plus 1, so universally add 1 to the value. /// /// \pre \c Scale was calculated by \a calculateWeightScale() with a weight no /// greater than \c Weight. static uint32_t scaleBranchWeight(uint64_t Weight, uint64_t Scale) { assert(Scale && "scale by 0?"); uint64_t Scaled = Weight / Scale + 1; assert(Scaled <= UINT32_MAX && "overflow 32-bits"); return Scaled; }
Extended Description
$ clang++ -O2 -fprofile-instr-generate ProfileTester.cpp -o ProfileTester -fno-vectorize -fno-unroll-loops $ LLVM_PROFILE_FILE="profiletester-%p.profraw" ./ProfileTester Number of entries into the if statement: 11 The new x value is: 2.07692 The new c value is: 2.69231 $ llvm-profdata merge -output=profiletester.profdata profiletester-*.profraw $ clang++ -O2 -fprofile-instr-use=profiletester.profdata ProfileTester.cpp -S -emit-llvm -o ProfileTester-with-pd.ll -fno-vectorize -fno-unroll-loops $ opt -analyze -block-freq < ProfileTester-with-pd.ll Printing analysis 'Block Frequency Analysis' for function 'main': block-frequency-info: main
Printing analysis 'Block Frequency Analysis' for function '_GLOBAL__sub_I_ProfileTester.cpp': block-frequency-info: _GLOBAL__sub_I_ProfileTester.cpp
Please note:
This is a top-level loop with a static trip count of 10000. Thus, its loop header should execute 10000 times as frequently as the function entry block. The reported frequency is ~5000 times. This seems off by a factor of 2.