Open llvmbot opened 7 years ago
I think it will be helpful to understand how you ended up with two summaries in an LTO build. My main concern in using freshness as the criteria is that you may end up with a profile as a result of which the performance you get up ends up worse than non-PGO case. It is definitely possible that TotalCount is lesser in a fresh profile.
I'm afraid I don't have total control or access to the environment where this was caused, so I can't give you an exact explanation, but more or less this is what happens in the build system for my internal customer:
1) There are two different builds, one for the libraries and the other for the game code.
2) The latter (game) is touched more frequently than the former, so whenever somebody checks in a library change also checks in the archive (*.a). People working on the game fetch the library and rebuild just the game code, which, from what I can understand, results in the profile metadata going out of sync.
Does this make sense?
P.S. I think this is not a great software engineering practice, but it's a possible scenario (also because I can fix this very case but I can't control all the builds :).
I think it will be helpful to understand how you ended up with two summaries in an LTO build. My main concern in using freshness as the criteria is that you may end up with a profile as a result of which the performance you get up ends up worse than non-PGO case. It is definitely possible that TotalCount is lesser in a fresh profile.
Sorry for the delay in responding. I don't know how real your example is, but in this case the summary in adler.cc has no information and if it happens to be the one with the later timestamp then you'll end up with a bad summary and affect the inliner. One possibility is to drop the profile whose TotalCount is smaller.
I can see your point. As this is an heuristic, it doesn't need to be terribly precise, but I'd of course like to limit the number of false positives.
Is it possible that TotalCount
is lower in in a profile that's not stale? From what I see this is the count value that the instruction executes which doesn't seem necessarily related to the freshness of the profile.
Sorry for the delay in responding. I don't know how real your example is, but in this case the summary in adler.cc has no information and if it happens to be the one with the later timestamp then you'll end up with a bad summary and affect the inliner. One possibility is to drop the profile whose TotalCount is smaller.
I also discussed this with Rafael offline and my proposed solution would be the following:
I originally thought we didn't need this information to be carried at LTO time as lower it in the per-TU pipeline (PGOInstrUse), but Teresa told me that we need to carry this around for LTO as the inliner uses it.
After a while, I'm finally back to this one. David/Easwaran/Rong, which solution would you prefer?
This fails with: /home/davide/work/llvm/build-clang/bin/ld.lld: error: linking module flags 'ProfileSummary': IDs have conflicting values
As discussed previously with David/Easwaran there are two options here:
1) Drop the stale summary. I'm not sure how easy would be to do that with the current profile information, as there's no notion of timestamp. I think we may consider expanding the profile summary to add a timestamp, but of course ideas welcome.
2) Suggested by Easwaran: merge the two histogram profiles.
Before getting into the implementation, I'd like to get ideas on what you think will be the best solution.
Extended Description
Reduced from a case I hit in the wild.
$ cat adler.ll target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-scei-ps4"
define i64 @tinkywinky() { ret i64 0 }
!llvm.module.flags = !{#0}
!0 = !{i32 1, !"ProfileSummary", !1} !1 = !{#2, !3, !4, !5, !6, !7, !8, !9} !2 = !{!"ProfileFormat", !"InstrProf"} !3 = !{!"TotalCount", i64 0} !4 = !{!"MaxCount", i64 0} !5 = !{!"MaxInternalCount", i64 0} !6 = !{!"MaxFunctionCount", i64 0} !7 = !{!"NumCounts", i64 0} !8 = !{!"NumFunctions", i64 0} !9 = !{!"DetailedSummary", !10} !10 = !{#11, !12, !13, !14, !15, !16, !16, !17, !17, !18, !19, !20, !21, !22, !23, !24, !25, !26} !11 = !{i32 10000, i64 0, i32 0} !12 = !{i32 100000, i64 0, i32 0} !13 = !{i32 200000, i64 0, i32 0} !14 = !{i32 300000, i64 0, i32 0} !15 = !{i32 400000, i64 0, i32 0} !16 = !{i32 500000, i64 0, i32 0} !17 = !{i32 600000, i64 0, i32 0} !18 = !{i32 700000, i64 0, i32 0} !19 = !{i32 800000, i64 0, i32 0} !20 = !{i32 900000, i64 0, i32 0} !21 = !{i32 950000, i64 0, i32 0} !22 = !{i32 990000, i64 0, i32 0} !23 = !{i32 999000, i64 0, i32 0} !24 = !{i32 999900, i64 0, i32 0} !25 = !{i32 999990, i64 0, i32 0} !26 = !{i32 999999, i64 0, i32 0}
]$ cat main.ll target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-scei-ps4"
define i32 @patatino() { ret i32 0 }
!llvm.module.flags = !{#0}
!0 = !{i32 1, !"ProfileSummary", !1} !1 = !{#2, !3, !4, !5, !6, !7, !8, !9} !2 = !{!"ProfileFormat", !"InstrProf"} !3 = !{!"TotalCount", i64 105148752377} !4 = !{!"MaxCount", i64 7107491985} !5 = !{!"MaxInternalCount", i64 6186825744} !6 = !{!"MaxFunctionCount", i64 7107491985} !7 = !{!"NumCounts", i64 128499} !8 = !{!"NumFunctions", i64 33234} !9 = !{!"DetailedSummary", !10} !10 = !{#11, !12, !13, !14, !15, !16, !16, !17, !17, !18, !19, !20, !21, !22, !23, !24, !25, !26} !11 = !{i32 10000, i64 7107491985, i32 1} !12 = !{i32 100000, i64 6186825744, i32 2} !13 = !{i32 200000, i64 1175154662, i32 7} !14 = !{i32 300000, i64 635580706, i32 20} !15 = !{i32 400000, i64 395565517, i32 42} !16 = !{i32 500000, i64 250826197, i32 75} !17 = !{i32 600000, i64 96522752, i32 150} !18 = !{i32 700000, i64 49363736, i32 311} !19 = !{i32 800000, i64 25985024, i32 599} !20 = !{i32 900000, i64 9803979, i32 1273} !21 = !{i32 950000, i64 4164740, i32 2053} !22 = !{i32 990000, i64 658981, i32 4382} !23 = !{i32 999000, i64 44010, i32 8841} !24 = !{i32 999900, i64 13789, i32 13065} !25 = !{i32 999990, i64 1913, i32 14489} !26 = !{i32 999999, i64 160, i32 15937}
Script to reproduce (adjusting the paths)
$ cat run.sh rm -rf main.o adler.o ~/work/llvm/build-clang/bin/llvm-as /home/davide/smbrokenpgo/reduced/main.ll -o main.o ~/work/llvm/build-clang/bin/llvm-as /home/davide/smbrokenpgo/reduced/adler.ll -o adler.o ~/work/llvm/build-clang/bin/ld.lld main.o adler.o