Closed Michael137 closed 1 month ago
First stab at this. The plan is to collect metrics that relate to type completion, so that we get insight into the impact of changes in this area. E.g.:
These don't all exist yet, but the idea is to add them to the `statistics dump` command.
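As a rough sketch of how such metrics could be pulled out of the dump: `statistics dump` prints JSON, so a small script can pick out the numeric fields. The key names in the sample below are placeholders made up for illustration (the type-completion metrics don't exist yet), and `collect_numeric_metrics` is a hypothetical helper, not something in this patch.

```python
import json


def collect_numeric_metrics(stats, prefix=""):
    """Flatten nested dicts into {dotted.key: number}; other values are skipped."""
    flat = {}
    for key, value in stats.items():
        name = f"{prefix}{key}"
        if isinstance(value, bool):
            continue  # bool is a subclass of int; don't report flags as metrics
        if isinstance(value, (int, float)):
            flat[name] = value
        elif isinstance(value, dict):
            flat.update(collect_numeric_metrics(value, prefix=name + "."))
        # Strings and lists are ignored in this sketch.
    return flat


# Hypothetical sample standing in for the JSON printed by `statistics dump`.
SAMPLE_DUMP = """
{
  "exampleTypesCompleted": 1234,
  "exampleParseTimeSeconds": 0.75,
  "exampleNested": {"count": 3, "label": "ignored"},
  "exampleFlag": true
}
"""

metrics = collect_numeric_metrics(json.loads(SAMPLE_DUMP))
for name in sorted(metrics):
    print(f"{name}: {metrics[name]}")
```

Flattening to dotted names keeps each metric as a single scalar, which should make it straightforward to plot one time series per metric later.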
Currently I'm also timing the test scenarios. That metric is presumably much less stable, and I wouldn't be opposed to removing it in the first iteration of this bot.
Any thoughts/concerns/wishlist items?
I think this could be interesting. I don't have much to add in the way of specifics, just a couple of questions/observations:
- (where) will we be able to see the results of these benchmarks?
Currently I just dump them to the console. The idea in the near future is to publish the data to something like LNT (though that currently seems to be down), and plot some sort of time series out of it.
- When benchmarking a debugger, there are many moving parts: a) the debugger itself; b) the code being debugged (inferior); c) the compiler compiling the inferior; and possibly d) the compiler compiling the debugger. Moving all four makes it hard to interpret the results. Based on the mentions of "historic compilers" in the patch, I'm deducing that you're trying to fix some of these, but I wasn't able to figure out which ones. Can you tell me which of these are fixed?
Good point, I'll try to clarify this in the pipeline definition.
(a) The "host" compiler/LLDB is taken from whatever the LLDB incremental built/used (I haven't checked that those artifacts are available, but would be nice if we could re-use that). The metrics we collect are from that "host" LLDB that we fetched.
(b) The debugger/compiler that we're debugging (in the `HISTORIC_BUILD_DIR`) is pinned to the llvm-19.x release (which seemed like a good starting point for something stable)
(c) We use the compiler from (a) to build the "historic" Clang/LLDB.
(d) The compiler compiling the debugger in (a) is the clang produced by the clang-stage2 buildbot
We could alternatively choose not to re-use the artifacts from other buildbots and instead build a brand new LLDB/Clang from top-of-tree using a pinned version of Clang. In that case (b) and (d) would be stable, while (a) and (c) would follow top-of-tree. That does seem like a more maintainable situation (at the cost of rebuilding Clang/LLDB more often).
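To make the two setups easier to compare, here is a small sketch that only encodes which of (a)-(d) is pinned in each. All names are made up for illustration. The first setup reflects one reading of the description above (only the inferior is pinned, and the clang-stage2 compiler is assumed to track top-of-tree); the second follows the alternative as stated.

```python
from dataclasses import dataclass

# The four moving parts from the question above.
COMPONENTS = {
    "a": "debugger being measured",
    "b": "code being debugged (inferior)",
    "c": "compiler compiling the inferior",
    "d": "compiler compiling the debugger",
}


@dataclass(frozen=True)
class BenchmarkSetup:
    name: str
    pinned: frozenset  # letters of the components held at a fixed version

    def moving(self):
        """Components that change between runs, i.e. possible causes of a shift."""
        return sorted(set(COMPONENTS) - self.pinned)


# Assumption: when re-using buildbot artifacts, only (b) is pinned.
REUSE_ARTIFACTS = BenchmarkSetup("re-use buildbot artifacts", frozenset({"b"}))
# Stated alternative: (b) and (d) stable, (a) and (c) follow top-of-tree.
PINNED_HOST_COMPILER = BenchmarkSetup(
    "build top-of-tree with a pinned Clang", frozenset({"b", "d"})
)

for setup in (REUSE_ARTIFACTS, PINNED_HOST_COMPILER):
    parts = ", ".join(f"({c}) {COMPONENTS[c]}" for c in setup.moving())
    print(f"{setup.name}: a change in the metrics could come from {parts}")
```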
- (where) will we be able to see the results of these benchmarks?
Currently I just dump them to the console. The idea in the near future is to publish the data to something like LNT (though that currently seems to be down), and plot some sort of time series out of it.
Got it. Thanks.
- When benchmarking a debugger, there are many moving parts: a) the debugger itself; b) the code being debugged (inferior); c) the compiler compiling the inferior; and possibly d) the compiler compiling the debugger. Moving all four makes it hard to interpret the results. Based on the mentions of "historic compilers" in the patch, I'm deducing that you're trying to fix some of these, but I wasn't able to figure out which ones. Can you tell me which of these are fixed?
Good point, I'll try to clarify this in the pipeline definition.
(a) The "host" compiler/LLDB is taken from whatever the LLDB incremental built/used (I haven't checked that those artifacts are available, but would be nice if we could re-use that). The metrics we collect are from that "host" LLDB that we fetched.
One argument for not fetching those is that you might want to use different build options for each. E.g. the incremental build bot might want to enable assertions or stuff, whereas the benchmarking bot might not.
(b) The debugger/compiler that we're debugging (in the `HISTORIC_BUILD_DIR`) is pinned to the llvm-19.x release (which seemed like a good starting point for something stable)
(c) We use the compiler from (a) to build the "historic" Clang/LLDB.
(d) The compiler compiling the debugger in (a) is the clang produced by the clang-stage2 buildbot
:+1:
We could alternatively choose not to re-use the artifacts from other buildbots and instead build a brand new LLDB/Clang from top-of-tree using a pinned version of Clang. In that case (b) and (d) would be stable, while (a) and (c) would follow top-of-tree. That does seem like a more maintainable situation (at the cost of rebuilding Clang/LLDB more often).
I think both of these are reasonable choices, and it's up to you to choose which one makes the most sense for your use case. I'm interested in the details just so that I know how to interpret the results.
This patch adds a new job to collect LLDB metrics.
This is heavily based on the `debuginfo-statistics` job (but currently doesn't publish data to LNT).
Currently this job would do the following:
- […] `lldb-cmake-intel` job (not actually sure if that job publishes the right artifacts at the moment) to run the `run_lldb_metrics.sh` script.
- […] `statistics dump` command to `stdout` (note we don't do any kind of averaging of these over multiple runs, since the metrics we care about should be stable across runs). We also currently run these test scenarios through `hyperfine` and dump the timing data. But maybe for a first attempt this isn't necessary.
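For the timing part, a minimal sketch of how the `hyperfine` results could be consumed, assuming the scenarios are run with `--export-json` and only the mean and standard deviation are of interest. The scenario command, file name, and sample numbers are hypothetical, and this is not what `run_lldb_metrics.sh` currently does.

```python
import json
import shlex


def hyperfine_command(scenario_cmd, export_path, runs=10):
    """Build an argv list that times one scenario and exports the results as JSON."""
    return ["hyperfine", "--runs", str(runs), "--export-json", export_path, scenario_cmd]


def summarize_timings(export_json):
    """Map each benchmarked command to (mean, stddev) in seconds."""
    data = json.loads(export_json)
    return {r["command"]: (r["mean"], r["stddev"]) for r in data["results"]}


# Hypothetical scenario; the real scenarios live in the metrics script.
scenario = "lldb --batch -o 'statistics dump' ./example_binary"
print(shlex.join(hyperfine_command(scenario, "timings.json", runs=5)))

# Hand-written sample in the shape of a hyperfine JSON export (values invented).
SAMPLE_EXPORT = json.dumps(
    {"results": [{"command": scenario, "mean": 1.5, "stddev": 0.25, "times": [1.25, 1.75]}]}
)
summary = summarize_timings(SAMPLE_EXPORT)
for command, (mean, stddev) in summary.items():
    print(f"{command}: {mean:.2f}s +/- {stddev:.2f}s")
```

Keeping the standard deviation next to the mean makes it easier to judge whether the timing data is too noisy to keep, which is the open question for the first iteration.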