smarr / ReBench

Execute and document benchmarks reproducibly.

Add support for profiling benchmarks and reporting results to ReBenchDB #166

Open smarr opened 2 years ago

smarr commented 2 years ago

Looking at changes in benchmark numbers alone is unfortunately rarely very insightful.

To understand what benchmarks spend their time on, it would be useful to add support for profiling. Once upon a time, we already had support for this (for details, see https://github.com/smarr/ReBench/issues/18 and https://github.com/smarr/ReBench/pull/9; code removal: https://github.com/smarr/ReBench/commit/6e6e2511baf42e6c8c69a32d65ee5e0545ad0fcc).

At this point, I am looking to add support for function-level profiling of interpreters with perf, Xcode Instruments, or perhaps Java's Flight Recorder.

Most urgent for me is the ability to profile the executors. One may perhaps also want to profile the benchmarks themselves. The difference is the level at which profiling is done: at the VM level or at the application level.

Desired Features

ReBenchDB Mockup

An integration in ReBenchDB could include a new unfoldable section that shows the basic profile. In this case, it shows the result of:

perf record -g -F 9999 --call-graph lbr ./som-native-interp-ast -cp Smalltalk:Examples/Benchmarks/LanguageFeatures Examples/Benchmarks/BenchmarkHarness.som Dispatch 10 0 20
perf report -g graph --no-children --stdio
(Screenshot: mockup of the unfoldable profile section in ReBenchDB, showing the perf report output.)
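
To get from perf's textual output to something ReBenchDB could store, one could parse the report roughly along the following lines. This is only a sketch: the function name is made up, and the regular expression assumes perf's default stdio column layout, so it may need adjustment for other perf versions or report options.

```python
import re
import subprocess

def parse_perf_report(report_text):
    """Extract (percentage, symbol) pairs from `perf report --stdio` output.

    Assumes the default "Overhead  Command  Shared Object  Symbol" layout.
    """
    entries = []
    for line in report_text.splitlines():
        match = re.match(r'\s*(\d+\.\d+)%\s+\S+\s+\S+\s+\[[.k]\]\s+(.+)', line)
        if match:
            entries.append((float(match.group(1)), match.group(2).strip()))
    return entries

# Example: run the report step from the mockup above and parse its output.
report = subprocess.run(['perf', 'report', '-g', 'graph', '--no-children', '--stdio'],
                        capture_output=True, text=True).stdout
profile = parse_perf_report(report)
```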

Once we have the data, we may also want a feature to compare profiles, similar to how we compare performance. Since the collected profiling data may be relative to the overall run time, it might be necessary to consider the actual run time when judging differences, for instance to avoid reporting an increase for a part whose relative share grew while the overall run time actually decreased.

(Screenshot: mockup of a profile comparison view in ReBenchDB.)
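
As a sketch of such a comparison that takes the absolute run time into account: the function below is illustrative only and assumes profiles are given as symbol-to-percentage mappings.

```python
def compare_profiles(base, change, base_total_ms, change_total_ms):
    """Compare two profiles, scaling relative shares to absolute time.

    `base` and `change` map symbol -> percentage of the run spent in it;
    the totals are the overall run times in milliseconds.
    Returns symbol -> (ms_base, ms_change, delta_ms).
    """
    result = {}
    for sym in set(base) | set(change):
        ms_base = base.get(sym, 0.0) / 100.0 * base_total_ms
        ms_change = change.get(sym, 0.0) / 100.0 * change_total_ms
        result[sym] = (ms_base, ms_change, ms_change - ms_base)
    return result
```

With this, a symbol whose relative share grew while the overall run time shrank can still show a negative absolute delta, avoiding a misleading "increase".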

Design Considerations

Integration with Benchmarking

For the seamless integration with benchmarking, we need to be able to match benchmark data with profiling data. This means, internally, things need to end up having the same RunId. That is, a specific profiling Run needs to be identified by the command line of the original benchmark Run.

Currently, we use those RunIds also to store data, track progress, etc.

It seems like I should probably leave the handling of RunIds alone, and track completion differently, if at all.
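
To make that concrete, a profiling run could be keyed directly by the command line of the original benchmark run, for instance as follows. The helper is hypothetical and only illustrates the idea; it does not touch ReBench's actual RunId handling.

```python
import hashlib

def profiling_key(benchmark_cmdline):
    """Derive a stable key for a profiling run from the benchmark command line."""
    return hashlib.sha256(benchmark_cmdline.encode('utf-8')).hexdigest()
```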

One way of doing it would be to have a different way of executing things. rebench.executor.Executor works together with the RunScheduler to identify the runs to be executed and to compose the final command line.

When composing the final command line for profiling, we need to consider the details of the specific profiler. This could perhaps be realized as a gauge_adapter?
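
A minimal sketch of what such a component could look like; the class name and interface are hypothetical and do not follow the actual GaugeAdapter API, they only show the command-composition part.

```python
class PerfProfilerAdapter:
    """Composes a profiling command line by prefixing the benchmark command."""

    def __init__(self, frequency=9999, call_graph='lbr'):
        self._frequency = frequency
        self._call_graph = call_graph

    def acquire_command(self, command):
        # `command` is the benchmark command as a list of arguments.
        prefix = ['perf', 'record', '-g', '-F', str(self._frequency),
                  '--call-graph', self._call_graph]
        return prefix + command
```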

Though, how do I track completion? One way might simply be a different data store, where only the details needed for completion tracking, and possibly the profiling results, are stored.
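
A deliberately simple separate store could look roughly like this; the class name and the JSON file format are assumptions for illustration, and ReBench's existing RunId-based data store stays untouched.

```python
import json
from pathlib import Path

class ProfilingDataStore:
    """Records which profiling runs completed and where their data lives."""

    def __init__(self, path):
        self._path = Path(path)
        self._data = json.loads(self._path.read_text()) if self._path.exists() else {}

    def is_completed(self, run_key):
        return run_key in self._data

    def record_completion(self, run_key, profile_file):
        self._data[run_key] = {'profile': profile_file}
        self._path.write_text(json.dumps(self._data, indent=2))
```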

Machine Setup, Denoise

For benchmarking, we may want to reduce interference, including profiling interrupts, as much as possible. For profiling, on the other hand, we may want to configure the machine primarily for profiling. I don't know whether these settings make a practical difference for benchmarks when no profiling is actually done, though I suspect there might be one.

So, in the unlikely event that there is, one may want to run benchmarks and profiling with different machine setups.

At the moment, we run denoise at the start, before running benchmarks, and then disable it afterwards. Thus, we don't do it before every benchmark.

To keep it like this, profiling and benchmarking need to be kept separate. But since the benchmarking and profiling configurations would likely result in the same experiments, which ReBench currently doesn't handle, this is probably a good idea anyway.
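
To illustrate the kind of settings that would differ, here is a sketch of switching the machine between a benchmarking and a profiling configuration. The specific sysctl values are assumptions for illustration and not what denoise currently does; writing them requires root.

```python
def configure_machine_for(mode):
    """Apply kernel settings for either benchmarking or profiling.

    Assumption: benchmarking keeps the perf sampling rate minimal to avoid
    profiling interrupts, while profiling needs a high sampling rate and a
    permissive perf_event_paranoid setting.
    """
    settings = {
        'benchmark': {'kernel/perf_event_max_sample_rate': '1'},
        'profile':   {'kernel/perf_event_max_sample_rate': '10000',
                      'kernel/perf_event_paranoid': '1'},
    }[mode]
    for key, value in settings.items():
        with open('/proc/sys/' + key, 'w') as sysctl_file:
            sysctl_file.write(value)
```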

TODO

smarr commented 2 years ago

Notes on invoking profiling with other tools than perf:
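
As a rough, untested sketch, the invocations could be composed along these lines. The perf prefix matches the mockup above; the xctrace and Flight Recorder details are assumptions that would need verification against the respective tools.

```python
def profiler_command(profiler, command):
    """Compose a profiling command line for the given profiler (sketch only)."""
    if profiler == 'perf':
        return ['perf', 'record', '-g', '-F', '9999', '--call-graph', 'lbr'] + command
    if profiler == 'instruments':
        return ['xcrun', 'xctrace', 'record', '--template', 'Time Profiler',
                '--output', 'profile.trace', '--launch', '--'] + command
    if profiler == 'jfr':
        # JFR is enabled via a JVM option rather than a wrapper command.
        return [command[0], '-XX:StartFlightRecording=filename=profile.jfr'] + command[1:]
    raise ValueError('unknown profiler: ' + profiler)
```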

Compilation Changes

smarr commented 2 years ago

Some useful links, also to web-based profile inspectors:

One may want to keep the raw data of profiles around for inspection in an IDE context or with local tools. Though, for longer-term archival, we probably need to keep more compact information.
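
One possible way to derive such compact information from a parsed profile, as a sketch: keep only the most significant entries and fold the rest into a single remainder, so the raw perf data can be discarded after inspection. The thresholds are arbitrary examples.

```python
def compact_profile(entries, top_n=50, min_percent=0.5):
    """Reduce a list of (percentage, symbol) pairs to its most significant entries."""
    significant = [e for e in sorted(entries, key=lambda e: e[0], reverse=True)
                   if e[0] >= min_percent][:top_n]
    remainder = 100.0 - sum(p for p, _ in significant)
    return significant + [(max(remainder, 0.0), '<other>')]
```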