smarr / ReBench

Execute and document benchmarks reproducibly.

Add support for profiling benchmarks and reporting results to ReBenchDB #166

Open smarr opened 2 years ago

smarr commented 2 years ago

Looking at changes in benchmark numbers alone is unfortunately rarely very insightful.

To understand what benchmarks spend their time on, it would be useful to add support for profiling. Once upon a time, we already had support for this (for details, see https://github.com/smarr/ReBench/issues/18 and https://github.com/smarr/ReBench/pull/9; code removal: https://github.com/smarr/ReBench/commit/6e6e2511baf42e6c8c69a32d65ee5e0545ad0fcc).

At this point, I am looking to add support for function-level profiling of interpreters with perf, Xcode Instruments, or perhaps Java's Flight Recorder.

Most urgent for me is the ability to profile the executors. One may perhaps also want to profile the benchmarks themselves. The difference is the level at which profiling is done: at the VM level or at the application level.

Desired Features

ReBenchDB Mockup

An integration in ReBenchDB could include a new unfoldable section that shows the basic profile. In this case, it shows the result of:

perf record -g -F 9999 --call-graph lbr ./som-native-interp-ast -cp Smalltalk:Examples/Benchmarks/LanguageFeatures Examples/Benchmarks/BenchmarkHarness.som Dispatch 10 0 20
perf report -g graph --no-children --stdio
(Screenshot: mockup of the unfoldable profile section in ReBenchDB, showing the perf report output.)
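
To get from perf's textual output to something ReBenchDB could store, one could parse the report roughly along the following lines. This is only a sketch: the function name is made up, and the regular expression assumes perf's default stdio column layout, so it may need adjustment for other perf versions or report options.

```python
import re
import subprocess

def parse_perf_report(report_text):
    """Extract (percentage, symbol) pairs from `perf report --stdio` output.

    Assumes the default "Overhead  Command  Shared Object  Symbol" layout.
    """
    entries = []
    for line in report_text.splitlines():
        match = re.match(r'\s*(\d+\.\d+)%\s+\S+\s+\S+\s+\[[.k]\]\s+(.+)', line)
        if match:
            entries.append((float(match.group(1)), match.group(2).strip()))
    return entries

# Example: run the report step from the mockup above and parse its output.
report = subprocess.run(['perf', 'report', '-g', 'graph', '--no-children', '--stdio'],
                        capture_output=True, text=True).stdout
profile = parse_perf_report(report)
```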

Once we have the data, we may also want a feature to compare profiles, similar to how we compare performance. Since the collected profiling data may be relative to the overall run time, it might be necessary to consider the actual run time when judging differences, for instance to avoid reporting an increase for a part whose relative share grew while the overall run time actually decreased.

(Screenshot: mockup of a profile comparison view in ReBenchDB.)
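
As a sketch of such a comparison that takes the absolute run time into account: the function below is illustrative only and assumes profiles are given as symbol-to-percentage mappings.

```python
def compare_profiles(base, change, base_total_ms, change_total_ms):
    """Compare two profiles, scaling relative shares to absolute time.

    `base` and `change` map symbol -> percentage of the run spent in it;
    the totals are the overall run times in milliseconds.
    Returns symbol -> (ms_base, ms_change, delta_ms).
    """
    result = {}
    for sym in set(base) | set(change):
        ms_base = base.get(sym, 0.0) / 100.0 * base_total_ms
        ms_change = change.get(sym, 0.0) / 100.0 * change_total_ms
        result[sym] = (ms_base, ms_change, ms_change - ms_base)
    return result
```

With this, a symbol whose relative share grew while the overall run time shrank can still show a negative absolute delta, avoiding a misleading "increase".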

Design Considerations

Integration with Benchmarking

For the seamless integration with benchmarking, we need to be able to match benchmark data with profiling data. This means, internally, things need to end up having the same RunId. That is, a specific profiling Run needs to be identified by the command line of the original benchmark Run.

Currently, we use those RunIds also to store data, track progress, etc.

It seems like I should probably leave the handling of RunIds alone, and track completion differently, if at all.
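
To make that concrete, a profiling run could be keyed directly by the command line of the original benchmark run, for instance as follows. The helper is hypothetical and only illustrates the idea; it does not touch ReBench's actual RunId handling.

```python
import hashlib

def profiling_key(benchmark_cmdline):
    """Derive a stable key for a profiling run from the benchmark command line."""
    return hashlib.sha256(benchmark_cmdline.encode('utf-8')).hexdigest()
```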

One way of doing it would be to have a different way of executing things. rebench.executor.Executor works together with the RunScheduler to identify the runs to be executed and to compose the final command line.

When composing the final command line for profiling, we need to consider the details of the specific profiler. This could perhaps be realized as a gauge_adapter?
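
A minimal sketch of what such a component could look like; the class name and interface are hypothetical and do not follow the actual GaugeAdapter API, they only show the command-composition part.

```python
class PerfProfilerAdapter:
    """Composes a profiling command line by prefixing the benchmark command."""

    def __init__(self, frequency=9999, call_graph='lbr'):
        self._frequency = frequency
        self._call_graph = call_graph

    def acquire_command(self, command):
        # `command` is the benchmark command as a list of arguments.
        prefix = ['perf', 'record', '-g', '-F', str(self._frequency),
                  '--call-graph', self._call_graph]
        return prefix + command
```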

Though, how do I track completion? One way might simply be a different data store, where only the details needed for completion tracking, and possibly the profiling results, are stored.
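
A deliberately simple separate store could look roughly like this; the class name and the JSON file format are assumptions for illustration, and ReBench's existing RunId-based data store stays untouched.

```python
import json
from pathlib import Path

class ProfilingDataStore:
    """Records which profiling runs completed and where their data lives."""

    def __init__(self, path):
        self._path = Path(path)
        self._data = json.loads(self._path.read_text()) if self._path.exists() else {}

    def is_completed(self, run_key):
        return run_key in self._data

    def record_completion(self, run_key, profile_file):
        self._data[run_key] = {'profile': profile_file}
        self._path.write_text(json.dumps(self._data, indent=2))
```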

Machine Setup, Denoise

For benchmarking, we may want to reduce interference, including profiling interrupts, as much as possible. For profiling, on the other hand, we may want to configure the machine primarily for profiling. I don't know whether these settings make a practical difference for benchmarks when no profiling is actually done, though I suspect there might be one.

So, in the unlikely event that there is, one may want to run benchmarks and profiling with different machine setups.

At the moment, we run denoise at the start, before running benchmarks, and then disable it afterwards. Thus, we don't do it before every benchmark.

To keep it like this, profiling and benchmarking need to be kept separate. But since the benchmarking and profiling configurations would likely result in the same experiments, which ReBench currently doesn't handle, this is probably a good idea anyway.
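
To illustrate the kind of settings that would differ, here is a sketch of switching the machine between a benchmarking and a profiling configuration. The specific sysctl values are assumptions for illustration and not what denoise currently does; writing them requires root.

```python
def configure_machine_for(mode):
    """Apply kernel settings for either benchmarking or profiling.

    Assumption: benchmarking keeps the perf sampling rate minimal to avoid
    profiling interrupts, while profiling needs a high sampling rate and a
    permissive perf_event_paranoid setting.
    """
    settings = {
        'benchmark': {'kernel/perf_event_max_sample_rate': '1'},
        'profile':   {'kernel/perf_event_max_sample_rate': '10000',
                      'kernel/perf_event_paranoid': '1'},
    }[mode]
    for key, value in settings.items():
        with open('/proc/sys/' + key, 'w') as sysctl_file:
            sysctl_file.write(value)
```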

TODO

smarr commented 2 years ago

Notes on invoking profiling with other tools than perf:
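
As a rough, untested sketch, the invocations could be composed along these lines. The perf prefix matches the mockup above; the xctrace and Flight Recorder details are assumptions that would need verification against the respective tools.

```python
def profiler_command(profiler, command):
    """Compose a profiling command line for the given profiler (sketch only)."""
    if profiler == 'perf':
        return ['perf', 'record', '-g', '-F', '9999', '--call-graph', 'lbr'] + command
    if profiler == 'instruments':
        return ['xcrun', 'xctrace', 'record', '--template', 'Time Profiler',
                '--output', 'profile.trace', '--launch', '--'] + command
    if profiler == 'jfr':
        # JFR is enabled via a JVM option rather than a wrapper command.
        return [command[0], '-XX:StartFlightRecording=filename=profile.jfr'] + command[1:]
    raise ValueError('unknown profiler: ' + profiler)
```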

Compilation Changes

smarr commented 2 years ago

Some useful links, also to web-based profile inspectors:

One may want to keep the raw data of profiles around for inspection in an IDE context or with local tools. Though, for longer-term archival, we probably need to keep more compact information.
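
One possible way to derive such compact information from a parsed profile, as a sketch: keep only the most significant entries and fold the rest into a single remainder, so the raw perf data can be discarded after inspection. The thresholds are arbitrary examples.

```python
def compact_profile(entries, top_n=50, min_percent=0.5):
    """Reduce a list of (percentage, symbol) pairs to its most significant entries."""
    significant = [e for e in sorted(entries, key=lambda e: e[0], reverse=True)
                   if e[0] >= min_percent][:top_n]
    remainder = 100.0 - sum(p for p, _ in significant)
    return significant + [(max(remainder, 0.0), '<other>')]
```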