smarr / ReBenchDB

ReBenchDB records benchmark results and provides customizable reporting to track and analyze run-time performance of software programs.
MIT License

What are the start-time and end-time used for #176

Closed irevoire closed 7 months ago

irevoire commented 7 months ago

Hey,

I'm trying to understand every field of the benchmark data (and eventually write documentation for it), and I was wondering what start_time and end_time are used for.

```ts
export interface BenchmarkData {
  data: Run[];
  criteria?: Criterion[];
  env: Environment;
  source: Source;

  experimentName: string;
  experimentDesc?: string;

  startTime: string;
  endTime?: string | null;
  projectName: string;
}
```

Since startTime is mandatory, I guess it's used as the timestamp in the time series? If that's the case, then I don't see the point of specifying endTime. Also, for tracking performance over time, shouldn't we insert the time the commit was created into the time series, rather than the time the benchmark was run? That seems easier to navigate over time, but I may be wrong.

smarr commented 7 months ago

Hm, in the absence of other documentation, there are some comments in the database definition here: https://github.com/smarr/ReBenchDB/blob/1245e6bdd20bc0fe9b630fafe5e4ccd27512a8dd/src/backend/db/db.sql

For wider context, Run, Trial, and Experiment are the most important ones.

The short version:

Experiment: An experiment can be composed of multiple Trials. To identify experiments, we use a name. Optionally, a more elaborate description can be provided for documentation.

Trial: A trial is part of an experiment and consists of measurements. Multiple trials can belong to a single experiment. Trials are something like CI jobs or manual executions that collect all the data for a specific experiment.

Run: A concrete execution of a benchmark by a specific executor.
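To make the relationship between the three terms concrete, here is a minimal TypeScript sketch. All type and field names (`RunSketch`, `MeasurementSketch`, `TrialSketch`, `ExperimentSketch`, `countMeasurements`) are hypothetical and simplified for illustration; they are not the actual tables or types, which are defined in db.sql.

```typescript
// Hypothetical, simplified shapes; the real schema lives in db.sql.
interface RunSketch {
  benchmark: string; // the benchmark that is executed
  executor: string; // the specific executor running it
}

interface MeasurementSketch {
  run: RunSketch; // the run this measurement belongs to
  value: number; // the measured value (unit omitted in this sketch)
}

interface TrialSketch {
  startTime: string; // when data collection started
  endTime?: string | null; // optional, may be filled in later
  measurements: MeasurementSketch[];
}

interface ExperimentSketch {
  name: string; // used to identify the experiment
  description?: string; // optional, more elaborate documentation
  trials: TrialSketch[]; // e.g. one per CI job or manual execution
}

// Count all measurements collected across the trials of one experiment.
function countMeasurements(experiment: ExperimentSketch): number {
  return experiment.trials.reduce((n, t) => n + t.measurements.length, 0);
}

const exampleRun: RunSketch = { benchmark: "Bounce", executor: "ExampleVM" };
const exampleExperiment: ExperimentSketch = {
  name: "nightly",
  trials: [
    {
      startTime: "2024-01-01T10:00:00Z",
      measurements: [
        { run: exampleRun, value: 12.5 },
        { run: exampleRun, value: 12.7 }
      ]
    },
    {
      startTime: "2024-01-01T11:00:00Z",
      measurements: [{ run: exampleRun, value: 12.6 }]
    }
  ]
};
```

In this sketch, one experiment groups two trials, and both trials contain measurements for the same run.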

Now to your question: a trial has a startTime, mostly as an extra hint about when the data was collected. The endTime is really just extra information, too. The startTime is used in some places to identify the latest benchmark data.
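As a rough illustration of "identifying the latest benchmark data" by start time, a sketch could look like the following. The `TrialTimes` type and the `latestTrial` function are hypothetical, and the sketch assumes the timestamps are ISO 8601 strings that `Date.parse` can read; it is not how ReBenchDB itself does the lookup.

```typescript
// Hypothetical record holding only the time fields of a trial.
interface TrialTimes {
  id: number;
  startTime: string; // assumed to be an ISO 8601 timestamp
  endTime?: string | null;
}

// Return the trial with the most recent startTime,
// or undefined if the list is empty.
function latestTrial(trials: TrialTimes[]): TrialTimes | undefined {
  let latest: TrialTimes | undefined;
  for (const t of trials) {
    if (
      latest === undefined ||
      Date.parse(t.startTime) > Date.parse(latest.startTime)
    ) {
      latest = t;
    }
  }
  return latest;
}

const exampleTrials: TrialTimes[] = [
  { id: 1, startTime: "2024-01-01T10:00:00Z", endTime: "2024-01-01T11:00:00Z" },
  { id: 2, startTime: "2024-01-03T09:00:00Z", endTime: null },
  { id: 3, startTime: "2024-01-02T15:30:00Z" }
];
```

Note that endTime plays no role in the selection, which matches the point that it is extra information only.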

> And if that's the case, then I don't see the point of specifying the endTime?

Yeah, it's not really used at the moment.

I do run benchmarks on multiple machines. So, when all jobs are done, I "report completion", at which point the endTime is recorded. But that's mostly just for record keeping.
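A sketch of that "report completion" step, assuming a hypothetical `OpenTrial` record and a hypothetical `reportCompletion` helper (neither is ReBenchDB's actual API), might look like this:

```typescript
// Hypothetical record for a trial whose jobs may still be running.
interface OpenTrial {
  startTime: string;
  endTime?: string | null;
}

// Record the end time once all jobs on all machines are done.
// Returns a new object and leaves the input untouched.
function reportCompletion(trial: OpenTrial, now: Date = new Date()): OpenTrial {
  return { ...trial, endTime: now.toISOString() };
}

const runningTrial: OpenTrial = {
  startTime: "2024-01-02T10:00:00.000Z",
  endTime: null
};
const finishedTrial = reportCompletion(
  runningTrial,
  new Date("2024-01-02T12:30:00Z")
);
```

Until completion is reported, endTime simply stays unset, which is why the field is optional.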

> Also, for usage over time, shouldn't we insert the time when the commit was created instead of the time when the benchmark was run in the time-series, that seems easier to navigate over time, but I may be wrong?

In my use case, running the benchmarks can be completely decoupled from the commit, so I want to keep the two separate. I can also run benchmarks multiple times for the same commit, which gives multiple "experiments". So for me, it's useful to keep the benchmark time separate from the commit time.
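To illustrate why the two times are kept apart, here is a small sketch in which one commit has several experiments, each with its own start time. The `ExperimentRecord` type, its fields, and `groupByCommit` are hypothetical names made up for this example.

```typescript
// Hypothetical record linking an experiment to the commit it measured.
interface ExperimentRecord {
  commitId: string;
  experimentName: string;
  startTime: string; // when the benchmarks ran, not when the commit was made
}

// Group experiments by commit, so that repeated benchmark executions
// for the same commit stay distinguishable by their start times.
function groupByCommit(
  records: ExperimentRecord[]
): Map<string, ExperimentRecord[]> {
  const byCommit = new Map<string, ExperimentRecord[]>();
  for (const r of records) {
    const group = byCommit.get(r.commitId) ?? [];
    group.push(r);
    byCommit.set(r.commitId, group);
  }
  return byCommit;
}

const exampleRecords: ExperimentRecord[] = [
  { commitId: "abc123", experimentName: "nightly", startTime: "2024-01-01T02:00:00Z" },
  { commitId: "abc123", experimentName: "rerun", startTime: "2024-01-05T14:00:00Z" },
  { commitId: "def456", experimentName: "nightly", startTime: "2024-01-02T02:00:00Z" }
];
const grouped = groupByCommit(exampleRecords);
```

If only the commit time were stored, the two experiments for `abc123` would collapse onto the same point in time.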

irevoire commented 7 months ago

> I can also run benchmarks multiple times for the same commit, so, multiple "experiments".

Never thought about that, it makes sense! 🧠 Thanks a lot for the explanation and resources.