voltrondata-labs / arrowbench

R package for benchmarking
Other
13 stars 9 forks source link

Add capability to version cases #105

Closed alistaire47 closed 2 years ago

alistaire47 commented 2 years ago

Successor to #104; closes #101. This PR inserts versioning for cases into params via a benchmark-configurable function that takes a case (a named list of params) and returns an integer version.

There are a couple important questions here:

  1. Should version be inserted as a param in the JSON (and therefore in conbench)? We're already inserting a lot of stuff into params that's arguably metadata (cpu_count, lib_path, a dataframe of packages); at some point we should move this stuff to a metadata section.
  2. What should the default behavior be? This sets everything to version 1L, but I haven't quite figured out yet whether this will break all the histories in conbench. If so, it's probably better to change it to default to not setting it.
alistaire47 commented 2 years ago

Regarding 2, what exactly does create a history discontinuity in conbench? @austin3dickey have you happened to learn this yet?

jonkeane commented 2 years ago

Regarding 2, what exactly does create a history discontinuity in conbench? @austin3dickey have you happened to learn this yet?

We should document this since it comes up frequently, but:

Differences in:

https://github.com/conbench/conbench/blob/a3073041f702a4da8ad708afa776be6d7ab0ca72/conbench/entities/history.py#L35-L68

jonkeane commented 2 years ago
  1. What should the default behavior be? This sets everything to version 1L, but I haven't quite figured out yet whether this will break all the histories in conbench. If so, it's probably better to change it to default to not setting it.

We definitely shouldn't break history — we could either migrate and add 1s in everywhere or we could have a special case in conbench history where for version NULL == 1

jonkeane commented 2 years ago
  1. Should version be inserted as a param in the JSON (and therefore in conbench)? We're already inserting a lot of stuff into params that's arguably metadata (cpu_count, lib_path, a dataframe of packages); at some point we should move this stuff to a metadata section.

Hmm, yeah I think this is ok. It is like many of the other variables that defines a case (at least on the conbench side)...

alistaire47 commented 2 years ago

Differences in:

* case values "tags"

* contexts

* hardware (though not _all_ values in a hardware are taken into account, the additional info, for example is allowed to vary)

Hm, so we are writing most params (currently including case_version on this branch) to tags. I think the simplest approach is to default to not writing it then (but do write it when it's explicitly set!); otherwise we'll need special-casing elsewhere

alistaire47 commented 2 years ago

Not a ton to say other than that it feels like until we actually need a version change we should stick with the NULL == 1 situation.

Right now NULL != 1, because when it's NULL it won't add case_version to tags, but when it's 1 it will (and therefore will break history). Dunno if that's good, really, but it's easiest, as if we start defaulting everything to 1 it will break all the histories unless we go adjust everything

alistaire47 commented 2 years ago

ah crap my git branches got conflated! sorry, fixing