voltrondata-labs / arrowbench

R package for benchmarking

Store a marker for benchmark versions #104

Closed · alistaire47 closed this 2 years ago

alistaire47 commented 2 years ago

Closes #101. Adds a semantic version to each benchmark that can be incremented when the benchmark is altered. Alters the JSON schema for results from

{
  "name": "array_altrep_materialization",
  ...
}

to

{
  "benchmark": {
    "name": "array_altrep_materialization",
    "version": "1.0.0"
  },
  ...
}

as name and version together identify a benchmark. All versions are set to 1.0.0 for now.
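
To illustrate the new shape (just a sketch, not code in this PR; the file path and the jsonlite usage are mine), a downstream consumer in R could pull the identifying pair out of a result like so:

library(jsonlite)

# Read a result written with the new schema; "result.json" is a placeholder path
result <- fromJSON("result.json")

# A benchmark is now identified by the (name, version) pair
benchmark_id <- paste(result$benchmark$name, result$benchmark$version, sep = "@")
# e.g. "array_altrep_materialization@1.0.0"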

These changes have not been implemented in conbench or ursacomputing/benchmarks yet, but since {benchmarks} ignores everything in the JSON except blob.result.real, this shouldn't break anything. However, {benchmarks} and {conbench} should be brought in line with this behavior; if we make this change, https://github.com/conbench/conbench/issues/314 will require simultaneous PRs to both {conbench} and {benchmarks} so as not to break anything.

alistaire47 commented 2 years ago

One more general comment: any reason for using semantic versions? I'm generally pro semantic versions, but for these I'm not sure we will ever need to distinguish between X, Y, and Z in X.Y.Z to treat them differently.

I was thinking about this and initially was leaning towards just an integer, but really we can/do make smaller non-breaking changes to benchmarks. Maybe those aren't worth versioning if it's just to cause a history discontinuity? If so, we need to document thoroughly when versions should be incremented and what the implications are.

jonkeane commented 2 years ago

I was thinking about this and initially was leaning towards just an integer, but really we can/do make smaller non-breaking changes to benchmarks. Maybe those aren't worth versioning if it's just to cause a history discontinuity? If so, we need to document thoroughly when versions should be incremented and what the implications are.

Yeah, it's possible that there are other reasons we might want to version that won't reset history, but I honestly can't think of any off the top of my head. Agreed that either way we should document that version bumps (or which version number bumps) cause history to be reset.
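
To make that concrete (purely a hypothetical sketch, nothing like this is implemented in arrowbench or conbench), the rule I'd imagine documenting is that only a major bump resets history:

# Hypothetical helper: only a change in the major component counts as a
# history discontinuity.
resets_history <- function(old, new) {
  old_major <- unclass(package_version(old))[[1]][1]
  new_major <- unclass(package_version(new))[[1]][1]
  old_major != new_major
}

resets_history("1.0.0", "1.1.0")  # FALSE: additive change, history continues
resets_history("1.1.0", "2.0.0")  # TRUE: breaking change, history resets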

alistaire47 commented 2 years ago

Yeah, it's possible that there are other reasons we might want to version that won't reset history, but I honestly can't think of any off the top of my head.

I'm thinking about how the altrep materialization one has a query param where most queries (aside from the everything query) are specific to the dataset, and not all datasets have queries implemented. New queries would create new cases; I'm not sure if that should break history? I'd think of it as an additive minor version bump. Similar for when Sam added compression to CSV writing.

For patch versions I think I've done some minor refactoring which might count?

In all cases I don't know if we care—do these changes merit versioning? Or do we only care about changes that will impact times for a particular case? If the latter that's fine, we just need to make sure we don't increment when we shouldn't.

jonkeane commented 2 years ago

In all cases I don't know if we care—do these changes merit versioning? Or do we only care about changes that will impact times for a particular case? If the latter that's fine, we just need to make sure we don't increment when we shouldn't.

Yeah, I don't think we care for those. I mean, we have a version of sorts in the git history; though that's not connected to or recorded in conbench, I think that's not really important. I could totally be convinced we do want to save this information, but so far I haven't really seen a reason.

I'm thinking about how the altrep materialization one has a query param where most queries (aside from the everything query) are specific to the dataset, and not all datasets have queries implemented.

This is a bit tangential to a slightly different problem (and I swear I had an issue for this somewhere, but can't find it now!): for some benchmarks we use subsets of sources, but we don't really record that as part of the source (it's frequently an independent option in the cases). But I bet we could do something more reliable and faithful to what's going on that isn't custom to each benchmark.

alistaire47 commented 2 years ago

This is a bit tangential to a slightly different problem (and I swear I had an issue for this somewhere, but can't find it now!): for some benchmarks we use subsets of sources, but we don't really record that as part of the source (it's frequently an independent option in the cases). But I bet we could do something more reliable and faithful to what's going on that isn't custom to each benchmark.

This sounds potentially complicated; the reasons I've seen for subsetting get complicated fast, e.g. the altrep one subsets to types that can be represented by altrep. The other messy one I've seen is all the mess with the Fannie Mae null columns, but we should just fix those (for which we do have a story).

jonkeane commented 2 years ago

This sounds potentially complicated; the reasons I've seen for subsetting get complicated fast, e.g. the altrep one subsets to types that can be represented by altrep. The other messy one I've seen is all the mess with the Fannie Mae null columns, but we should just fix those (for which we do have a story).

Totally, and maybe ultimately any generic subsetting we do should be built into the data sources tool anyway, and that tool could handle the naming of those subsets.
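
Just to sketch that idea (nothing here exists yet; the function and the example source names are made up), the tool could own a naming scheme along these lines:

# Hypothetical naming helper a data sources tool could provide; neither
# source_id() nor the "name__subset" convention exists anywhere yet.
source_id <- function(name, subset = NULL) {
  if (is.null(subset)) {
    name
  } else {
    paste(name, subset, sep = "__")
  }
}

source_id("fanniemae_2016Q4")                      # "fanniemae_2016Q4"
source_id("fanniemae_2016Q4", "altrep_compatible") # "fanniemae_2016Q4__altrep_compatible"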