How hard would it be to add a test that covers this circumstance + confirms it's doing what we expect?
A little hard, because arrowbench doesn't have an exported benchmark with case versioning (they're constructed directly during testing). `_example_benchmarks.py` (used for testing) currently references `placebo`, so we could add a versioned copy of that, but I don't see a way to do this neatly without (a) a PR to arrowbench, (b) including some R code in {benchmarks}, or (c) some serious mocking.
If we want to go with (b) or (c) that's fine (a rough sketch of what (c) might look like is below), but we should probably do it as part of a larger revamp of our unit testing where we test the core structures of {benchmarks} more directly than we do currently.
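For reference, option (c) might look roughly like this minimal pytest sketch. The helper name (`extract_case_version`) and the JSON shape are assumptions for illustration, not the actual {benchmarks} or {arrowbench} API:

```python
import json
from typing import Optional

import pytest


def extract_case_version(result: dict) -> Optional[str]:
    """Hypothetical helper: pull `case_version` out of arrowbench-style tags."""
    return result.get("tags", {}).get("case_version")


@pytest.fixture
def mocked_arrowbench_result():
    # Minimal fake of the JSON an R benchmark run might emit,
    # trimmed to the fields this test cares about.
    raw = """
    {
      "name": "placebo",
      "tags": {"case_version": "2", "language": "R"}
    }
    """
    return json.loads(raw)


def test_case_version_is_extracted(mocked_arrowbench_result):
    assert extract_case_version(mocked_arrowbench_result) == "2"


def test_missing_case_version_is_none(mocked_arrowbench_result):
    del mocked_arrowbench_result["tags"]["case_version"]
    assert extract_case_version(mocked_arrowbench_result) is None
```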
*nods* Ok, would you mind making a TODO with some of those options for testing so we remember to come back to it when we do that larger revamp? Then I'm fine to merge this as-is.
Closes #107. When it exists, this pulls the value for `case_version` out of the tags of the JSON generated by {arrowbench} when running R benchmarks, and appends it to the tags sent on to conbench.

Notably, this PR does not replace all the Python-generated tags with the R-generated ones, because they do not correspond exactly, so doing so would break histories. There are reasons they differ, e.g. {arrowbench} does not also append `name` as a tag (maybe it should for history's sake? but it's duplicative), and params vary in cases like `csv-read` where Python and R do not have the same params and functionality. Long-term we'll need to fix the situation, but efforts now to align {benchmarks} and {arrowbench} will become historical artifacts as orchestration gets moved out of {benchmarks}, so I think it's more worthwhile now to figure out what we want our data to look like, and then break all our histories at that point. Open to other ideas!
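For illustration only, the tag handling described above could be sketched roughly as below. The function name (`merge_case_version`) and the example tag values are hypothetical, not the actual code in this PR; the point is that the Python-generated tags stay authoritative and only `case_version` is copied over when {arrowbench} provides it:

```python
from typing import Dict, Optional


def merge_case_version(
    python_tags: Dict[str, str], r_result_tags: Optional[Dict[str, str]]
) -> Dict[str, str]:
    """Append `case_version` from the R-side tags (when present) to the
    Python-generated tags sent to conbench, without replacing any existing
    Python tags (which would break benchmark histories)."""
    merged = dict(python_tags)
    if r_result_tags and "case_version" in r_result_tags:
        merged["case_version"] = r_result_tags["case_version"]
    return merged


# Illustrative values: tags parsed from the JSON an R benchmark run writes,
# plus the tags the Python side would have generated anyway.
r_tags = {"case_version": "3", "language": "R"}
python_tags = {"name": "csv-read", "dataset": "example_dataset"}

print(merge_case_version(python_tags, r_tags))
# {'name': 'csv-read', 'dataset': 'example_dataset', 'case_version': '3'}
```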