alistaire47 closed this 1 year ago
The conbench runner expects the `benchmark()` and `record()` methods of `Benchmark` classes to return a benchmark dict and output. Returning only a `BenchmarkResult` object will not work.
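For illustration, here is a hedged sketch of the tuple contract described above; the names and dict shape are hypothetical, not the actual conbench internals:

```python
# Hypothetical sketch: the runner iterates over (benchmark_dict, output)
# pairs, so each iteration must yield a 2-tuple, not a single object.
# Names and shapes here are illustrative only.
def record():
    """Yield (result, output) pairs the way the runner expects."""
    result = {"run_name": "example", "stats": {"mean": 0.12}}  # benchmark dict
    output = "captured stdout"                                 # output string
    yield result, output


# A CLI-style loop unpacks each yielded value into two names:
for result, output in record():
    print(result["run_name"], output)
```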
I recommend testing PR changes by running actual benchmarks in this repo locally using conbench.
So I did, and it... doesn't fail? It doesn't print anything even if I set `--show-result` and `--show-output`, but the return code is always 0, and it seems to be taking a reasonable amount of runtime for various runs. I don't have a local conbench server set up yet, so I can't really see what it's doing, though.

More strangely, I get exactly the same behavior on `main`. Looking at `cli.py`, it looks like this branch should error out and `main` should work, but that's not what's happening. I need to dig deeper to see what's actually happening.
Taking a step back, does conbench do anything with the returns aside from the CLI [maybe] printing them? Everything sent to the server should still look the same. In terms of next steps, we could:

a. move the class usage to be internal to the overridden methods, so JSON can eventually be generated as a side effect
b. make light wrappers for `.benchmark()` and `.record()` that return a `BenchmarkResult` for non-conbench use
c. make this PR against conbench instead, with accompanying PRs everywhere else it impacts
d. wait for the benchmark orchestration piece to get moved out of conbench and adjust it all then

I'm leaning towards (c), since we can likely just move the code over whenever we get to (d), but I'll have to go chase down every other repo it affects. Maybe there's a simpler option here?
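For a sense of what option (b) might look like, here is a rough sketch; the class and method names are assumptions for illustration, not the real conbench API:

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    result: dict  # the JSON blob the old API returned first
    output: str   # the stdout string the old API returned second


class ExampleBenchmark:
    """Stand-in for a conbench Benchmark subclass (hypothetical)."""

    def benchmark(self, **kwargs):
        # Old-style return, unchanged for conbench compatibility:
        # a (result, output) tuple
        return {"run_name": "example"}, "some stdout"

    def benchmark_result(self, **kwargs):
        # Thin wrapper for non-conbench use: repackage the tuple
        # as a BenchmarkResult without touching the old contract
        result, output = self.benchmark(**kwargs)
        return BenchmarkResult(result=result, output=output)
```

This keeps the tuple-returning methods intact for the runner while giving downstream callers a structured object.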
@alistaire47 How do you run benchmarks?
I tried this now for this PR's branch and benchmarks defined in this repo fail:
```shell
cd ~/benchmarks
conda create -y -n ursa-benchmarks -c conda-forge --file requirements-build.txt python=3.8
conda activate ursa-benchmarks
pip install conbench
python setup.py develop
cd benchmarks
conbench list
conbench csv-read ALL --iterations=1 --all=true
```
```
Traceback (most recent call last):
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/bin/conbench", line 8, in <module>
    sys.exit(conbench())
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/conbench/cli.py", line 149, in _benchmark
    for result, output in benchmark().run(**kwargs):
TypeError: cannot unpack non-iterable BenchmarkResult object
```
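The traceback makes sense: a plain dataclass instance isn't iterable, so tuple unpacking raises exactly this `TypeError`. A minimal reproduction (field names assumed for illustration):

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    result: dict
    output: str


r = BenchmarkResult(result={}, output="")
try:
    # Mimics the runner's `for result, output in benchmark().run(**kwargs)`:
    # dataclasses don't define __iter__, so unpacking fails
    result, output = r
except TypeError as e:
    print(e)  # cannot unpack non-iterable BenchmarkResult object
```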
Yep, sorry, got it failing properly now; I think the combinations of params I was trying were getting ignored
@alistaire47 @jonkeane Note that I don't have a strong opinion on using a `BenchmarkResult` class in the conbench or benchmarks repo. I'm kind of just watching to see where it goes.
The key reason we need something is that ultimately this repo needs to output files that can get picked up by another module to send to conbench, and it'd be much easier if the output weren't separated, and safer if the schema were validated a little. But structurally it looks like we should put that in conbench, or wherever we split the `Benchmark` class off into. In that case we'll still end up needing a lot of the code in this PR, but the class itself can come from conbench. For now, though, this PR should be tabled; I can close it if we like.
Closes #112 and #113 by adding a new `BenchmarkResults` dataclass.

Motivation:

As we modularize the conbench ecosystem, {benchmarks} will need to be able to output consistent JSON that can be picked up by another tool, which will send it all to conbench. That JSON can be generated in a consistent fashion by calling `dataclasses.asdict()` on an instance of `BenchmarkResults` (or our own wrapper, which adjusts structure, names, etc. a bit if we like).

Design choices:

For now, this uses the builtin `dataclasses.dataclass` instead of pydantic in order to start minimal. We can escalate to pydantic later if we like, especially as we start to harden this schema. Also, I chose to put this code here for now instead of directly in {conbench}. We may want to move some of it to the `.benchmark()` and `.record()` methods there if we plan on keeping them there.

Notable changes:

- Adds a `BenchmarkResults` dataclass and adjusts the runners to return an instance of it instead of `result, output`, where `result` was a JSON blob and `output` was a string of stdout.
- Updates `assert`s in tests to handle the new structure.
- The `DRY_RUN` env var is set so running the tests is less of a mess.
- Removes `copy.deepcopy()` calls that weren't doing anything.

@ElenaHenderson @austin3dickey Tagging you both because this changes outputs, and I'm not totally sure whether these changes will impact anything in the larger ecosystem. I don't think so, because most i/o should be through sending stuff to conbench, but please check me on that (and feel free to review the rest in as much depth as you like).
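As a sketch of the JSON-generation path described in the motivation above (the field names here are placeholders, not the actual `BenchmarkResults` schema from the PR):

```python
import dataclasses
import json


# Placeholder schema for illustration; the real fields live in the PR.
@dataclasses.dataclass
class BenchmarkResults:
    run_name: str
    stats: dict
    output: str = ""


# dataclasses.asdict() turns the instance into a plain dict, which
# json.dumps() can serialize for another tool to pick up and send on.
res = BenchmarkResults(run_name="csv-read", stats={"mean": 0.12})
blob = json.dumps(dataclasses.asdict(res))
print(blob)  # {"run_name": "csv-read", "stats": {"mean": 0.12}, "output": ""}
```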