alistaire47 closed this 1 year ago
The conbench runner expects the `benchmark()` and `record()` methods of `Benchmark` classes to return a benchmark dict and output. Returning only a `BenchmarkResult` object will not work.
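For illustration, here is a hedged sketch of the tuple contract described above; the names and dict shape are hypothetical, not the actual conbench internals:

```python
# Hypothetical sketch: the runner iterates over (benchmark_dict, output)
# pairs, so each iteration must yield a 2-tuple, not a single object.
# Names and shapes here are illustrative only.
def record():
    """Yield (result, output) pairs the way the runner expects."""
    result = {"run_name": "example", "stats": {"mean": 0.12}}  # benchmark dict
    output = "captured stdout"                                 # output string
    yield result, output


# A CLI-style loop unpacks each yielded value into two names:
for result, output in record():
    print(result["run_name"], output)
```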
I recommend testing PR changes by running actual benchmarks in this repo locally using conbench.
So I did, and it... doesn't fail? It doesn't print anything even if I set `--show-result` and `--show-output`, but the return code is always 0, and it seems to be taking a reasonable amount of runtime for various runs. I don't have a local conbench server set up yet, so I can't really see what it's doing, though.

More strangely, I get exactly the same behavior on `main`. Looking at `cli.py`, it looks like this branch should error out and `main` should work, but that's not what's happening. I need to dig deeper to see what's actually happening.
Taking a step back, does conbench do anything with the returns aside from the CLI [maybe] printing them? Everything sent to the server should still look the same. In terms of next steps, we could:

a. move the class usage to be internal to the overridden methods, so JSON can eventually be generated as a side effect
b. make light wrappers for `.benchmark()` and `.record()` that return a `BenchmarkResult` for non-conbench use
c. make this PR against conbench instead, with accompanying PRs everywhere else it impacts
d. wait for the benchmark orchestration piece to get moved out of conbench and adjust it all then

I'm leaning towards (c), since we can likely just move the code over whenever we get to (d), but I'll have to go chase down every other repo it affects. Maybe there's a simpler option here?
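For a sense of what option (b) might look like, here is a rough sketch; the class and method names are assumptions for illustration, not the real conbench API:

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    result: dict  # the JSON blob the old API returned first
    output: str   # the stdout string the old API returned second


class ExampleBenchmark:
    """Stand-in for a conbench Benchmark subclass (hypothetical)."""

    def benchmark(self, **kwargs):
        # Old-style return, unchanged for conbench compatibility:
        # a (result, output) tuple
        return {"run_name": "example"}, "some stdout"

    def benchmark_result(self, **kwargs):
        # Thin wrapper for non-conbench use: repackage the tuple
        # as a BenchmarkResult without touching the old contract
        result, output = self.benchmark(**kwargs)
        return BenchmarkResult(result=result, output=output)
```

This keeps the tuple-returning methods intact for the runner while giving downstream callers a structured object.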
@alistaire47 How do you run benchmarks?
I tried this now for this PR's branch and benchmarks defined in this repo fail:
```shell
cd ~/benchmarks
conda create -y -n ursa-benchmarks -c conda-forge --file requirements-build.txt python=3.8
conda activate ursa-benchmarks
pip install conbench
python setup.py develop
cd benchmarks
conbench list
conbench csv-read ALL --iterations=1 --all=true
```
```
Traceback (most recent call last):
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/bin/conbench", line 8, in <module>
    sys.exit(conbench())
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/elena/miniconda3/envs/ursa-benchmarks/lib/python3.8/site-packages/conbench/cli.py", line 149, in _benchmark
    for result, output in benchmark().run(**kwargs):
TypeError: cannot unpack non-iterable BenchmarkResult object
```
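The traceback makes sense: a plain dataclass instance isn't iterable, so tuple unpacking raises exactly this `TypeError`. A minimal reproduction (field names assumed for illustration):

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    result: dict
    output: str


r = BenchmarkResult(result={}, output="")
try:
    # Mimics the runner's `for result, output in benchmark().run(**kwargs)`:
    # dataclasses don't define __iter__, so unpacking fails
    result, output = r
except TypeError as e:
    print(e)  # cannot unpack non-iterable BenchmarkResult object
```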
Yep, sorry, got it failing properly now; I think the combinations of params I was trying were getting ignored
@alistaire47 @jonkeane Note that I don't have a strong opinion on using a `BenchmarkResult` class in the conbench or benchmarks repo. I'm kind of just watching to see where it goes.
The key reason we need something is that ultimately this repo needs to output files that can get picked up by another module to send to conbench, and it'd be much easier if the output weren't separated, and safer if the schema were validated a little. But structurally it looks like we should put that in conbench, or wherever we split the `Benchmark` class off into. In that case we'll still end up needing a lot of the code in this PR, but the class itself can come from conbench. For now, though, this PR should be tabled; I can close it if we like.
Closes #112 and #113 by adding a new `BenchmarkResults` dataclass.

Motivation:

As we modularize the conbench ecosystem, {benchmarks} will need to be able to output consistent JSON that can be picked up by another tool, which will send it all to conbench. That JSON can be generated in a consistent fashion by calling `dataclasses.asdict()` on an instance of `BenchmarkResults` (or our own wrapper, which adjusts structure, names, etc. a bit if we like).

Design choices:

For now, this uses the builtin `dataclasses.dataclass` instead of pydantic in order to start minimal. We can escalate to pydantic later if we like, especially as we start to harden this schema. Also, I chose to put this code here for now instead of directly in {conbench}. We may want to move some of it to the `.benchmark()` and `.record()` methods there if we plan on keeping them there.

Notable changes:

- Adds a `BenchmarkResults` dataclass and adjusts the runners to return an instance of it instead of `result, output`, where `result` was a JSON blob and `output` was a string of stdout.
- Updates `assert`s in tests to handle the new structure.
- The `DRY_RUN` env var is set so running the tests is less of a mess.
- Removes `copy.deepcopy()` calls that weren't doing anything.

@ElenaHenderson @austin3dickey Tagging you both because this changes outputs, and I'm not totally sure whether these changes will impact anything in the larger ecosystem. I don't think so, because most i/o should be through sending stuff to conbench, but please check me on that (and feel free to review the rest in as much depth as you like).
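As a sketch of the JSON-generation path described in the motivation above (the field names here are placeholders, not the actual `BenchmarkResults` schema from the PR):

```python
import dataclasses
import json


# Placeholder schema for illustration; the real fields live in the PR.
@dataclasses.dataclass
class BenchmarkResults:
    run_name: str
    stats: dict
    output: str = ""


# dataclasses.asdict() turns the instance into a plain dict, which
# json.dumps() can serialize for another tool to pick up and send on.
res = BenchmarkResults(run_name="csv-read", stats={"mean": 0.12})
blob = json.dumps(dataclasses.asdict(res))
print(blob)  # {"run_name": "csv-read", "stats": {"mean": 0.12}, "output": ""}
```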