voltrondata-labs / arrowbench

R package for benchmarking
Other
13 stars 9 forks source link

ENG 3886: Save the script that is executed #91

Closed boshek closed 2 years ago

boshek commented 2 years ago

Will close #77 once merged

This PR implements a output_script argument for run_one which defaults to TRUE and some tests. There is also some light cleanup around tests and documentation. Running this suite of benchmarks:

run_benchmark(write_file, source = "nyctaxi_2010-01")

populates a "bm-scripts" directory with files that named similar to the results files except with an "rscript-" prefix.

bm-scripts
└── write_file
    ├── rscript-lz4-1-feather-arrow_table-nyctaxi_2010-01.json
    ├── rscript-lz4-1-feather-data_frame-nyctaxi_2010-01.json
    ├── rscript-lz4-1-parquet-arrow_table-nyctaxi_2010-01.json
    ├── rscript-lz4-1-parquet-data_frame-nyctaxi_2010-01.json
    ├── rscript-lz4-10-feather-arrow_table-nyctaxi_2010-01.json
    ├── rscript-lz4-10-feather-data_frame-nyctaxi_2010-01.json
    ├── rscript-lz4-10-parquet-arrow_table-nyctaxi_2010-01.json
    ├── rscript-lz4-10-parquet-data_frame-nyctaxi_2010-01.json
    ├── rscript-snappy-1-parquet-arrow_table-nyctaxi_2010-01.json
    ├── rscript-snappy-1-parquet-data_frame-nyctaxi_2010-01.json
    ├── rscript-snappy-10-parquet-arrow_table-nyctaxi_2010-01.json
    ├── rscript-snappy-10-parquet-data_frame-nyctaxi_2010-01.json
    ├── rscript-uncompressed-1-feather-arrow_table-nyctaxi_2010-01.json
    ├── rscript-uncompressed-1-feather-data_frame-nyctaxi_2010-01.json
    ├── rscript-uncompressed-1-parquet-arrow_table-nyctaxi_2010-01.json
    ├── rscript-uncompressed-1-parquet-data_frame-nyctaxi_2010-01.json
    ├── rscript-uncompressed-10-feather-arrow_table-nyctaxi_2010-01.json
    ├── rscript-uncompressed-10-feather-data_frame-nyctaxi_2010-01.json
    ├── rscript-uncompressed-10-parquet-arrow_table-nyctaxi_2010-01.json
    └── rscript-uncompressed-10-parquet-data_frame-nyctaxi_2010-01.json

I am not entirely certain whether "bm-scripts" is best place or if that should be nested within "results" but I will wait for feedback on that.

alistaire47 commented 2 years ago

I'm confused why we're storing scripts in JSON. Do we want them to be machine-readable? Are plain R files less so? I get this avoids needing subdirectories for each benchmark to contain setup/run/teardown/whatever scripts, but that's much easier to navigate as a human, and not that hard to navigate as a computer.

I guess that reaction is because when I see a scripts directory that just contains a lot of JSON I get confused

jonkeane commented 2 years ago

Nice work! I'll have more substantive comments later (and maybe we can pair on this when we're together too...). But this is going in the right direction.

I totally agree that it's weird to have these scripts in json, but we will (eventually) be sending them up to conbench (once conbench can accept those...) so json makes sense. I was originally thinking that we would actually put that json inside of the results json (so that, for example, you could get the lines of the script from the results object) as opposed to a separate json file — do you think that's doable?

boshek commented 2 years ago

do you think that's doable

Very. That's a much cleaner idea. I think it also removes the need to have an argument for whether to include the script or not. Probably just easier to assume it will be there.