h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

Report step does not run #150

Closed maartenbreddels closed 4 years ago

maartenbreddels commented 4 years ago

I'm trying to see if I can make vaex run your benchmark suite. As a start, I tried to run just the pandas benchmark, but I have trouble running the report:

$ Rscript -e 'rmarkdown::render("./_report/index.Rmd", output_dir="public")'
  |...................                                                   |  27%
label: init
Quitting from lines 24-51 (index.Rmd) 
Error in sum(int, dbl) : invalid 'type' (list) of argument
Calls: <Anonymous> ... model_time -> nrow -> [ -> [.data.table -> approxUniqueN1

Execution halted

Not knowing much about R, maybe you can help me with this error message.

jangorecki commented 4 years ago

Hello,

Do the pandas benchmark scripts run successfully? Is there a time.csv, and does it have the expected entries populated by the benchmark script?

The code chunk you pasted is related to producing the report page that is on the website. It is not related to the benchmark scripts themselves, only to presenting the timings produced by the benchmark. I am pretty sure it will run into trouble if time.csv has entries for only a single solution.

The easiest way to look at the results of your benchmark is to look at time.csv. You can find a description of some of the fields in the _docs/maintenance.md#reading-csv-logs-and-timings document.
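
As a quick sanity check before rendering the report, the rows in time.csv can be counted per solution. This is only a minimal sketch: it assumes the log has a header row with a column named `solution`, and the helper name `count_rows_per_solution` is hypothetical and not part of the repository.

```python
import csv
import os
from collections import Counter


def count_rows_per_solution(path, column="solution"):
    """Count timing rows per solution in a CSV log.

    Assumes the file has a header row containing `column`
    (an assumption about the layout of time.csv).
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        # fieldnames is None for an empty file
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {path}")
        return Counter(row[column] for row in reader)


if __name__ == "__main__":
    # Only print a summary if a time.csv exists in the working directory.
    if os.path.exists("time.csv"):
        counts = count_rows_per_solution("time.csv")
        for solution, n in sorted(counts.items()):
            print(f"{solution}: {n} rows")
```

If the summary lists only one solution, that matches the situation described above in which the report is likely to fail.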

If what you need is to render the report, then the easiest way to do it should be:

Feel free to reach out to me if you need help with any of these.

maartenbreddels commented 4 years ago

Hi Jan,

Thanks for your quick response. I tried running the script with the downloaded CSV, and that works! Thanks. I just want to be sure everything runs fine before I make a PR to this repo. Should I use https://github.com/h2oai/db-benchmark/pull/8 as a template?

cheers,

Maarten

jangorecki commented 4 years ago

Better to copy recent scripts instead. For example, we do not use pydatatable/pydatatable.sh as a launcher anymore. Best to copy the recent pandas groupby and join scripts and work on those.

jangorecki commented 4 years ago

@maartenbreddels I am closing this issue as it seems to be answered.

maartenbreddels commented 4 years ago

Absolutely, thanks!