h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

Fix Julia setup #199

Closed bkamins closed 3 years ago

bkamins commented 3 years ago

What I changed here:

@jangorecki The only thing I was not sure about was how you wanted to handle the Threads.nthreads() reporting from groupby-juliadf.jl and join-juliadf.jl, so I have not added it.

bkamins commented 3 years ago

Also, there was a typo in the CSV reader kwarg name, which I fixed.

I have also tested locally that calling GC twice is not needed, so I removed the second call and left only one.

bkamins commented 3 years ago

I also had to change the name big, as it clashes with a standard function in Julia:

  big(x)

  Convert a number to a maximum precision representation (typically BigInt or BigFloat). See BigFloat for information about some pitfalls with
  floating-point numbers.

(the issue is exposed when enabling multi-threading)

For consistency I have added _df suffix to all DataFrame names in the join benchmarks.

bkamins commented 3 years ago

@jangorecki - is all I propose clear and acceptable? Thank you!

bkamins commented 3 years ago

Is there anything similar in julia?

Normally you would pass the -t 20 argument, but unfortunately this does not work in your OS configuration because -S is not supported.

Alternatively, as discussed earlier, you can create intermediate .sh files containing respectively:

Would this approach work for you?

jangorecki commented 3 years ago

The env var looks simpler.

bkamins commented 3 years ago

OK - so I understand it can be left as it is now? (Ah - or will you move it to some other .sh file, right?) Thank you!

jangorecki commented 3 years ago

As it is now is good.

jangorecki commented 3 years ago

It is a pity that it is not possible to change the number of threads after Julia has already started. I will have to use an extra shell script as you suggested. Setting the env var is neater, but it will not work when running a single script with the _launcher/solution.R script.
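
For reference, a minimal sketch of what such an extra shell script could look like. It is not taken from this repository: the function names, the default of 20 threads and the default script name are assumptions for illustration only. It relies on JULIA_NUM_THREADS being read once, when the Julia process starts, and falls back to printing the command when julia or the script is not available.

```shell
#!/bin/sh
# Sketch of a wrapper that fixes the Julia thread count before startup.
# The default thread count (20) and default script name are assumptions.

julia_cmd() {
  # $1 = number of threads, $2 = Julia script to run
  # Prints the command line that would be executed.
  printf 'JULIA_NUM_THREADS=%s julia %s\n' "$1" "$2"
}

run_julia() {
  nthreads="${1:-20}"
  script="${2:-groupby-juliadf.jl}"
  if command -v julia >/dev/null 2>&1 && [ -f "$script" ]; then
    # The variable has to be set before the process starts;
    # it cannot be changed from inside a running Julia session.
    JULIA_NUM_THREADS="$nthreads" julia "$script"
  else
    # Dry run: julia or the script was not found, so only show the command.
    julia_cmd "$nthreads" "$script"
  fi
}

run_julia "$@"
```

Called as `sh wrapper.sh 20 join-juliadf.jl` (wrapper.sh being a hypothetical file name), this would start the join script with 20 threads, so the launcher only needs to call the wrapper instead of julia directly.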

bkamins commented 3 years ago

Indeed it is a pity. I really wish it was possible to change number of threads that Julia process uses (and AFAICT it might be possible in the future, but not currently). Thank you for working on it.

bkamins commented 3 years ago

@jangorecki - I understand that the current timings shown on the page (dated May 7, 2021) are still old for DataFrames.jl, right? Does the date next to the package version show when the test was run?

As a reference: the current release of DataFrames.jl is 1.1.1, so I assume that the figures are from the old run.

jangorecki commented 3 years ago

At the top of the benchplot there are versions and dates. Julia is already running now, so by tomorrow it should be in the report.