h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

Precompile methods in DataFrames.jl #166

Closed by nalimilan 3 years ago

nalimilan commented 3 years ago

This ensures we don't include the time needed to compile common methods in the first call of benchmarks. Requires DataFrames 0.22.
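To illustrate the general idea of keeping one-time warm-up cost out of a measurement, here is a minimal sketch in Python. It is an analogy only, not the mechanism this PR uses in DataFrames.jl: the helper `time_call` and its `warmup` parameter are hypothetical names, and an untimed warm-up call stands in for precompilation.

```python
import time

def time_call(fn, *args, warmup=True):
    """Time one call of fn(*args), optionally after an untimed warm-up call.

    Hypothetical helper: the warm-up call absorbs one-time costs (in Julia,
    method compilation) so they do not show up in the reported timing.
    """
    if warmup:
        fn(*args)  # untimed call; its result is discarded
    start = time.perf_counter()
    result = fn(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Usage: only the second call is measured.
result, elapsed = time_call(sum, [1, 2, 3])
```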

Cc: @bkamins

jangorecki commented 3 years ago

I recently reworked the internal report that presents historical timings. I think you may be interested in seeing Julia's progress over time: https://h2oai.github.io/db-benchmark/history.html Note that all the numbers presented are available by replacing the HTML file name in the URL above with time.csv
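The file-name substitution described above can be sketched as follows. This is a minimal sketch: the function name `time_csv_url` is hypothetical, and the code only derives the URL, it does not download anything or assume any column layout in the CSV.

```python
from urllib.parse import urljoin

def time_csv_url(report_url: str) -> str:
    """Replace the last path component of report_url with time.csv.

    Hypothetical helper: urljoin resolves the relative name "time.csv"
    against the directory of the report page.
    """
    return urljoin(report_url, "time.csv")

url = time_csv_url("https://h2oai.github.io/db-benchmark/history.html")
print(url)  # https://h2oai.github.io/db-benchmark/time.csv
```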

bkamins commented 3 years ago

This is VERY interesting. Thank you!

Do you plan to add a link to https://h2oai.github.io/db-benchmark/history.html somewhere on the benchmarks page so that it is easily discoverable?

PS. In the next release we will finally add multi-threading support to DataFrames.jl, thanks to @nalimilan.

jangorecki commented 3 years ago

There is no plan to link it from the report. It is meant to be used internally to spot regressions more easily. Using time.csv, it is easy to produce this kind of report. The current history report is pretty simple; the lattice package takes care of all the plots: https://github.com/h2oai/db-benchmark/blob/master/_report/history.Rmd

As for multithreading, here is an example of what I mentioned before, 50% vs 100% core usage: https://github.com/Rdatatable/data.table/issues/4818#issuecomment-742627032