h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

DataFrames.jl 1.0.1 is out, benchmarks are outdated (regarding Julia) #195

Closed PallHaraldsson closed 3 years ago

PallHaraldsson commented 3 years ago

Hi,

Since "innerjoin, leftjoin, rightjoin, outerjoin, semijoin, and antijoin are now much faster" in DataFrames.jl 1.0, and you benchmarked an older version, it would be nice if you could rerun the benchmarks. Also, Julia 1.6.1 is out; while I'm not sure it is faster for this workload, it's best to use it so people are not left in doubt.
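For readers who have not used these functions, here is a minimal sketch of the six join types named above, on two tiny tables. The tables `a` and `b` and their column names are made up for illustration and are not the benchmark data; the sketch only assumes DataFrames.jl 1.x is installed.

```julia
using DataFrames

# Toy inputs (hypothetical, not the benchmark data).
a = DataFrame(id = [1, 2, 3, 4], v1 = [10.0, 20.0, 30.0, 40.0])
b = DataFrame(id = [2, 4, 5], v2 = ["b", "d", "e"])

inner = innerjoin(a, b; on = :id)  # only ids present in both tables
left  = leftjoin(a, b; on = :id)   # every row of a, v2 is missing where b has no match
right = rightjoin(a, b; on = :id)  # every row of b, v1 is missing where a has no match
outer = outerjoin(a, b; on = :id)  # every id from either table
semi  = semijoin(a, b; on = :id)   # rows of a that have a match in b (columns of a only)
anti  = antijoin(a, b; on = :id)   # rows of a that have no match in b

# Wall-clock time of one join in seconds; the first call also includes compilation.
elapsed = @elapsed innerjoin(a, b; on = :id)
```

On data this small the timing is dominated by compilation, so it says nothing about the speedups discussed here; it only shows where a measurement would go.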

I'm also curious whether out-of-core processing just works. I understand it's there in the package (maybe only for Arrow files?).

jangorecki commented 3 years ago

Hi Pali, when you asked the question, the benchmark for DataFrames.jl 1.0.1 was already running. It has since finished, and the results you asked for are already published in the report. Below you can find a comparison of the previously tested version against 1.0.1. As we can see, there is a big speedup. As for out-of-core processing in Julia, I am not aware of it. If it just worked, we would see the "join" task on 1e9 giving some timings. It doesn't, so I assume it is not yet supported in Julia.
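A note on reading the tables: the `new2old` column appears to be the 1.0.1 timing divided by the 0.22.7 timing, rounded to two decimals, so values below 1 mean the newer version is faster. The sketch below reproduces the ratio for the first three 1e7 groupby rows; the variable names are chosen only for this illustration.

```julia
# Timings copied from the first three 1e7 groupby rows of the table.
t_old = [1.299, 0.268, 0.684]  # column 20210408_0.22.7
t_new = [0.304, 0.084, 0.456]  # column 20210426_1.0.1

# Ratio of new to old, rounded to two decimals as in the new2old column.
new2old = round.(t_new ./ t_old; digits = 2)
println(new2old)  # [0.23, 0.31, 0.67]
```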

groupby

|in_rows |knasorted                                       |question_group |question                    | 20210408_0.22.7| 20210426_1.0.1| new2old|
|:-------|:-----------------------------------------------|:--------------|:---------------------------|---------------:|--------------:|-------:|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           1.299|          0.304|    0.23|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.268|          0.084|    0.31|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           0.684|          0.456|    0.67|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           0.462|          0.185|    0.40|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           0.888|          0.346|    0.39|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |           2.264|          1.861|    0.82|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |           1.231|          1.400|    1.14|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |           2.345|          2.096|    0.89|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |           2.012|          1.266|    0.63|
|1e7     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |           1.921|          2.484|    1.29|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           1.407|          0.322|    0.23|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.268|          0.073|    0.27|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           0.958|          0.610|    0.64|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           0.484|          0.206|    0.43|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           1.298|          0.627|    0.48|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |           1.767|          1.390|    0.79|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |           2.352|          2.263|    0.96|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |           5.184|          4.423|    0.85|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |           1.534|          0.930|    0.61|
|1e7     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |           2.517|          2.485|    0.99|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           1.297|          0.305|    0.24|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.256|          0.072|    0.28|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           1.670|          0.755|    0.45|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           0.452|          0.191|    0.42|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           1.473|          1.142|    0.78|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |           1.383|          1.126|    0.81|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |           4.059|          3.156|    0.78|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |          10.977|          8.847|    0.81|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |           1.350|          0.809|    0.60|
|1e7     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |           2.670|          2.479|    0.93|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1               |           1.314|          0.328|    0.25|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1:id2           |           0.273|          0.090|    0.33|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 mean v3 by id3       |           0.680|          0.457|    0.67|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |mean v1:v3 by id4           |           0.454|          0.182|    0.40|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1:v3 by id6            |           0.888|          0.383|    0.43|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |median v3 sd v3 by id4 id5  |           2.310|          1.898|    0.82|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |max v1 - min v2 by id3      |           1.230|          1.334|    1.08|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |largest two v3 by id6       |           2.357|          1.997|    0.85|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |regression v1 v2 by id2 id4 |           1.431|          0.857|    0.60|
|1e7     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |sum v3 count by id1:id6     |           1.905|          2.460|    1.29|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1               |           1.430|          0.343|    0.24|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.285|          0.084|    0.29|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           0.711|          0.486|    0.68|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           0.553|          0.325|    0.59|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           0.985|          0.400|    0.41|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |           2.576|          2.042|    0.79|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |           1.698|          1.562|    0.92|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |largest two v3 by id6       |           2.661|          2.338|    0.88|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |           2.972|          2.575|    0.87|
|1e7     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |           2.164|          2.776|    1.28|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           2.213|          0.942|    0.43|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           1.026|          0.771|    0.75|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           5.225|          3.755|    0.72|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           2.769|          1.301|    0.47|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |           9.921|          3.928|    0.40|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |          17.910|         13.667|    0.76|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |          16.864|         16.306|    0.97|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |          32.412|         25.756|    0.79|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |          15.148|         10.469|    0.69|
|1e8     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |          27.944|         18.766|    0.67|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           2.563|          1.000|    0.39|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.921|          0.710|    0.77|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |          10.768|          9.344|    0.87|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           2.706|          1.319|    0.49|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |          15.515|         12.834|    0.83|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |          12.004|          9.662|    0.80|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |          28.399|         25.419|    0.90|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |          64.707|         56.707|    0.88|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |          10.848|          8.683|    0.80|
|1e8     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |          36.952|         21.002|    0.57|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |           3.981|          1.156|    0.29|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           0.936|          0.698|    0.75|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |          25.464|         12.803|    0.50|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           2.951|          1.678|    0.57|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |          19.090|         25.847|    1.35|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |          12.868|          9.075|    0.71|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |          53.671|         40.396|    0.75|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |         124.747|        126.466|    1.01|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |          12.218|          7.443|    0.61|
|1e8     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |          38.693|         26.716|    0.69|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1               |           2.399|          1.157|    0.48|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1:id2           |           1.155|          0.886|    0.77|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 mean v3 by id3       |           4.580|          3.597|    0.79|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |mean v1:v3 by id4           |           2.723|          1.198|    0.44|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1:v3 by id6            |           9.897|          3.997|    0.40|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |median v3 sd v3 by id4 id5  |          17.775|         14.362|    0.81|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |max v1 - min v2 by id3      |          16.593|         16.873|    1.02|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |largest two v3 by id6       |          32.623|         26.762|    0.82|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |regression v1 v2 by id2 id4 |           8.233|          5.496|    0.67|
|1e8     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |sum v3 count by id1:id6     |          26.806|         18.907|    0.71|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1               |           2.463|          1.000|    0.41|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           1.040|          0.809|    0.78|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |           4.551|          3.727|    0.82|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |mean v1:v3 by id4           |           3.313|          1.733|    0.52|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1:v3 by id6            |          11.843|          4.255|    0.36|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |          20.167|         14.674|    0.73|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |          19.426|         17.914|    0.92|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |largest two v3 by id6       |          34.922|         26.448|    0.76|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |          20.954|         15.541|    0.74|
|1e8     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |          27.872|         19.549|    0.70|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |          12.678|         15.705|    1.24|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |           9.801|          9.075|    0.93|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |         123.048|         89.388|    0.73|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |          32.543|         23.274|    0.72|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |         224.940|        120.389|    0.54|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |         224.752|        195.448|    0.87|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |         334.029|        357.317|    1.07|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |1e1 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1               |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 by id1:id2           |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |mean v1:v3 by id4           |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |basic          |sum v1:v3 by id6            |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |2e0 cardinality factor, 0% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1               |          21.852|          9.445|    0.43|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 by id1:id2           |          12.664|          8.576|    0.68|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1 mean v3 by id3       |         107.168|         54.936|    0.51|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |mean v1:v3 by id4           |          39.450|         11.435|    0.29|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |basic          |sum v1:v3 by id6            |         200.221|         79.478|    0.40|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |median v3 sd v3 by id4 id5  |         241.610|        193.551|    0.80|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |max v1 - min v2 by id3      |         301.821|        244.512|    0.81|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 0% NAs, pre-sorted data |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1               |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 by id1:id2           |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1 mean v3 by id3       |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |mean v1:v3 by id4           |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |basic          |sum v1:v3 by id6            |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |median v3 sd v3 by id4 id5  |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |max v1 - min v2 by id3      |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |largest two v3 by id6       |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |regression v1 v2 by id2 id4 |              NA|             NA|      NA|
|1e9     |1e2 cardinality factor, 5% NAs, unsorted data   |advanced       |sum v3 count by id1:id6     |              NA|             NA|      NA|
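
The question labels in the groupby table are shorthand for grouped aggregations. As a rough sketch, the first two basic questions might be written like this in DataFrames.jl. This is not the benchmark's actual script: the toy table is invented for illustration, and reading "id1:id2" as "group by both id1 and id2" is an assumption.

```julia
using DataFrames

# Toy table using the column naming of the benchmark questions (values are made up).
df = DataFrame(
    id1 = ["a", "b", "a", "b"],
    id2 = ["x", "x", "y", "x"],
    v1  = [1, 2, 3, 4],
)

# "sum v1 by id1": one output row per distinct id1.
q1 = combine(groupby(df, :id1), :v1 => sum => :v1)

# "sum v1 by id1:id2": one output row per distinct (id1, id2) pair.
q2 = combine(groupby(df, [:id1, :id2]), :v1 => sum => :v1)

# Sort for a stable display order, since group order is not relied on here.
println(sort(q1, :id1))
println(sort(q2, [:id1, :id2]))
```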

join

|in_rows |knasorted               |question               | 20210408_0.22.7| 20210426_1.0.1| new2old|
|:-------|:-----------------------|:----------------------|---------------:|--------------:|-------:|
|1e7     |0% NAs, unsorted data   |small inner on int     |           2.635|          0.890|    0.34|
|1e7     |0% NAs, unsorted data   |medium inner on int    |           2.470|          0.808|    0.33|
|1e7     |0% NAs, unsorted data   |medium outer on int    |           7.924|          3.030|    0.38|
|1e7     |0% NAs, unsorted data   |medium inner on factor |           3.427|          0.959|    0.28|
|1e7     |0% NAs, unsorted data   |big inner on int       |           7.428|          2.387|    0.32|
|1e7     |5% NAs, unsorted data   |small inner on int     |           3.006|          0.999|    0.33|
|1e7     |5% NAs, unsorted data   |medium inner on int    |           2.415|          0.863|    0.36|
|1e7     |5% NAs, unsorted data   |medium outer on int    |           7.753|          3.154|    0.41|
|1e7     |5% NAs, unsorted data   |medium inner on factor |           3.521|          1.084|    0.31|
|1e7     |5% NAs, unsorted data   |big inner on int       |           7.548|          3.804|    0.50|
|1e7     |0% NAs, pre-sorted data |small inner on int     |           2.463|          0.720|    0.29|
|1e7     |0% NAs, pre-sorted data |medium inner on int    |           1.931|          0.742|    0.38|
|1e7     |0% NAs, pre-sorted data |medium outer on int    |           6.941|          2.449|    0.35|
|1e7     |0% NAs, pre-sorted data |medium inner on factor |           2.455|          0.840|    0.34|
|1e7     |0% NAs, pre-sorted data |big inner on int       |           7.530|          1.452|    0.19|
|1e8     |0% NAs, unsorted data   |small inner on int     |         122.010|         82.456|    0.68|
|1e8     |0% NAs, unsorted data   |medium inner on int    |         135.868|         94.706|    0.70|
|1e8     |0% NAs, unsorted data   |medium outer on int    |         217.156|        112.529|    0.52|
|1e8     |0% NAs, unsorted data   |medium inner on factor |         146.366|         96.223|    0.66|
|1e8     |0% NAs, unsorted data   |big inner on int       |         255.316|         91.470|    0.36|
|1e8     |5% NAs, unsorted data   |small inner on int     |         118.436|         92.100|    0.78|
|1e8     |5% NAs, unsorted data   |medium inner on int    |         131.347|         93.817|    0.71|
|1e8     |5% NAs, unsorted data   |medium outer on int    |         221.642|        110.364|    0.50|
|1e8     |5% NAs, unsorted data   |medium inner on factor |         145.716|         97.051|    0.67|
|1e8     |5% NAs, unsorted data   |big inner on int       |         259.400|        130.767|    0.50|
|1e8     |0% NAs, pre-sorted data |small inner on int     |         119.080|        100.430|    0.84|
|1e8     |0% NAs, pre-sorted data |medium inner on int    |         123.884|         90.533|    0.73|
|1e8     |0% NAs, pre-sorted data |medium outer on int    |         223.578|        104.135|    0.47|
|1e8     |0% NAs, pre-sorted data |medium inner on factor |         127.576|         93.672|    0.73|
|1e8     |0% NAs, pre-sorted data |big inner on int       |         253.908|         83.728|    0.33|
|1e9     |0% NAs, unsorted data   |small inner on int     |              NA|             NA|      NA|
|1e9     |0% NAs, unsorted data   |medium inner on int    |              NA|             NA|      NA|
|1e9     |0% NAs, unsorted data   |medium outer on int    |              NA|             NA|      NA|
|1e9     |0% NAs, unsorted data   |medium inner on factor |              NA|             NA|      NA|
|1e9     |0% NAs, unsorted data   |big inner on int       |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |small inner on int     |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |medium inner on int    |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |medium outer on int    |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |medium inner on factor |              NA|             NA|      NA|
|1e9     |5% NAs, unsorted data   |big inner on int       |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |small inner on int     |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |medium inner on int    |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |medium outer on int    |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |medium inner on factor |              NA|             NA|      NA|
|1e9     |0% NAs, pre-sorted data |big inner on int       |              NA|             NA|      NA|

bkamins commented 3 years ago

@PallHaraldsson - also, the benchmark was run with an old setup of DataFrames.jl. The PR fixing this has now been merged (and I hope that the next run will show improvements, especially in join operations).

@jangorecki - your work is great. It allows us to pinpoint the performance choke points we have in DataFrames.jl!

PallHaraldsson commented 3 years ago

Thanks. I'm pretty sure I was looking at the numbers after the update to 1.0.1 earlier today (with a good speedup, I guess matching the table above), but since then I see the report was updated again and DF.jl got a lot slower, e.g. 2s vs 7s. I do not remember the other numbers, but it's not simply that I misremember, as I had calculated:

julia> 0.3+0.08+0.46+0.18+0.35 # First run
1.37

julia> 0.06+0.08+0.24+0.10+0.27 # Second run
0.75

julia> 1.37+0.75 # What you reported, then, but rounded to 2s
2.12

Now all queries are much slower, except Query 4 and Query 5, which are exactly the same (as above) or, for the latter, slightly faster on the second run.

jangorecki commented 3 years ago

@PallHaraldsson there was another run using a different Julia config; see https://github.com/h2oai/db-benchmark/pull/194#issuecomment-827948320 for details.

bkamins commented 3 years ago

Yes - we are past the 1.0 release, which introduced significant changes. Therefore we still need to learn how to properly tune the whole ecosystem, and the H2O benchmarks are great for learning where we have problems (in short: the only change we made was a different setting of the CSV reader - @quinnj is aware of this issue and I know he is working on improving things here).