h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

pydatatable groupby Q top 2 by group orders NAs first #172

Closed jangorecki closed 3 years ago

jangorecki commented 3 years ago

The answer produced by Q8 in the groupby task for pydatatable, on the data case that contains NAs, differs from the answers of the other solutions. This is caused by how pydatatable's sort function orders NAs: it places them first. The script will now skip this question for the NA data case only. Once https://github.com/h2oai/datatable/issues/2806 is resolved, we need to amend the script and remove the skip.

jangorecki commented 3 years ago

We do want to remove NAs in this question; for details, see the discussion in https://github.com/h2oai/db-benchmark/commit/5bc71130063cc2ef1ffd6b1601d43f0dca9cb0e8. Adding pre-filtering makes it easy to address this question, though a further improvement may be possible once datatable can handle NAs during the sort call.
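To illustrate why pre-filtering changes the answer, here is a minimal pure-Python sketch. It does not use datatable's API; it only emulates an NA-first sort on hypothetical toy data, with `None` standing in for NA and the column names `id6` and `v3` assumed for illustration.

```python
from collections import defaultdict

# Hypothetical toy rows: (id6, v3); None stands in for NA.
rows = [
    ("a", 3.0), ("a", None), ("a", 7.0), ("a", 5.0),
    ("b", None), ("b", None), ("b", 1.0),
]


def top2_by_group(rows, drop_na):
    """Return the first two v3 values per id6 after an NA-first descending sort."""
    groups = defaultdict(list)
    for key, value in rows:
        if drop_na and value is None:
            continue  # pre-filter: discard NA rows before sorting
        groups[key].append(value)

    result = {}
    for key, values in groups.items():
        # Emulate NA-first ordering: NAs come first, then values descending.
        ordered = sorted(values, key=lambda v: (v is not None, -(v or 0.0)))
        result[key] = ordered[:2]
    return result


without_filter = top2_by_group(rows, drop_na=False)
with_filter = top2_by_group(rows, drop_na=True)

print(without_filter)
print(with_filter)
```

Without the filter, NAs occupy the top slots of each group; with it, only real values are ranked. Filtering first costs an extra pass over the data, which is why handling NAs inside the sort call itself could be an improvement.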