h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0
326 stars 88 forks source link

Do not allow for missing values in Julia #70

Closed nalimilan closed 5 years ago

nalimilan commented 5 years ago

One of the strengths of Julia is that it supports both columns which cannot contain missing values and columns that can, allowing for better performance when there are no missing values.

One could also use allowmissing=:auto to leave CSV.read choose automatically depending on whether missing values have been encountered (this isn't used by default since it can fail if no missing values appear in the subset of rows used for detection).

jangorecki commented 5 years ago

Thanks. I will merge it for now but sooner or later we will add missing values tests https://github.com/h2oai/db-benchmark/issues/40 so it will need to be reverted then.

nalimilan commented 5 years ago

OK. It could make sense to have separate tests with and without missing values. allowmissing=:auto should automatically choose the column type depending on whether it contains missing values or not, so both kinds of columns can be tested easily in the same script.