h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

new task: read #131

Open jangorecki opened 4 years ago

jangorecki commented 4 years ago

A benchmark for reading data is on the roadmap. It should cover:

ideas for testing particular features (maybe advanced questions?)

feedback welcome

jangorecki commented 3 years ago

I collected some feedback about this task from our internal discussion.

Initially I will focus only on reading CSV, not binary formats.
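As a rough illustration of what a CSV-only read task could time, here is a minimal sketch using only the Python standard library. It is not the benchmark's actual harness: the function names (`write_simulated_csv`, `time_csv_read`, `run_demo`), the three-column layout, and the best-of-N timing are all assumptions made for the example.

```python
import csv
import os
import random
import tempfile
import time


def write_simulated_csv(path, n_rows, seed=0):
    """Write a small simulated CSV (hypothetical layout: id, value, key)."""
    rng = random.Random(seed)
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "value", "key"])
        for i in range(n_rows):
            writer.writerow([i, rng.random(), f"k{rng.randrange(100)}"])


def time_csv_read(path, repeats=3):
    """Read the whole file `repeats` times; return (row_count, best_seconds)."""
    best = float("inf")
    n_rows = 0
    for _ in range(repeats):
        start = time.perf_counter()
        with open(path, newline="") as f:
            reader = csv.reader(f)
            next(reader)  # skip the header row
            n_rows = sum(1 for _ in reader)  # count data rows only
        best = min(best, time.perf_counter() - start)
    return n_rows, best


def run_demo(n_rows=10_000):
    """Create a temporary CSV, time reading it, then clean up."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        write_simulated_csv(path, n_rows)
        return time_csv_read(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    rows, seconds = run_demo()
    print(f"read {rows} rows, best of 3: {seconds:.4f}s")
```

Taking the best of several repeats is one way to reduce noise from disk caching; a real task would presumably swap the `csv.reader` loop for each tool's own reader and use much larger files.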

For real-world data, NYT will be a good first case. We should probably find one more popular dataset, so that we have two real-world datasets.

For simulated data:

MichaelChirico commented 3 years ago

relevant issue https://github.com/Rdatatable/data.table/issues/2634