h2oai / db-benchmark

reproducible benchmark of database-like ops
https://h2oai.github.io/db-benchmark
Mozilla Public License 2.0

pandas could use dt fread on csv #135

Closed jangorecki closed 4 years ago

jangorecki commented 4 years ago

There is no need to fread jay files: fread on csv is fast enough, and it leaves fewer jay files to keep.
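A minimal sketch of the idea, assuming the `datatable` and `pandas` packages are installed: parse the csv with `dt.fread` and convert the resulting Frame with `to_pandas()`. The helper name `read_csv_via_datatable` and the file name in the usage comment are hypothetical and are not taken from the benchmark scripts.

```python
def read_csv_via_datatable(path):
    """Parse a csv with datatable's fread and return a pandas DataFrame."""
    # Imported lazily so this module still loads where datatable is absent.
    import datatable as dt

    frame = dt.fread(path)    # parse the csv into a datatable Frame
    return frame.to_pandas()  # convert the Frame to a pandas DataFrame


# Usage (hypothetical file name):
# df = read_csv_via_datatable("data.csv")
```

With this approach the pandas script needs only the csv on disk, so no separate jay copy has to be kept for it.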

jangorecki commented 4 years ago

This change increased the execution time of most scripts by 2-7%, because data is now loaded from csv rather than from jay binary files, but it reduced the number of files that need to be stored. For the join data there is a 50% speed-up, because it was not using jay before, only pandas read_csv. Note that these percentages refer to the whole script time, not to data reading time alone.
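To illustrate why a whole-script percentage differs from a read-time percentage, here is a small arithmetic sketch. All timings are hypothetical numbers invented for illustration; they are not measurements from the benchmark.

```python
def pct_change(before, after):
    """Relative change of `after` versus `before`, in percent."""
    return (after - before) / before * 100.0


# Hypothetical timings in seconds, invented for illustration only.
read_jay = 1.0   # time to load data from a jay file
read_csv = 6.0   # time to load the same data from csv
rest = 99.0      # everything else the script does

script_before = read_jay + rest  # 100.0
script_after = read_csv + rest   # 105.0

whole_script = pct_change(script_before, script_after)  # 5.0
read_only = pct_change(read_jay, read_csv)               # 500.0
```

In this made-up case a 5% increase in whole-script time corresponds to a much larger relative change in the read step itself, which is why the two figures should not be confused.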