voltrondata-labs / benchmarks

Language-independent Continuous Benchmarking (CB) for Apache Arrow
MIT License

Parameterize CSV reading benchmark for streaming & compression #60

Open westonpace opened 3 years ago

westonpace commented 3 years ago

Right now the CSV reading benchmark reads a gzip-compressed file, which is something of a worst-case scenario (for hot-in-cache data) because decompression becomes the bottleneck.

Also, the benchmark only tests the CSV file reader and not the streaming CSV reader, which is used by the datasets API.