Right now the CSV reading benchmark reads a gzip-compressed file, which is something of a worst-case scenario for hot-in-cache data, since decompression becomes the bottleneck.
Also, the benchmark only exercises the CSV file reader, not the streaming CSV reader used by the datasets API.