Closed: ian-wazowski closed this 3 months ago
Tardis.dev provides the file in .csv.gz format. By the way, does Tardis also provide data in parquet format?
No, I'm working on downloading the Tardis dataset and then converting it to Parquet (LZ4 compression, column-wise encoding).
Parquet is about 10x faster to read than csv.gz, and the compression ratio improves by about 10-15%.
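For reference, the conversion described above could look roughly like the sketch below. It assumes `pyarrow` is installed and that the downloaded file name ends in `.csv.gz`. The function names `parquet_path_for` and `convert_csv_gz_to_parquet` are hypothetical, and this is not necessarily the exact pipeline used here.

```python
from pathlib import Path


def parquet_path_for(csv_gz_path: Path) -> Path:
    """Return the output path: 'x.csv.gz' becomes 'x.parquet' in the same directory."""
    name = csv_gz_path.name
    if name.endswith(".csv.gz"):
        name = name[: -len(".csv.gz")]
    return csv_gz_path.with_name(name + ".parquet")


def convert_csv_gz_to_parquet(csv_gz_path: Path) -> Path:
    """Read a gzip-compressed CSV and write it as an LZ4-compressed Parquet file."""
    # Imported here so the path helper above works even without pyarrow installed.
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    # pyarrow decompresses automatically when the path ends in ".gz".
    table = pv.read_csv(str(csv_gz_path))

    out_path = parquet_path_for(csv_gz_path)
    # LZ4 compression plus dictionary encoding applied per column.
    pq.write_table(table, str(out_path), compression="lz4", use_dictionary=True)
    return out_path
```

The conversion is a one-time cost per file. After that, readers can load only the columns they need, which is something csv.gz cannot offer.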
The processing time needed to convert raw Tardis data to Parquet has to be taken into account. In any case, since the result is already-processed data rather than the raw Tardis data, I believe it's more appropriate to provide this as a separate data utility.
Changes
Related
discord chat