wireservice / csvkit

A suite of utilities for converting to and working with CSV, the king of tabular file formats.
https://csvkit.readthedocs.io
MIT License

Support zstandard-compressed (.zst) CSV files #1224

gsauthof closed this 9 months ago

gsauthof commented 1 year ago

Say you have a directory of gzip- or zstandard-compressed CSV files that you want to merge.

For this it would be great if csvstack would auto-detect the compression, i.e. stream the files into a decompressor and process the files as usual.

This would also make sense for the other csvkit tools: many CSV files compress very well, and other tools (e.g. duckdb) already support compression transparently, so such support would improve interoperability when exchanging these files.

Example usage:

csvstack *.csv.gz | zstd -o complete.csv.zst -f

jpmckinney commented 1 year ago

It already auto-detects the compression, as long as the filenames end with .gz, .bz2 or .xz.
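
For readers unfamiliar with extension-based detection, here is a minimal sketch of the general idea using only the Python standard library. It is not csvkit's actual code: the names OPENERS and open_maybe_compressed are hypothetical, and csvkit's real method may be structured differently.

```python
import bz2
import gzip
import lzma
from pathlib import Path

# Hypothetical mapping from file extension to an opener function.
# Assumption for illustration only; csvkit's real implementation may differ.
OPENERS = {
    ".gz": gzip.open,
    ".bz2": bz2.open,
    ".xz": lzma.open,
}


def open_maybe_compressed(path, encoding="utf-8"):
    """Open `path` as text, decompressing if the extension is recognized.

    Unrecognized extensions fall back to the built-in open().
    """
    opener = OPENERS.get(Path(path).suffix.lower(), open)
    # newline="" leaves line endings untouched, as the csv module expects.
    return opener(path, mode="rt", encoding=encoding, newline="")
```

The returned file object can be passed straight to csv.reader, so the rest of the pipeline does not need to know whether the input was compressed.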

gsauthof commented 1 year ago

Ok, cool. Looks like I primarily tried csvstack on zstandard-compressed files.

So how about adding zstandard support then (based on the .zst extension)?

jpmckinney commented 1 year ago

You would have to modify this method and then submit a pull request.

The Python standard library doesn't have support for zstandard, but I can add https://pypi.org/project/zstandard/ as an optional dependency.
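
As a rough illustration of what an optional dependency could look like, the sketch below imports the third-party zstandard package only if it is installed and raises a clear error otherwise. The function name open_zst is hypothetical and this is not the change that was made in csvkit; it assumes a zstandard release that provides zstandard.open().

```python
try:
    # Third-party package: https://pypi.org/project/zstandard/
    import zstandard
except ImportError:
    # Optional dependency: everything else keeps working without it.
    zstandard = None


def open_zst(path, encoding="utf-8"):
    """Open a .zst file as text (hypothetical helper, for illustration)."""
    if zstandard is None:
        raise RuntimeError(
            "Reading .zst files requires the optional 'zstandard' package"
        )
    # zstandard.open() mirrors gzip.open(): in text mode it returns a
    # decompressing text stream. newline="" suits the csv module.
    return zstandard.open(path, mode="rt", encoding=encoding, newline="")
```

With this shape, users who never touch .zst files do not need the extra package, and those who do get an actionable message instead of an import failure at startup.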