Open tmbdev opened 1 month ago
These would benefit from a generic interface for plugging in compression algorithms. And how about Zstd?
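To make the "generic interface" idea concrete, here is a minimal sketch of a codec registry. It is not Ray Data's API: the names `Codec`, `CODECS`, `register_codec`, `compress`, and `decompress` are all hypothetical, and only standard-library codecs are registered so the sketch runs without extra dependencies. An lz4 or Zstd codec could presumably be registered the same way from a third-party package.

```python
import bz2
import gzip
import lzma
from typing import Callable, Dict, NamedTuple


class Codec(NamedTuple):
    """A pair of bytes -> bytes functions (hypothetical interface)."""
    compress: Callable[[bytes], bytes]
    decompress: Callable[[bytes], bytes]


# Hypothetical registry mapping a codec name to its implementation.
CODECS: Dict[str, Codec] = {}


def register_codec(name: str,
                   compress: Callable[[bytes], bytes],
                   decompress: Callable[[bytes], bytes]) -> None:
    """Register (or replace) a codec under the given name."""
    CODECS[name] = Codec(compress, decompress)


def _lookup(name: str) -> Codec:
    try:
        return CODECS[name]
    except KeyError:
        raise ValueError(f"unknown compression codec: {name!r}") from None


def compress(name: str, data: bytes) -> bytes:
    """Compress data with the codec registered under name."""
    return _lookup(name).compress(data)


def decompress(name: str, data: bytes) -> bytes:
    """Decompress data with the codec registered under name."""
    return _lookup(name).decompress(data)


# Standard-library codecs only; lz4/zstd would be added by the caller.
register_codec("gzip", gzip.compress, gzip.decompress)
register_codec("bz2", bz2.compress, bz2.decompress)
register_codec("xz", lzma.compress, lzma.decompress)
```

With something like this, supporting a new algorithm would only require one `register_codec` call rather than a change to the readers themselves.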
Where is it missing at the moment? If you look at https://docs.ray.io/en/latest/data/loading-data.html#handling-compressed-files and https://arrow.apache.org/docs/python/generated/pyarrow.CompressedInputStream.html, both lz4 and zstd are listed there :)
Description
Ray Data currently supports gzip compression of shards.
It would be nice if it also supported lz4. While lz4 gives lower compression ratios, it is several times faster than gzip at compressing and decompressing text.
Use case
I'm trying to maximize speed for I/O of very large text datasets.