wesm / feather

Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow
Apache License 2.0

Is it possible to save feather file in compressed zip format? #383

Closed andirey closed 4 years ago

andirey commented 4 years ago

I need to save feather data in a more compact format. After several tests I found that running the "zip" function on a "feather" data file reduces its size by up to 90%.

So, a simple question: is there any way to save data as a "feather" file with, let's say, an argument like "compressed = zip" to save disk space?

library(feather)
library(zip)

Write data in feather format

write_feather(df, "data.feather")

Write data in zip feather format

zip("data.feather.zip", "data.feather")

How can I get commands like these, which do not use disk space for a temporary file?

write_feather(df, "data.feather.zip", format = "zip")
df <- read_feather("data.feather.zip", format = "zip")

Thanks!

wesm commented 4 years ago

I'd suggest using Parquet files for this use case. We will eventually have some compression options with Feather, but it's not a short-term priority (and no one is paying for this work to be done).
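
For readers following the Parquet suggestion, here is a minimal sketch in R. It assumes the Apache Arrow R package (`arrow`) is installed; the example data frame and the choice of the "gzip" codec are illustrative assumptions, not something specified in this thread.

```r
# Sketch: write a compressed Parquet file directly, with no separate zip step.
library(arrow)

# Hypothetical example data, standing in for the poster's `df`.
df <- data.frame(id = 1:1000, value = rep(c("a", "b"), 500))

path <- tempfile(fileext = ".parquet")

# Codec availability depends on how arrow was built, so check before using it.
codec <- if (codec_is_available("gzip")) "gzip" else "uncompressed"

# Compression is applied inside the file, so no temporary uncompressed copy is needed.
write_parquet(df, path, compression = codec)

# Read the file back into a plain data frame.
df2 <- as.data.frame(read_parquet(path))
```

How much space this saves depends on the data and the codec, so it is worth comparing `file.size(path)` against the uncompressed file on your own data.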

andirey commented 4 years ago

@wesm Thanks for the advice. I consider the "fst" package and format the most feasible alternative for compressing files, and it also shows faster read/write speeds. Pity, because I love feather and have used it many times.

wesm commented 4 years ago

I did my own investigations into this and found mixed results

https://ursalabs.org/blog/2019-10-columnar-perf/

wesm commented 4 years ago

We've implemented lz4 and zstd compression with "Feather V2" coming in the next Apache Arrow release
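
A minimal sketch of what this looks like from R, assuming a version of the `arrow` package that includes Feather V2 and a build with the zstd codec enabled. The example data frame is hypothetical.

```r
# Sketch: write a Feather V2 file with built-in compression.
library(arrow)

# Hypothetical example data.
df <- data.frame(id = 1:1000, value = rep(c("a", "b"), 500))

path <- tempfile(fileext = ".feather")

# Fall back to no compression if this arrow build lacks zstd.
codec <- if (codec_is_available("zstd")) "zstd" else "uncompressed"

# arrow::write_feather writes Feather V2 by default and accepts a compression codec.
write_feather(df, path, compression = codec)

# Reading needs no extra argument; the codec is recorded in the file.
df2 <- as.data.frame(read_feather(path))
```

Passing `compression = "lz4"` instead selects the other codec mentioned above.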

andirey commented 4 years ago

Wow! Is it possible to test it now? How can I handle existing data in feather format with the new compressed one? Are there any changes in the "feather_read/write" functions? Great news, thanks a lot!

wesm commented 4 years ago

You can install a nightly arrow build and try it out

https://github.com/apache/arrow/blob/master/r/README.md#installing-a-development-version
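
On the question of existing files, here is a hedged sketch of one way to convert an old Feather file to a compressed one. It assumes that `arrow::read_feather` reads both V1 and V2 files and that `write_feather` still accepts `version = 1` (used here only to create a stand-in for an old file). The file names and data are hypothetical.

```r
# Sketch: convert an existing (V1) Feather file into a compressed V2 file.
library(arrow)

# Create a stand-in for an existing V1 file (hypothetical data).
old_path <- tempfile(fileext = ".feather")
write_feather(data.frame(x = 1:100, y = rnorm(100)), old_path, version = 1)

# Read the old file, then rewrite it as V2 with compression if available.
old_df <- as.data.frame(read_feather(old_path))
new_path <- tempfile(fileext = ".feather")
codec <- if (codec_is_available("lz4")) "lz4" else "uncompressed"
write_feather(old_df, new_path, version = 2, compression = codec)

# Check that the data survived the round trip.
new_df <- as.data.frame(read_feather(new_path))
```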