jasonwhite / rudolfs

A high-performance, caching Git LFS server with an AWS S3 and local storage back-end.
MIT License
377 stars · 35 forks

[feature request] compression for local disk backend #32

Open · greyltc opened this issue 3 years ago

greyltc commented 3 years ago

Thanks very much for the local backend option, it's great!

It would be pretty neat if we could turn on compression (zstd maybe?) for the locally stored files.

jasonwhite commented 3 years ago

I think this is a good idea. There are plenty of binary file formats that are horribly bloated (PDBs come to mind). Transport-level compression would be good too: I believe the Git LFS client can accept gzip compression when downloading files (although that could just as well be handled by a reverse proxy like nginx).
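For illustration, the reverse-proxy route could do the wire compression without any changes to the server itself. This is only a rough nginx sketch, not something Rudolfs ships; the host name, port, and upstream address are placeholders:

```nginx
server {
    listen 443 ssl;
    server_name lfs.example.com;          # placeholder host

    location / {
        proxy_pass http://127.0.0.1:8080;  # assumed Rudolfs address/port

        # Compress object downloads on the wire for clients that send
        # Accept-Encoding: gzip. LFS objects are typically served as
        # application/octet-stream, so that type must be listed explicitly.
        gzip on;
        gzip_types application/octet-stream;
        gzip_min_length 1024;
    }
}
```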

I think this only makes sense for local disk storage, and it should probably be implemented at that level; I'm not sure how it would be done with S3. To keep compression transparent and backwards compatible, an extended file attribute could mark an LFS object as compressed, and the local disk backend could then decompress it on the fly. The tricky part is ordering: compression has to happen before encryption (encrypted data is essentially incompressible) and after the SHA256 hash of the object has been validated.
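To make that concrete, here is a rough sketch of what the local disk backend could do, using the `zstd` and `xattr` crates. The crate choice, the attribute name `user.rudolfs.zstd`, and the function names are all illustrative, not anything Rudolfs does today:

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Extended attribute used to mark an LFS object as zstd-compressed
/// (illustrative name, not an attribute Rudolfs currently uses).
const XATTR_COMPRESSED: &str = "user.rudolfs.zstd";

/// Compress an already-verified object and mark it via an xattr.
/// In Rudolfs this would have to run after SHA256 validation and
/// before any encryption step.
fn store_object(path: &Path, data: &[u8]) -> io::Result<()> {
    let compressed = zstd::encode_all(data, 0)?; // level 0 = zstd's default
    fs::write(path, &compressed)?;
    xattr::set(path, XATTR_COMPRESSED, b"1")?;
    Ok(())
}

/// Read an object back, decompressing on the fly only if it was stored
/// compressed. Objects without the attribute are returned unchanged.
fn load_object(path: &Path) -> io::Result<Vec<u8>> {
    let raw = fs::read(path)?;
    match xattr::get(path, XATTR_COMPRESSED)? {
        Some(_) => zstd::decode_all(raw.as_slice()),
        None => Ok(raw),
    }
}
```

Because objects lacking the attribute are served as-is, anything already on disk keeps working, which is what makes the scheme backwards compatible.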

This is a bit of work. I don't have too much time to work on this project these days, but I am always happy to take pull requests.

greyltc commented 5 months ago

A nice workaround, in cases where you can get away without encryption, would be a way to disable encryption and then let a filesystem with transparent compression take care of this, rather than having rudolfs do the (de)compression itself as it writes and reads files on disk.
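For example, on btrfs that can be as simple as marking the storage directory before rudolfs starts writing into it (the path below is just an example):

```sh
# Enable transparent zstd compression for new files created under this
# directory (example path; point it at the rudolfs local storage directory).
btrfs property set /srv/rudolfs/objects compression zstd

# Alternatively, enable it for the whole filesystem at mount time:
#   mount -o compress=zstd /dev/sdX /srv/rudolfs
```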