Open danimesq opened 1 year ago
SnowFS supports copy-on-write on certain file systems such as APFS, but it does not yet implement deduplication in the application layer. The main reason is performance, as fragmentation in binaries can have a higher impact on CPU and I/O. For the first implementation of SnowFS, speed had a higher priority than disk space. However, we are considering adding this as an opt-in feature, as these impacts may not be relevant for every project.
I'm here cheering for this to become an opt-in feature (personally ASAP but for y'all no pressure)
Could you share some background info? What type of projects would that be beneficial to? How many files, and what are the overall file sizes? Thanks!
@sebastianrath
What type of projects would that be beneficial to? How many files, and what are the overall file sizes? Thanks!
To give you an idea, I have tons of GB of screenshots on both mobile and desktop. It is sad to know that most of those gigabytes are shared bytes that could be deduplicated.
Imagine a screenshot of a notepad where most of the pixels are white; all of that could be deduplicated (for example, the Windows Start menu icon in these screenshots wouldn't be stored repeatedly). I imagine GIF and video file formats use a similar approach for overlapping frames.
BTW I'm working on a new symlink daemon that will support forming a single file from shared objects. It's here: https://github.com/Floflis/witchlink
@sebastianrath do you know of any libraries that find duplicate bytes in files and move these duplicates into separate files?
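To make the question concrete, here is a rough sketch of the kind of duplicate detection meant here, using content-defined chunking so that shared byte runs can still be found when they sit at different offsets. This is not SnowFS or witchlink code. The names `split_chunks` and `shared_chunks`, the gear table, and the size parameters are all made up for illustration, and it assumes the files contain identical byte runs (compressed images often will not).

```python
import hashlib

# Hypothetical 256-entry table of 64-bit values, derived deterministically.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:8], "big")
        for i in range(256)]
MASK64 = (1 << 64) - 1


def split_chunks(data, min_size=64, avg_bits=8, max_size=1024):
    """Split bytes into chunks whose boundaries depend on the content."""
    cut_mask = (1 << avg_bits) - 1
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Rolling "gear" hash: older bytes are shifted out of the low bits.
        h = ((h << 1) + GEAR[byte]) & MASK64
        length = i - start + 1
        at_boundary = length >= min_size and (h & cut_mask) == 0
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing remainder
    return chunks


def shared_chunks(files):
    """Map chunk digest -> set of file names, for chunks in 2+ files.

    `files` is a dict of name -> bytes (an assumption to keep this simple).
    """
    seen = {}
    for name, data in files.items():
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            seen.setdefault(digest, set()).add(name)
    return {d: names for d, names in seen.items() if len(names) > 1}
```

Calling `shared_chunks({"a.bin": data_a, "b.bin": data_b})` would list the chunks that could be stored once and referenced from both files. How much is actually shared depends heavily on the file format and the chunk size parameters.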
I would love it if Git natively supported more than one object per file, so there wouldn't be "foo", "bar" and "foobar" objects but only "foo" and "bar".
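A toy sketch of that idea, assuming fixed-size chunks and an in-memory store. `ChunkStore` and its methods are hypothetical names, not part of Git or SnowFS. Each file becomes a list of chunk digests, and each distinct chunk is stored only once.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: a file is a list of chunk digests."""

    def __init__(self, chunk_size=3):
        self.chunk_size = chunk_size
        self.objects = {}  # digest -> chunk bytes, stored once
        self.files = {}    # file name -> ordered list of digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            # Only the first occurrence of a chunk is stored.
            self.objects.setdefault(digest, chunk)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        # Rebuild the file by concatenating its chunks in order.
        return b"".join(self.objects[d] for d in self.files[name])


store = ChunkStore(chunk_size=3)
store.put("foo", b"foo")
store.put("bar", b"bar")
store.put("foobar", b"foobar")
print(len(store.objects))  # 2: only the "foo" and "bar" chunks are stored
```

The chunk size of 3 is chosen only so that the "foo"/"bar"/"foobar" example lines up. A real store would need larger chunks and on-disk storage, and fixed-size chunks stop matching as soon as content shifts by a byte.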
@sebastianrath I was expecting snow-fs already had this