mhx / dwarfs

A fast high compression read-only file system for Linux, Windows and macOS
GNU General Public License v3.0

Feature: create a "hollow" filesystem #131

Open jbd opened 1 year ago

jbd commented 1 year ago

Hello,

I'm wondering how difficult it would be to build a "hollow" filesystem image using mkdwarfs? By "hollow" I mean keeping all the metadata, but replacing every file with an empty sparse one. This would make it possible to take a "light" image of a filesystem and explore it with classic tools like du and ncdu, among others (knowing, of course, that the files are now sparse).

I've glanced through the source code and it looks like this would be possible by leveraging the code's modularity (writing a new file_scanner::impl?). Maybe I'm missing something that would make the implementation more complex than it looks? I would like your advice before I start hacking on it.

Thank you!

mhx commented 1 year ago

Hi @jbd & sorry for the late response.

It unfortunately wouldn't be quite as simple as swapping out the file_scanner. There are a few more components downstream that would try to access the files and then produce actual file system data.

There are many possible approaches, and tbh I'm not entirely sure what the best way would be. I think the simplest way to achieve this would need an additional abstraction around mmap. Currently, mmap instances are created all over the place by path name and then used through the mmif interface. You'd probably need a factory (+ interface) to pass around and create either "real" file-backed mmap instances or "fake" anonymous, zero-filled ones. That way, you'd still create a fully working file system, but the contents of all files would be null bytes. Deduplication and compression would ensure the actual "data" stored is small. This doesn't require any changes to the metadata or to the logic accessing files. Also, apart from the contents of the files, the file system would behave exactly as the "real" one.
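The factory idea described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `byte_view`, `file_view`, `zero_view`, `view_factory` and the two factory classes are hypothetical names, not part of dwarfs, and `byte_view` merely stands in for the `mmif` interface. To stay portable, the "real" view reads the file into a buffer instead of mapping it, and the "fake" view uses a zero-initialised vector instead of an anonymous mapping.

```cpp
#include <cstddef>
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <memory>
#include <vector>

// Hypothetical stand-in for the read-only view that dwarfs calls `mmif`.
class byte_view {
 public:
  virtual ~byte_view() = default;
  virtual const std::uint8_t* data() const = 0;
  virtual std::size_t size() const = 0;
};

// "Real" view: exposes the actual file contents.
// Assumption: a plain read replaces the real memory mapping here.
class file_view final : public byte_view {
 public:
  explicit file_view(const std::filesystem::path& p) {
    std::ifstream in(p, std::ios::binary);
    buf_.assign(std::istreambuf_iterator<char>(in),
                std::istreambuf_iterator<char>());
  }
  const std::uint8_t* data() const override { return buf_.data(); }
  std::size_t size() const override { return buf_.size(); }

 private:
  std::vector<std::uint8_t> buf_;
};

// "Fake" view: reports a given size, but every byte reads as zero.
// Assumption: real code would likely use an anonymous mapping so that
// untouched pages cost no memory; a vector keeps the sketch simple.
class zero_view final : public byte_view {
 public:
  explicit zero_view(std::size_t size) : buf_(size) {}  // value-initialised to 0
  const std::uint8_t* data() const override { return buf_.data(); }
  std::size_t size() const override { return buf_.size(); }

 private:
  std::vector<std::uint8_t> buf_;
};

// The factory that would be passed around instead of opening files by path.
class view_factory {
 public:
  virtual ~view_factory() = default;
  virtual std::unique_ptr<byte_view>
  open(const std::filesystem::path& p) const = 0;
};

class real_view_factory final : public view_factory {
 public:
  std::unique_ptr<byte_view>
  open(const std::filesystem::path& p) const override {
    return std::make_unique<file_view>(p);
  }
};

class hollow_view_factory final : public view_factory {
 public:
  std::unique_ptr<byte_view>
  open(const std::filesystem::path& p) const override {
    // Keep the size from the file on disk, drop the contents.
    auto size = static_cast<std::size_t>(std::filesystem::file_size(p));
    return std::make_unique<zero_view>(size);
  }
};

// Example consumer: it only sees the factory, so it behaves the same
// whether the views are real or hollow.
std::size_t count_nonzero_bytes(const view_factory& f,
                                const std::filesystem::path& p) {
  auto v = f.open(p);
  std::size_t n = 0;
  for (std::size_t i = 0; i < v->size(); ++i) {
    n += v->data()[i] != 0;
  }
  return n;
}
```

With this shape, choosing between a normal and a "hollow" image would come down to which factory is constructed at startup; code that consumes views would not need to change.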

jbd commented 1 year ago

Hi @mhx, no worries at all about the late response. Thank you for taking the time to answer.

Your explanation is quite clear, and so is the suggestion of adding an abstraction around mmap. Using fake zero-filled mmap instances sounds elegant and quite simple. I may try this first to validate and play with this "hollow" concept I have in mind.

I'll keep you posted if I ever get to that stage with my rusty C++ ;) Feel free to close this issue.

Thank you again.