imsnif / diskonaut

Terminal disk space navigator 🔭
MIT License
2.49k stars · 66 forks

Feature: Support for Filesystem Compression (e.g. NTFS, BTRFS, ...) #26

Closed dbramucci closed 4 years ago

dbramucci commented 4 years ago

When running diskonaut on a BTRFS filesystem with compression enabled, it shows the uncompressed space used by folders and files, not the actual disk-space used.

One folder of mine shows as using 69.5G of storage, but if I deleted it I would not regain 69.5G worth of disk space. Because that folder is compressed, I would only regain 50G, which represents the actual space used on the disk.

The command `sudo compsize /path/to/folder` was able to identify the post-compression space used.
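For reference, the generic numbers that tools without BTRFS-specific knowledge see come from `stat(2)`. They can be inspected from a shell (GNU `stat` assumed; the temp file here is just an illustration) — on a compressed BTRFS volume, the `allocated` figure below is the one that disagrees with compsize's report:

```shell
# Compare a file's apparent size (st_size) with its allocated size.
# stat reports st_blocks in 512-byte units, whatever the filesystem's
# actual block size is.
f=$(mktemp)
head -c 8192 /dev/urandom > "$f"
apparent=$(stat -c %s "$f")                # bytes (nominal file size)
allocated=$(( $(stat -c %b "$f") * 512 )) # bytes (block-allocated size)
echo "apparent=$apparent allocated=$allocated"
rm -f "$f"
```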

Rationale for feature:

If I am using this tool, I am likely trying to free space so that I may allocate a new file.

Suppose I want to download a 4GiB iso image. If I have a 4.5GiB zip archive and a 5GiB text file, diskonaut would make it appear that deleting the text file would let me download the iso with a GiB to spare. Unfortunately, with compression enabled, deleting the already-compressed zip archive would still free up 4.5GiB, while deleting the highly compressible text file may only free 900MiB. At that point I would download the iso, run out of space, and then have to reopen diskonaut to free ? more GiB (and hope that compression doesn't cause more trouble).

Design Questions

Relevant Filesystems

This Wikipedia table of filesystem capabilities shows that the following support compression.

Freaky commented 4 years ago

It appears to be doing the right thing:

https://github.com/imsnif/diskonaut/blob/f19a4f5a1d64349ab2bc28946295caaf3dc3c75e/src/state/files/file_or_folder.rs#L61

And indeed it's correctly reporting the size of compressed files on ZFS.

imsnif commented 4 years ago

@dbramucci - thank you very much for this very detailed issue!! And thanks @Freaky for weighing in.

My understanding was also that `blocks * 512` should solve this. So @dbramucci, do you think this is particular to BTRFS? Could there be another reason for this? Or?
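For context, the `st_blocks`-based calculation being discussed can be read on Unix through `std::os::unix::fs::MetadataExt`. This is a minimal sketch of that approach — not diskonaut's actual code, which lives in the file linked above — and `sizes` is a hypothetical helper name:

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Returns (apparent, allocated) sizes in bytes:
/// apparent  = st_size, the nominal file length,
/// allocated = st_blocks * 512; POSIX defines st_blocks in 512-byte
///             units regardless of the filesystem's real block size.
fn sizes(path: &Path) -> io::Result<(u64, u64)> {
    let meta = fs::metadata(path)?;
    Ok((meta.len(), meta.blocks() * 512))
}

fn main() -> io::Result<()> {
    let (apparent, allocated) = sizes(Path::new("/etc/hostname"))?;
    println!("apparent: {apparent} B, allocated: {allocated} B");
    Ok(())
}
```

On ZFS this `allocated` figure reflects compression, which matches Freaky's observation; the question in this thread is why it apparently does not on BTRFS.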

dbramucci commented 4 years ago

@imsnif It might be particular to BTRFS, or to extent-based filesystems in general. I'll have to try out NTFS later to test another compression-supporting, block-based filesystem.

Every accurate disk-space utility I've seen for BTRFS so far requires sudo to run, which indicates something special is going on with BTRFS.

Looking at the manpage for btrfs-filesystem under du (e.g. `sudo btrfs filesystem du ~/Downloads`) shows that FIEMAP is used to compute the file sizes. This makes me think (and here I'm out of my depth) that this has to do with BTRFS being an extent-based filesystem rather than a block-based one. That is, BTRFS doesn't keep a list of fixed-size blocks used for a file; rather, it uses a list of variable-length intervals (called extents).

This seems particularly relevant given that FIEMAP appears to stand for FIle Extent MAP. Likewise, because BTRFS doesn't use fixed-size blocks, the `.blocks()` API exposed in Rust must be some form of leaky abstraction.

Unfortunately, I don't understand more about what BTRFS does when forced to describe itself in terms of blocks instead of extents.

Freaky commented 4 years ago

https://btrfs.wiki.kernel.org/index.php/Compression#Why_does_not_du_report_the_compressed_size.3F

Why does not du report the compressed size?

Traditionally the UNIX/Linux filesystems did not support compression and there was no item in stat data structure allocated for a similar purpose. There's the file size, that denotes nominal file size independent of the actually allocated size on-disk. For that purpose, the stat.st_blocks item contains a value that corresponds to the number of blocks allocated, i.e. in case of sparse files. However, when a compression is involved, the actually allocated size may be smaller than nominal, although the file is not sparse.

There are utilities that determine sparseness of a file by comparing the nominal and block-allocated size, this behaviour might cause bugs if st_blocks contained the amount after compression.

Another issue with backward compatibility is that up to now st_blocks always contains the uncompressed number of blocks. It's unclear what would happen if there are files with mixed types of the value. The proposed solution is to add another special call for that (via ioctl), but this may be not the ideal solution.
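The sparse-file heuristic the wiki describes is easy to demonstrate: a file extended with `set_len` (the `truncate(2)` analogue) has a large `st_size` but few or no allocated blocks, and that `allocated < apparent` comparison is exactly what those utilities make. A sketch, assuming a filesystem that supports holes:

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::MetadataExt;

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("sparse_demo.bin");
    let file = File::create(&path)?;
    // Extend to 1 MiB without writing any data: on filesystems that
    // support holes, no blocks are allocated for the gap.
    file.set_len(1 << 20)?;

    let meta = file.metadata()?;
    let apparent = meta.len();           // st_size: 1 MiB
    let allocated = meta.blocks() * 512; // st_blocks in 512-byte units
    println!("apparent: {apparent} B, allocated: {allocated} B");

    // The heuristic: allocated < apparent => probably sparse. If
    // st_blocks instead reported the post-compression size, compressed
    // (but non-sparse) files would trip this same check — the
    // backward-compatibility concern quoted above.
    std::fs::remove_file(&path)?;
    Ok(())
}
```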

There is a fiemap crate, in principle I could tie it into filesize, but it would be behind an off-by-default feature flag, because it's both complex and looks eyewateringly slow.

Added as issue #1.

imsnif commented 4 years ago

@dbramucci - it seems to me the discussion brought us to the issue @Freaky opened in filesize. I think we can close this and address it upstream... or am I forgetting/missing something?

dbramucci commented 4 years ago

Seems good to me. The only remaining question is whether a UI for virtual vs. physical file sizes should be supported, but I think that can be its own issue and should be guided by some user stories.

imsnif commented 4 years ago

For sure. It sounds like an interesting feature, would be happy to look further into it when we know more and/or feel the need.