Open traverseda opened 5 years ago
Quoting Alex Davies (2019-08-17 22:28:22)
btrfs reports file size, not size on disk. The tool compsize can tell you the real size on disk, how much is deduplicated (which is different from hardlinks because of copy-on-write?), and how much is compressed.
It would be nice if duc supported these btrfs-specific features.
I see no technical problems with this, although I guess it would make sense not to make the feature btrfs-specific, but to map it onto any kind of compressing file system instead. At scan time Duc should be able to figure out the proper way to acquire the numbers from the specific fs type.
The only downside is that each file entry in the database would need an additional field to store the new size. This is probably the right time to add an optimization I have wanted to implement for a long time: if Duc stores the real file size as a (var)int, we could store the block size and compressed size as numbers relative to the real size. That should shrink the DB a lot, since the relative sizes are much smaller and will result in smaller entries because of the varint encoding.
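To make the relative-size idea concrete, here is a minimal sketch in Python. It is not Duc's actual on-disk format: the LEB128-style varint, the zigzag mapping for negative deltas, and the names `encode_entry` and `decode_entry` are all assumptions made for illustration. It stores the real size as a varint, then the block size and compressed size as signed differences from it.

```python
def encode_varint(n: int) -> bytes:
    """Unsigned varint: 7 payload bits per byte, high bit set on all but the last."""
    if n < 0:
        raise ValueError("varint input must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def zigzag(n: int) -> int:
    """Map signed to unsigned so small negative deltas stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return (n << 1) if n >= 0 else ((-n) << 1) - 1


def unzigzag(z: int) -> int:
    return (z >> 1) if z & 1 == 0 else -((z + 1) >> 1)


def encode_entry(real: int, on_disk: int, compressed: int) -> bytes:
    """Hypothetical entry layout: real size, then two deltas relative to it."""
    return (
        encode_varint(real)
        + encode_varint(zigzag(on_disk - real))
        + encode_varint(zigzag(compressed - real))
    )


def decode_entry(buf: bytes) -> tuple[int, int, int]:
    real, pos = decode_varint(buf)
    d_disk, pos = decode_varint(buf, pos)
    d_comp, pos = decode_varint(buf, pos)
    return real, real + unzigzag(d_disk), real + unzigzag(d_comp)
```

The saving comes from the deltas: a block size that only rounds the real size up to the next block boundary differs from it by a small number, which needs fewer varint bytes than repeating the full size would.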
I got some advice on the #btrfs IRC channel today, and while this should be technically feasible, it is neither easy nor trivial. There is a single ioctl which is used to get this info (BTRFS_IOC_TREE_SEARCH_V2), but the resulting data needs to be properly cooked to get the required info. This will also require a lot of bookkeeping, similar to hard-link accounting, since btrfs might share the same extents between multiple files.
I'll leave this issue open; I might one day feel very bored and brave and pick this up.
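As a rough illustration of the bookkeeping involved, the sketch below counts each shared extent only once, much as hard-link accounting counts an inode once. It does not call the ioctl or parse its output; the input format (a mapping from path to `(physical_start, bytes_on_disk)` pairs) and the function name `account_extents` are hypothetical, and it assumes shared extents are identical rather than partially overlapping.

```python
def account_extents(files: dict[str, list[tuple[int, int]]]) -> tuple[int, dict[str, int]]:
    """Charge every physical extent to the first file that references it.

    files maps a path to its extents as (physical_start, bytes_on_disk).
    Returns (total_bytes_on_disk, bytes_charged_per_file).
    """
    seen: set[tuple[int, int]] = set()
    charged: dict[str, int] = {}
    total = 0
    for path, extents in files.items():
        charged[path] = 0
        for extent in extents:
            if extent in seen:
                continue  # already counted via another file (reflink, dedup, snapshot)
            seen.add(extent)
            charged[path] += extent[1]
            total += extent[1]
    return total, charged


# Two files sharing one 8 KiB extent; the copy also has a 4 KiB extent of its own.
example = {
    "a.bin": [(1_048_576, 8192)],
    "a-copy.bin": [(1_048_576, 8192), (2_097_152, 4096)],
}
total, charged = account_extents(example)
```

Which file gets charged for a shared extent depends on scan order here, which is the same trade-off hard-link accounting already makes.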
That's more or less what I was expecting, thanks for taking the time to look into it.