kilobyte / compsize

btrfs: find compression type/ratio on a file or set of files

File sizes / output has massively changed with latest version #21

Closed disaster123 closed 6 years ago

disaster123 commented 6 years ago

I was running caed4fdcd888467ddabd1b964b7737c0a0b050c8 for a long time. I updated to the latest version, but the values have changed massively. With caed4fdcd888467ddabd1b964b7737c0a0b050c8:

Processed 62 files.
Type       Perc     Disk Usage   Uncompressed Referenced  
Data        69%       72G         104G         141G       
none       100%       37G          37G          46G       
zstd        52%       34G          66G          95G       

Current HEAD:

Processed 62 files, 1347823 regular extents (1622359 refs), 33 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       76%      179G         235G         273G       
none       100%      106G         106G         116G       
zstd        56%       73G         128G         157G       

What has happened?

ghost commented 6 years ago

Just guessing, but maybe this: https://github.com/kilobyte/compsize/commit/01ad3d03ba12d2ecd24e349be6f0ab5b905fb72d

disaster123 commented 6 years ago

Hmm, that might be it. I always thought that btrfs-compsize shows me only the exclusive extents and that, as I have a lot of subvolumes, the size is lower.

kilobyte commented 6 years ago

Judging from the size of your files, it's almost certainly this. It was a bug, sorry — the old version capped large files.

kilobyte commented 6 years ago

Closing; please reopen if you suspect anything other than that old bug.

disaster123 commented 6 years ago

@kilobyte OK, thanks. One last question: would it be possible to skip extents already referenced by another file? That would make it possible to show the total real size of multiple snapshots.

I already thought this was the case after reading: "As it makes no sense to talk about compression ratio of a partial extent, every referenced extent is counted whole, exactly once -- no matter if you use only a few bytes of a 1GB extent or reflink it a thousand times. Thus, the uncompressed size will not match the number given by tar or du. On the other hand, the space used should be accurate (although obviously it can be shared with files outside our set)."

kilobyte commented 6 years ago

Yeah, this is already the case: for "Disk usage" and "Uncompressed", every extent is counted exactly once, no matter if it's referenced by many files/snapshots or just once. Do you think the documentation should be improved?

It's confusing because there are two cases: 1. an extent can be referenced multiple times, 2. a reference can use a part of the extent (but the whole extent is pinned, thus continues to take disk space).
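The two cases above can be illustrated with a toy model. This is only a sketch of the counting rule as described in this thread, not compsize's actual code: the `Extent` and `Ref` types, their field names, and the byte counts are all made up for illustration. It assumes that "Disk Usage" and "Uncompressed" count each extent whole and exactly once, while "Referenced" adds up the bytes used by every reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Extent:
    """Hypothetical stand-in for an on-disk extent."""
    extent_id: int   # stand-in for the extent's address on disk
    disk_bytes: int  # space the extent occupies on disk (after compression)
    ram_bytes: int   # size of the whole extent once uncompressed


@dataclass(frozen=True)
class Ref:
    """Hypothetical reference from a file to (part of) an extent."""
    extent: Extent
    used_bytes: int  # uncompressed bytes this reference actually uses


def summarize(refs):
    """Return (disk_usage, uncompressed, referenced) for a set of refs."""
    seen = set()
    disk = uncompressed = referenced = 0
    for ref in refs:
        # Every reference contributes the bytes it uses.
        referenced += ref.used_bytes
        # Each extent is counted whole, and only the first time it is seen.
        if ref.extent.extent_id not in seen:
            seen.add(ref.extent.extent_id)
            disk += ref.extent.disk_bytes
            uncompressed += ref.extent.ram_bytes
    return disk, uncompressed, referenced


# Case 1: extent A is compressed and referenced in full by two snapshots.
a = Extent(extent_id=1, disk_bytes=40, ram_bytes=100)
# Case 2: extent B is uncompressed; one file uses only 10 of its 50 bytes,
# but the whole extent stays pinned on disk.
b = Extent(extent_id=2, disk_bytes=50, ram_bytes=50)

refs = [Ref(a, 100), Ref(a, 100), Ref(b, 10)]
disk, uncompressed, referenced = summarize(refs)
print(disk, uncompressed, referenced)  # 90 150 210
```

In this model, adding more snapshots that share extent A raises only the third number, and the partially used extent B still contributes its full 50 bytes to the first two.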