kilobyte / compsize

btrfs: find compression type/ratio on a file or set of files

Exclusive / shared usage #26

Open dkacar-oradian opened 5 years ago

dkacar-oradian commented 5 years ago

Would it be possible to add a bit of code to show the exclusive usage of a subvolume? The only way I've found on the Internet to do that is to enable quota and then use btrfs qgroup show. However, when I enable quota, btrfs becomes unusable for a long time (at least 15 minutes) after snapshot creation, so I can't afford to do that.

Right now I have something like this:

# btrfs subvolume list --sort=rootid -t /data/pg_data
ID      gen     top level       path    
--      ---     ---------       ----    
258     14791   5               mirrors/prod-db-c01_5432
328     14791   5               snapshots/bckash_6432

# btrfs-compsize /data/pg_data/mirrors/prod-db-c01_5432
Processed 14759 files, 18988196 regular extents (22223973 refs), 27 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       12%      203G         1.5T         1.5T       
none       100%       11G          11G          11G       
zstd        12%      192G         1.5T         1.5T

# btrfs-compsize /data/pg_data/snapshots/bckash_6432
Processed 6022 files, 10202853 regular extents (12257581 refs), 21 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       10%       95G         905G         879G       
none       100%       55M          55M          55M       
zstd        10%       95G         905G         879G

snapshots/bckash_6432 was created as a snapshot of mirrors/prod-db-c01_5432. Then a lot of files in the snapshot were deleted and a few were changed. So I'd like to know what its exclusive usage is. If I call btrfs-compsize on both directories I get this:

# btrfs-compsize /data/pg_data/mirrors/prod-db-c01_5432 /data/pg_data/snapshots/bckash_6432 
Processed 20781 files, 19001055 regular extents (34492529 refs), 48 inline.
Type       Perc     Disk Usage   Uncompressed Referenced  
TOTAL       12%      203G         1.5T         2.3T       
none       100%       11G          11G          11G       
zstd        12%      192G         1.5T         2.3T

I can't tell whether it's possible to calculate the exclusive usage for snapshots/bckash_6432 from these numbers. So would it be possible to add a command line flag which would calculate usage only for files which are not reflinked? Then I could call btrfs-compsize without that flag on the mirror subvolume and after that call it with the flag on the snapshot subvolume. Any other method would be fine as well; this just looks like the simplest. But I don't quite understand how the code works, so I could be wrong.
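A rough cross-check from the three runs above, offered as a sketch rather than something compsize reports: assuming each run counts a shared extent only once (which the unchanged 203G Disk Usage in the combined run suggests), inclusion-exclusion on the "regular extents" counts gives the number of extents reachable only from the snapshot. The helper names are made up for illustration. The counts may drift between runs if the data changes, extent counts are not bytes, and the result says nothing about extents shared with any other subvolume.

```python
# Extent counts copied from the three "Processed ..." lines above.
mirror_extents = 18_988_196    # mirror subvolume alone
snapshot_extents = 10_202_853  # snapshot subvolume alone
union_extents = 19_001_055     # both directories in a single run


def only_in_second(union: int, first: int) -> int:
    """Size of B minus A, computed as |A union B| - |A|."""
    return union - first


def shared(first: int, second: int, union: int) -> int:
    """Size of the intersection, computed as |A| + |B| - |A union B|."""
    return first + second - union


snapshot_only = only_in_second(union_extents, mirror_extents)
shared_extents = shared(mirror_extents, snapshot_extents, union_extents)

print(f"extents only in the snapshot: {snapshot_only}")
print(f"extents shared with the mirror: {shared_extents}")
```

The same subtraction on the Disk Usage column (203G combined minus 203G for the mirror) only shows that the difference is below the rounding of the printed figures.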

kilobyte commented 5 years ago

"Not reflinked" isn't that simple -- a file reflinked within the same subvolume (e.g. cp does that by default nowadays) should be included. Then, compsize cares about extents, not files, and a file may have reflinked extents even within itself.

Thus, I wonder if an argument like -s $DIR that does set subtraction of extents would fit your use cases. That is: to get all extents that are included in the primary directory but not in the subtracted one.
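To make the proposed semantics concrete, here is a minimal sketch of set subtraction over extents. It is a toy model, not compsize's code: the `Extent` type, its fields, and the function names are assumptions for illustration, and the `-s` option is only a proposal in this thread. An extent is identified by its on-disk start address, so several references to the same extent collapse into one set member.

```python
from typing import Iterable, NamedTuple, Set


class Extent(NamedTuple):
    disk_bytenr: int  # hypothetical identity: on-disk start address
    disk_bytes: int   # bytes the extent occupies on disk


def subtract_extents(primary: Iterable[Extent],
                     subtracted: Iterable[Extent]) -> Set[Extent]:
    """Extents referenced from `primary` but not from `subtracted`."""
    return set(primary) - set(subtracted)


def disk_usage(extents: Iterable[Extent]) -> int:
    """Sum of on-disk sizes; pass a set so each extent counts once."""
    return sum(e.disk_bytes for e in extents)


a = Extent(0x1000, 8192)
b = Extent(0x4000, 4096)
c = Extent(0x8000, 16384)
d = Extent(0xC000, 4096)

mirror_refs = [a, b, c]       # extent references found under the mirror
snapshot_refs = [b, c, d, d]  # d is referenced twice within the snapshot

only_in_snapshot = subtract_extents(snapshot_refs, mirror_refs)
print(disk_usage(only_in_snapshot))  # prints 4096
```

Because the comparison is between sets of extents, a reflink inside the primary directory (the two references to `d`) does not exclude the extent, which matches the concern about `cp` reflinks raised above.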

dkacar-oradian commented 5 years ago

Well, I currently have a really simple use case. All of my subvolumes are Postgres data directories. The one called mirror is a Postgres slave which replicates continuously from the master. From time to time I create a snapshot (in the snapshot directory) from the mirror subvolume and start a master Postgres database on the snapshot subvolume (on another port). So I don't have reflinks from cp. Data in the mirror subvolume is originally filled by rsync from the backup server, and I don't know if rsync does something with reflinks. The more complex case would be creating a snapshot2 subvolume from snapshot1 (which has been created from mirror). If compsize -s snapshot1 snapshot2 would show me only the extents for which snapshot2 is the exclusive owner, that would be great.

daviessm commented 4 years ago

@dkacar-oradian I did something along these lines using btrfs-python: https://github.com/daviessm/btrfs-snapshots-diff/blob/master/btrfs-subvol-size.py

If you cut out most of the printing lines, the end result should be the answer to the question "how much space would I gain if I deleted these files in this subvolume?" - where "these files" could be the entire subvolume.

PS it's a bit slow for large subvolumes.
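For readers who want the idea without reading the script, the question "how much space would I gain?" can be modelled as reference counting. This is a sketch under stated assumptions and does not describe how the linked script works: extent identifiers, sizes, and the function name are hypothetical, and an extent is treated as freed only when every reference to it on the filesystem comes from the files being deleted.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def reclaimable_bytes(all_refs: Iterable[Hashable],
                      doomed_refs: Iterable[Hashable],
                      extent_sizes: Dict[Hashable, int]) -> int:
    """Bytes freed if the files holding `doomed_refs` were deleted.

    all_refs:     one extent id per reference anywhere on the filesystem
    doomed_refs:  one extent id per reference held by the files to delete
    extent_sizes: on-disk size of each extent
    """
    total = Counter(all_refs)
    doomed = Counter(doomed_refs)
    # An extent is released only when no reference outside the doomed set remains.
    return sum(extent_sizes[e] for e, n in doomed.items() if n == total[e])


extent_sizes = {"a": 100, "b": 200, "c": 50}
all_refs = ["a", "a", "b", "c", "c"]  # "a" is also referenced elsewhere
doomed_refs = ["a", "b", "c", "c"]    # references held by the files to delete

print(reclaimable_bytes(all_refs, doomed_refs, extent_sizes))  # prints 250
```

Counting every reference on the filesystem is the expensive part of this approach.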