sahib / rmlint

Extremely fast tool to remove duplicates and other lint from your filesystem
http://rmlint.rtfd.org
GNU General Public License v3.0

(Btrfs clone) No space saved, even though advertised? #539

Open rrueger opened 2 years ago

rrueger commented 2 years ago

I ran a rmlint -g -c sh:clone -o sh:rmlint.sh command, and was told there were 500GB of duplicated data.

When running sudo rmlint.sh -xr, it became clear that some (~20%) of the data was already reflinked. (I presume that rmlint counts this as duplicate data but cannot free any space, because the files already share the same extents.)

There were many rmlint --dedupe --dedupe-readonly calls that appeared to be successful (along with some failures).

Notably, there were at least 20GB of files that were successfully rmlint --dedupe'd.

However btrfs filesystem usage still reported the exact same amount of used/free space. Even after a reboot.

I ran rmlint against an entire subvolume, whose data is exclusive to that subvolume.

How should I interpret this? It is highly likely that I am misunderstanding some core behaviour of btrfs.

Thank you!


Version info

$ rmlint --version
version 2.10.1 compiled: Dec  3 2021 at [01:09:27] "Ludicrous Lemur" (rev unknown)
compiled with: +mounts +nonstripped +fiemap +sha512 +bigfiles +intl +replay +xattr +btrfs-support
$ uname -r
5.15.11-arch2-1
$ btrfs --version
btrfs-progs v5.15.1

cebtenzzre commented 2 years ago

Does this subvolume have any snapshots (btrfs subvolume list -s <fs root>)? Extents that are still referenced by snapshots will stay on disk. rmlint can deduplicate files within snapshots with -r, but in order to know about them it needs to be given the path to the snapshot like any other directory.
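
To see how much data is shared before and after a run, something like the following may help. It is only a sketch: SUB and SNAP are placeholder environment variables for the subvolume and snapshot paths, and show_sharing is a hypothetical helper name, not part of rmlint or btrfs-progs.

```shell
#!/bin/sh
# Sketch: summarise shared vs. exclusive data for the given paths.
# SUB and SNAP are placeholders; they default to "." if unset.
show_sharing() {
    if ! command -v btrfs >/dev/null 2>&1; then
        echo "btrfs-progs not found; skipping"
        return 0
    fi
    # "btrfs filesystem du -s" prints Total, Exclusive and Set shared per path.
    # Data that is not counted as Exclusive is also referenced from elsewhere,
    # so it stays allocated while that other reference exists.
    btrfs filesystem du -s "$@" ||
        echo "btrfs filesystem du failed (not btrfs, or insufficient permissions?)"
}

show_sharing "${SUB:-.}" "${SNAP:-.}"
```

If the Exclusive figure for the subvolume is small compared to its Total, most of its extents are still referenced elsewhere, which would match usage not dropping after deduplication.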

rrueger commented 2 years ago

Thank you for your quick response.

Good point, rookie error on my part. There was another read-only snapshot $SNAP of the subvolume $SUB.

I reran rmlint -g -c sh:clone -o sh:rmlint.sh $SUB $SNAP and was told there were 1.3TB of duplicated data.

I then executed the rmlint.sh script with -r as root and observed behaviour that I did not expect:

  1. Similarly to the first run against only $SUB, there were many successful rmlint --dedupe --dedupe-readonly calls and a handful of failures. However, only ~1GB of data was freed.
  2. rmlint tried to clone files within $SUB. I would have expected that my first rmlint ... $SUB run would have cloned these files to each other. My understanding here is that once two files have been rmlint --dedupe'd, rmlint --is-reflink returns true?* In this case, rmlint ... $SUB $SNAP should only be cloning files within $SNAP or between $SUB and $SNAP.
  3. rmlint --dedupe --dedupe-readonly is very slow. According to glances it only reads from disk at about 50MB/s (on an SSD from which I regularly read at 500MB/s+ sustained, from which rmlint reads at 1.2GB/s during other stages of execution). I suspect this is entirely unrelated, but am mentioning anyway in case it tells you something about my disk failing or having other issues. Sorry if this turns out to be a complete red herring.

    Could it be that rmlint --dedupe --dedupe-readonly can only dedupe between two read-only subvolumes? (And not between a read-only, and a writeable subvolume)

    *I tried to test this hypothesis, with

    echo 123 > file
    cp file gile
    rmlint --dedupe file gile
    rmlint --is-reflink file gile

but got exit code 5, i.e. fiemaps can't be read.
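
One possible explanation, offered as a guess rather than a confirmed diagnosis: a 4-byte file is small enough for btrfs to store inline in metadata, so it may have no regular data extent to map or share. Below is a sketch of the same test with a 1 MiB file. The file size, the workdir variable and the use of dd for the copy are illustrative choices, and the script needs to run with TMPDIR pointing at the btrfs mount to be meaningful.

```shell
#!/bin/sh
# Sketch: repeat the reflink test with a file large enough to get a data extent.
set -eu

workdir=$(mktemp -d)            # assumption: TMPDIR is on the btrfs filesystem
trap 'rm -rf "$workdir"' EXIT

# 1 MiB of random data, then an independent copy written via read/write.
head -c 1048576 /dev/urandom > "$workdir/file"
dd if="$workdir/file" of="$workdir/gile" bs=64k 2>/dev/null
sync                            # flush to disk so the extents exist before mapping

if command -v rmlint >/dev/null 2>&1; then
    rmlint --dedupe "$workdir/file" "$workdir/gile" ||
        echo "dedupe exit code: $?"
    rmlint --is-reflink "$workdir/file" "$workdir/gile" &&
        echo "reflinked" || echo "is-reflink exit code: $?"
else
    echo "rmlint not installed; skipping"
fi
```

If this version reports the pair as reflinked, the exit code 5 above would most likely come from the tiny test file rather than from the dedupe step itself.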


Here is my filesystem usage, perhaps something sticks out. I have rebalanced and rebooted since the rmlint runs.

# btrfs filesystem usage /btrfs 
Overall:
    Device size:                   1.78TiB
    Device allocated:              1.49TiB
    Device unallocated:          292.97GiB
    Device missing:                  0.00B
    Used:                          1.47TiB
    Free (estimated):            315.15GiB      (min: 315.15GiB)
    Free (statfs, df):           315.15GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:1.48TiB, Used:1.46TiB (98.54%)
   /dev/mapper/computer-root       1.48TiB

Metadata,single: Size:11.00GiB, Used:7.69GiB (69.88%)
   /dev/mapper/computer-root      11.00GiB

System,single: Size:32.00MiB, Used:224.00KiB (0.68%)
   /dev/mapper/computer-root      32.00MiB

Unallocated:
   /dev/mapper/computer-root     292.97GiB
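
As a sanity check on these numbers, the free estimate can be roughly reproduced from the other fields. This sketch assumes the estimate is the unallocated space plus the unused part of the data chunks, with a data ratio of 1.00. The inputs are the rounded values printed above, so the result only approximates the reported figure.

```shell
#!/bin/sh
# Sketch: recompute "Free (estimated)" from the rounded values in the output above.
# Assumption: estimate = unallocated + (data size - data used), data ratio 1.00.
free_estimate=$(awk -v unallocated_gib=292.97 \
                    -v data_size_tib=1.48 \
                    -v data_used_fraction=0.9854 \
    'BEGIN {
        data_free_gib = data_size_tib * 1024 * (1 - data_used_fraction)
        printf "%.1f", unallocated_gib + data_free_gib
    }')
echo "estimated free: ${free_estimate} GiB"
```

The result lands close to the reported 315.15GiB, which suggests the usage report is internally consistent and the 1.46TiB of data is still allocated.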