rrueger opened 2 years ago
Does this subvolume have any snapshots (`btrfs subvolume list -s <fs root>`)? Extents that are still referenced by snapshots will stay on disk. rmlint can deduplicate files within snapshots with `-r`, but in order to know about them it needs to be given the path to the snapshot like any other directory.
Thank you for your quick response.
Good point, rookie error on my behalf. There was another read-only snapshot `$SNAP` of the subvolume `$SUB`.
I reran `rmlint -g -c sh:clone -o sh:rmlint.sh $SUB $SNAP` and was told there were 1.3TB of duplicated data.
I then executed the `rmlint.sh` script with `-r` as root and observed (for me) unexpected behaviour:

- In `$SUB`, there were many successful `rmlint --dedupe --dedupe-readonly` calls and a handful of failures. However, only ~1GB of data was freed.
- `rmlint` tried to clone files within `$SUB`. I would have expected that my first `rmlint ... $SUB` run would have already cloned these files to each other. My understanding here is that once two files have been `rmlint --dedupe`'d, `rmlint --is-reflink` returns true?* In that case, `rmlint ... $SUB $SNAP` should only be cloning files within `$SNAP`, or between `$SUB` and `$SNAP`.
- `rmlint --dedupe --dedupe-readonly` is very slow. According to `glances` it only reads from disk at about 50MB/s (on an SSD from which I regularly read at 500MB/s+ sustained, and from which `rmlint` reads at 1.2GB/s during other stages of execution). I suspect this is entirely unrelated, but am mentioning it anyway in case it tells you something about my disk failing or having other issues. Sorry if this turns out to be a complete red herring.
Could it be that `rmlint --dedupe --dedupe-readonly` can only dedupe between two read-only subvolumes? (And not between a read-only and a writeable subvolume?)
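One way to probe this hypothesis would be to inspect the read-only flag of each subvolume directly. A minimal sketch, assuming btrfs-progs is installed and `$SUB`/`$SNAP` are set as above:

```
# Print the read-only property of each subvolume (requires btrfs-progs).
for sv in "$SUB" "$SNAP"; do
    btrfs property get -ts "$sv" ro   # prints "ro=false" or "ro=true"
done
```

As I understand it, `--dedupe-readonly` should only be needed when one of the files involved lives in a subvolume reporting `ro=true`.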
*I tried to test this hypothesis with

```
echo 123 > file
cp file gile
rmlint --dedupe file gile
rmlint --is-reflink file gile
```

but was returned exit code `5`, i.e. `fiemaps can't be read`.
Here is my filesystem usage, perhaps something sticks out. I have rebalanced and rebooted since the `rmlint` runs.
```
# btrfs filesystem usage /btrfs
Overall:
    Device size:                   1.78TiB
    Device allocated:              1.49TiB
    Device unallocated:          292.97GiB
    Device missing:                  0.00B
    Used:                          1.47TiB
    Free (estimated):            315.15GiB  (min: 315.15GiB)
    Free (statfs, df):           315.15GiB
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB  (used: 0.00B)
    Multiple profiles:                  no

Data,single: Size:1.48TiB, Used:1.46TiB (98.54%)
   /dev/mapper/computer-root    1.48TiB

Metadata,single: Size:11.00GiB, Used:7.69GiB (69.88%)
   /dev/mapper/computer-root   11.00GiB

System,single: Size:32.00MiB, Used:224.00KiB (0.68%)
   /dev/mapper/computer-root   32.00MiB

Unallocated:
   /dev/mapper/computer-root  292.97GiB
```
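Note that `btrfs filesystem usage` only reports allocation totals, not extent sharing. To see how much data is actually shared versus exclusive per subvolume, something like the following might be more telling. A sketch, assuming btrfs-progs and root access:

```
# Summarise total vs. exclusive vs. shared bytes for each subvolume.
sudo btrfs filesystem du -s "$SUB" "$SNAP"
```

If the deduplication calls took effect, the `Set shared` column should grow while `Exclusive` shrinks, even when overall allocation looks unchanged.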
- I ran a `rmlint -g -c sh:clone -o sh:rmlint.sh` command, and was told there were 500GB of duplicated data.
- When running `sudo rmlint.sh -xr`, it became clear that some (~20%) of the data was already reflinked. (I presume that `rmlint` counts this as duplicate data, but cannot free any space, because the files already share the same extents.)
- There were many `rmlint --dedupe --dedupe-readonly` calls that appeared to be successful (along with some failures). Notably, at least 20GB of files were successfully `rmlint --dedupe`'d.
- However, `btrfs filesystem usage` still reported the exact same amount of used/free space, even after a reboot.
- I ran `rmlint` against an entire subvolume whose data is exclusive to that subvolume.

How do I understand this? I understand that it is highly likely that I am not understanding some core behaviour of btrfs.
Thank you!
Version info