markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Heavy utilization on sda1 while running duperemove option=partial on sda2(,3,4,5,6) #306

Open janos666 opened 10 months ago

janos666 commented 10 months ago

So, I have my root filesystem on /dev/sda1 (an EFI partition with Btrfs) and my data filesystem on /dev/sda2,3,4,5,6 (RAID-5). I started this command (I know it's sort of "insane", but I was experimenting, since the defaults resulted in zero deduplication of extents in a file that I know contains a lot of zeros: it's a raw bit-by-bit dd image of an SSD which had a lot of free space TRIMmed before I ran dd):

duperemove -rdhb 4k --dedupe-options=nosame,partial /mnt/data/archives/backup/TFJ_laptop.img

At first, it started normally: the HDDs in the RAID-5 filesystem looked like they do during a sequential read. But then the utilization of the HDDs dropped to roughly idle (at least as reported by the nmon tool), while the utilization of the sda SSD looks like a sequential write at full blast, for 10+ minutes now (still counting while I am typing this). However, "btrfs fi df /" doesn't report any increase in used space (and if this write were leaving data behind instead of cyclically overwriting relatively small blocks, the filesystem on the SSD would have run out of space long ago). The CPU utilization is around ~35% (~5% user + ~30% system) with a lot of iowait time (the total user + system + iowait is ~90%), so something is definitely happening.
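
(As an aside, a minimal sketch of how one might cross-check nmon's per-device numbers and see whether duperemove itself is issuing those writes; this assumes sysstat's iostat is installed, a single duperemove process, and root access, and is just one way to look at it.)

    # Which block device is actually busy (extended stats, 1-second samples)?
    iostat -dx 1

    # Is duperemove itself the process issuing the writes?
    PID=$(pidof duperemove)   # assumes a single duperemove process
    cat /proc/"$PID"/io       # read_bytes / write_bytes are cumulative,
                              # so sample twice and compare the deltas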

The previous attempt "succeeded":

duperemove -rdhb 4k --dedupe-options=same /mnt/data/archives/backup/TFJ_laptop.img
Gathering file list...
Using 2 threads for file hashing phase
[1/1] (100.00%) csum: /mnt/data/archives/backup/TFJ_laptop.img
Loading only duplicated hashes from hashfile.
Found 8 identical extents.
Simple read and compare of file data found 3 instances of extents that might benefit from deduplication.
Showing 3 identical extents of length 128.0MB with id 3418087a
Start           Filename
12.7GB  "/mnt/data/archives/backup/TFJ_laptop.img"
13.4GB  "/mnt/data/archives/backup/TFJ_laptop.img"
14.5GB  "/mnt/data/archives/backup/TFJ_laptop.img"
Showing 2 identical extents of length 256.0MB with id ae409078
Start           Filename
12.8GB  "/mnt/data/archives/backup/TFJ_laptop.img"
13.9GB  "/mnt/data/archives/backup/TFJ_laptop.img"
Showing 3 identical extents of length 384.0MB with id 68c03b81
Start           Filename
13.0GB  "/mnt/data/archives/backup/TFJ_laptop.img"
13.5GB  "/mnt/data/archives/backup/TFJ_laptop.img"
14.2GB  "/mnt/data/archives/backup/TFJ_laptop.img"
Using 2 threads for dedupe phase
[0x7efdbc000b90] (1/3) Try to dedupe extents with id 3418087a
[0x5577229f0940] (2/3) Try to dedupe extents with id ae409078
[0x5577229f0940] Dedupe 1 extents (id: ae409078) with target: (12.8GB, 256.0MB), "/mnt/data/archives/backup/TFJ_laptop.img"
[0x7efdbc000b90] Dedupe 2 extents (id: 3418087a) with target: (12.7GB, 128.0MB), "/mnt/data/archives/backup/TFJ_laptop.img"
[0x5577229f0940] (3/3) Try to dedupe extents with id 68c03b81
[0x5577229f0940] Dedupe 2 extents (id: 68c03b81) with target: (13.0GB, 384.0MB), "/mnt/data/archives/backup/TFJ_laptop.img"
Comparison of extent info shows a net change in shared extents of: 0.0B
Total files scanned:  1
Total extent hashes scanned: 428

But this one didn't (I hit Ctrl+C just now, as I was typing this line; the SSD and CPU utilization dropped immediately):

duperemove -rdhb 4k --dedupe-options=nosame,partial /mnt/data/archives/backup/TFJ_laptop.img
Gathering file list...
Using 2 threads for file hashing phase
[1/1] (100.00%) csum: /mnt/data/archives/backup/TFJ_laptop.img
Loading only duplicated hashes from hashfile.
Found 8 identical extents.

The last time I used duperemove was years ago, but I remember it used to work well on files suspected to be highly dedupable (e.g. backup files or raw disk images). I am not sure whether something would eventually have happened had I not cancelled this run, but it looked like it was misbehaving. Or does the "partial" option automatically create a hashfile on the root filesystem? But then why only writes, and for so long...?

Kernel: Linux 6.5.3-gentoo, duperemove 0.12

filefrag /mnt/data/archives/backup/TFJ_laptop.img
/mnt/data/archives/backup/TFJ_laptop.img: 428 extents found

Also, running this on my root filesystem:

duperemove -rdhb 4k --dedupe-options=same /
Comparison of extent info shows a net change in shared extents of: 4.4GB

This resulted in only a ~3GB change of "data / used" in "btrfs fi df /", even though the manual suggests the estimate could be lower than the actual savings (I had run defrag -rt 32M on it beforehand to make sure nothing was already deduped, even though I hadn't run duperemove on it for a long time).
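
(For reference, a minimal sketch of how such a before/after comparison can be done with the commands already mentioned above; the mount point, defragment target size and duperemove options are simply the values from this thread, and defragmenting first unshares any existing reflinks.)

    # unshare anything deduped earlier (as described above)
    btrfs filesystem defragment -r -t 32M /

    # record space usage, dedupe, record again, compare
    btrfs filesystem df / > /tmp/df.before
    duperemove -rdhb 4k --dedupe-options=same /
    btrfs filesystem df / > /tmp/df.after
    diff /tmp/df.before /tmp/df.after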

fallenguru commented 10 months ago

duperemove's behaviour has changed massively between 0.11.1 on the one hand and 0.11.2 and later on the other; personally, I was unable to get "modern" duperemove to work as expected at all (it would either hang or barely deduplicate anything, or both), no matter the switches, so in the end I downgraded to 0.11.1 on all my boxes. That still works great. See also issue #301.

JackSlateur commented 10 months ago

Hello @janos666. Using nosame against a single file is useless by definition; it will never do anything.

I suggest you use something like this: duperemove -rdhb 4k --dedupe-options=same,partial /mnt/data/archives/backup/TFJ_laptop.img

@fallenguru Yes, but work has been done to improve that situation, and some of it is still in progress.

janos666 commented 10 months ago

> Hello @janos666. Using nosame against a single file is useless by definition; it will never do anything.

I know, I was just curious what would happen, and it hung instead of completing with no changes. But the strangest part was nmon reporting write utilization on the unrelated filesystem drive (which could be a false reading; I didn't check with other tools, or even check the temperature of the SSD to see whether it was rising due to constant utilization, at least of the controller).

JackSlateur commented 10 months ago

> But the strangest part was nmon reporting write utilization on the unrelated filesystem drive

That is indeed strange. The only writes that should happen are related to the database. Unless you were running out of memory and it used some swap?

Could you reproduce it and, while the writes happen, run strace -ftyp $PID (to catch some of those write syscalls) as well as ls -alht /proc/$PID/fd?
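
(A minimal sketch of that diagnostic session, assuming root access, a single duperemove process, and that restricting strace to write-style syscalls is acceptable; the output path is just an example.)

    PID=$(pidof duperemove)    # assumes a single duperemove process

    # which files and devices does it have open right now?
    ls -alht /proc/"$PID"/fd

    # follow children (-f), timestamp calls (-t), resolve fd numbers to
    # paths (-y), and log only write-style syscalls for later inspection
    strace -f -t -y -p "$PID" -e trace=write,pwrite64 -o /tmp/duperemove.strace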

janos666 commented 10 months ago

> That is indeed strange. The only writes that should happen are related to the database. Unless you were running out of memory and it used some swap?

There was no swap attached at the time (that was the first thing I checked). [Swap is only attached by a script before running emerge --update @world, and the script detaches it again afterwards.]

> Could you reproduce it and, while the writes happen, run strace -ftyp $PID (to catch some of those write syscalls) as well as ls -alht /proc/$PID/fd?

I will see what I can do when I have time but I am not familiar with those tools (they aren't even installed).

JackSlateur commented 7 months ago

Hello @janos666. Have you had time to grab more data on this issue? Could you check the latest version and tell me whether it fixes your issue?

janos666 commented 7 months ago

I couldn't reproduce the issue. It's very possible it was something random. If I start the duperemove -rdhb 4k --dedupe-options=nosame,partial /mnt/data/archives/backup/TFJ_laptop.img command now, I see very little disk utilization on the disks that contain this file (and ~0 on the system SSD), with decent utilization on a single CPU thread, but there is seemingly no progress after 10+ minutes (no other output after the initial [1/1] (100%) csum line).

JackSlateur commented 7 months ago

How big is /mnt/data/archives/backup/TFJ_laptop.img? The scan ends with a message like "Hashfile .. written".

janos666 commented 7 months ago

238GB (a simple dd copy of an SSD with TRIM and <=50% of its space used, if I recall correctly). I didn't wait for it to finish because the low disk utilization suggested it might take "forever"; that message never popped up. The system has 64GB of RAM (usually ~60GB of it is in use as page cache or writeback).
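
(As an aside, a minimal sketch of how the page-cache figure could be checked, assuming util-linux's fincore is installed.)

    # how much of the image is currently resident in the page cache?
    fincore /mnt/data/archives/backup/TFJ_laptop.img

    # current global dirty/writeback totals
    grep -E 'Dirty|Writeback' /proc/meminfo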

JackSlateur commented 7 months ago

Hum, I do not think it was stuck

I just processed a 220GB file with a 4k block size (on a Ryzen 3600):

    Command being timed: "duperemove -hb4k --dedupe-options=nosame,partial vm-113-disk-1.original"
    User time (seconds): 360.90
    System time (seconds): 62.90
    Percent of CPU this job got: 68%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 10:17.62
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 7326192
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1543936
    Voluntary context switches: 1631181
    Involuntary context switches: 34480
    Swaps: 0
    File system inputs: 449790744
    File system outputs: 6696
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

The progress bar is not suited to such large files; there is an issue about that which suggests some solutions.

JackSlateur commented 7 months ago

For what it's worth, without page cache, on the same file:

janos666 commented 7 months ago

I left the time duperemove ... command running for 33 minutes with no progress. time cat ... took 7 minutes (Xeon E3-1230 v5, 4 cores / 8 threads).
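
(For clarity, a sketch of the comparison being described, assuming the elided "..." refers to the same image path used earlier in the thread.)

    IMG=/mnt/data/archives/backup/TFJ_laptop.img   # path from earlier in the thread

    # sequential-read baseline: finished in about 7 minutes here
    time cat "$IMG" > /dev/null

    # partial-dedupe run: no visible progress after 33 minutes
    time duperemove -rdhb 4k --dedupe-options=nosame,partial "$IMG"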

JackSlateur commented 7 months ago

This patch will print some progress (for debug purposes only)