markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Heavy utilization on sda1 while running duperemove option=partial on sda2(,3,4,5,6) #306

Open janos666 opened 10 months ago

janos666 commented 10 months ago

So, I have my root filesystem on /dev/sda1 (an EFI partition with Btrfs) and my data filesystem on /dev/sda2,3,4,5,6 (RAID-5). I started this command (I know it's sort of "insane", but I was experimenting, since the defaults resulted in zero deduplication of extents in a file that I know contains a lot of zeros: it's a raw bit-by-bit dd image of an SSD which had a lot of free space TRIMmed before I ran dd):

duperemove -rdhb 4k --dedupe-options=nosame,partial /mnt/data/archives/backup/TFJ_laptop.img

At first, it started normally: the HDDs in the RAID-5 filesystem looked like they do during a sequential read. But then the utilization of the HDDs dropped to roughly idle (at least as reported by the nmon tool), while the utilization of the sda SSD looks like a sequential write at full blast, for 10+ minutes now (still counting while I am typing this). However, "btrfs fi df /" doesn't report any increase in used space (and if this write were leaving data behind instead of cyclically overwriting relatively small blocks, the filesystem on the SSD would have run out of space long ago). The CPU utilization is around ~35% (~5% user + ~30% system) with a lot of iowait time (the total user + system + iowait is ~90%), so something is definitely happening.
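
(As an aside, a minimal sketch of how one might cross-check nmon's per-device numbers and see whether duperemove itself is issuing those writes; this assumes sysstat's iostat is installed, a single duperemove process, and root access, and is just one way to look at it.)

    # Which block device is actually busy (extended stats, 1-second samples)?
    iostat -dx 1

    # Is duperemove itself the process issuing the writes?
    PID=$(pidof duperemove)   # assumes a single duperemove process
    cat /proc/"$PID"/io       # read_bytes / write_bytes are cumulative,
                              # so sample twice and compare the deltas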

The previous attempt "succeeded":

duperemove -rdhb 4k --dedupe-options=same /mnt/data/archives/backup/TFJ_laptop.img
Gathering file list...
Using 2 threads for file hashing phase
[1/1] (100.00%) csum: /mnt/data/archives/backup/TFJ_laptop.img
Loading only duplicated hashes from hashfile.
Found 8 identical extents.
Simple read and compare of file data found 3 instances of extents that might benefit from deduplication.
Showing 3 identical extents of length 128.0MB with id 3418087a
Start           Filename
12.7GB  "/mnt/data/archives/backup/TFJ_laptop.img"
13.4GB  "/mnt/data/archives/backup/TFJ_laptop.img"
14.5GB  "/mnt/data/archives/backup/TFJ_laptop.img"
Showing 2 identical extents of length 256.0MB with id ae409078
Start           Filename
12.8GB  "/mnt/data/archives/backup/TFJ_laptop.img"
13.9GB  "/mnt/data/archives/backup/TFJ_laptop.img"
Showing 3 identical extents of length 384.0MB with id 68c03b81
Start           Filename
13.0GB  "/mnt/data/archives/backup/TFJ_laptop.img"
13.5GB  "/mnt/data/archives/backup/TFJ_laptop.img"
14.2GB  "/mnt/data/archives/backup/TFJ_laptop.img"
Using 2 threads for dedupe phase
[0x7efdbc000b90] (1/3) Try to dedupe extents with id 3418087a
[0x5577229f0940] (2/3) Try to dedupe extents with id ae409078
[0x5577229f0940] Dedupe 1 extents (id: ae409078) with target: (12.8GB, 256.0MB), "/mnt/data/archives/backup/TFJ_laptop.img"
[0x7efdbc000b90] Dedupe 2 extents (id: 3418087a) with target: (12.7GB, 128.0MB), "/mnt/data/archives/backup/TFJ_laptop.img"
[0x5577229f0940] (3/3) Try to dedupe extents with id 68c03b81
[0x5577229f0940] Dedupe 2 extents (id: 68c03b81) with target: (13.0GB, 384.0MB), "/mnt/data/archives/backup/TFJ_laptop.img"
Comparison of extent info shows a net change in shared extents of: 0.0B
Total files scanned:  1
Total extent hashes scanned: 428

But this one didn't (I hit Ctrl+C just now, as I was typing this line; the SSD and CPU utilization dropped immediately):

duperemove -rdhb 4k --dedupe-options=nosame,partial /mnt/data/archives/backup/TFJ_laptop.img
Gathering file list...
Using 2 threads for file hashing phase
[1/1] (100.00%) csum: /mnt/data/archives/backup/TFJ_laptop.img
Loading only duplicated hashes from hashfile.
Found 8 identical extents.

The last time I used duperemove was years ago, but I remember it used to work well on files suspected to be highly dedupable (e.g. backup files or raw disk images). I am not sure whether something would eventually have happened had I not cancelled this run, but it looked like it was misbehaving. Or does the "partial" option automatically create a hashfile on the root filesystem? But then why only writes, and for so long...?

Kernel: Linux 6.5.3-gentoo, duperemove 0.12

filefrag /mnt/data/archives/backup/TFJ_laptop.img
/mnt/data/archives/backup/TFJ_laptop.img: 428 extents found

Also, running this on my root filesystem:

duperemove -rdhb 4k --dedupe-options=same /
Comparison of extent info shows a net change in shared extents of: 4.4GB

This resulted in only a ~3GB change of "data / used" in "btrfs fi df /", even though the manual suggests the estimate could be lower than the actual savings (I had run defrag -rt 32M on it beforehand to make sure nothing was already deduped, even though I hadn't run duperemove on it for a long time).
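
(For reference, a minimal sketch of how such a before/after comparison can be done with the commands already mentioned above; the mount point, defragment target size and duperemove options are simply the values from this thread, and defragmenting first unshares any existing reflinks.)

    # unshare anything deduped earlier (as described above)
    btrfs filesystem defragment -r -t 32M /

    # record space usage, dedupe, record again, compare
    btrfs filesystem df / > /tmp/df.before
    duperemove -rdhb 4k --dedupe-options=same /
    btrfs filesystem df / > /tmp/df.after
    diff /tmp/df.before /tmp/df.after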

fallenguru commented 10 months ago

duperemove's behaviour has changed massively between 0.11.1 on the one hand and 0.11.2 and later on the other; personally, I was unable to get "modern" duperemove to work as expected at all (it would either hang or barely deduplicate anything, or both), no matter the switches, so in the end I downgraded to 0.11.1 on all my boxes. That still works great. See also issue #301.

JackSlateur commented 10 months ago

Hello @janos666. Using nosame against a single file is useless by definition; it will never do anything.

I suggest you use something like this: duperemove -rdhb 4k --dedupe-options=same,partial /mnt/data/archives/backup/TFJ_laptop.img

@fallenguru Yes, but work has been done to improve that situation, and some of it is still in progress.

janos666 commented 10 months ago

> Hello @janos666. Using nosame against a single file is useless by definition; it will never do anything.

I know, I was just curious what would happen, and it hung instead of completing with no changes. But the strangest part was nmon reporting write utilization on the unrelated filesystem drive (which could be a false reading; I didn't check with other tools, or even check the temperature of the SSD to see whether it was rising due to constant utilization, at least of the controller).

JackSlateur commented 10 months ago

> But the strangest part was nmon reporting write utilization on the unrelated filesystem drive

That is indeed strange. The only writes that should happen are related to the database. Unless you were running out of memory and it used some swap?

Could you reproduce it and, while the writes happen, run strace -ftyp $PID (to catch some of those write syscalls) as well as ls -alht /proc/$PID/fd?
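
(A minimal sketch of that diagnostic session, assuming root access, a single duperemove process, and that restricting strace to write-style syscalls is acceptable; the output path is just an example.)

    PID=$(pidof duperemove)    # assumes a single duperemove process

    # which files and devices does it have open right now?
    ls -alht /proc/"$PID"/fd

    # follow children (-f), timestamp calls (-t), resolve fd numbers to
    # paths (-y), and log only write-style syscalls for later inspection
    strace -f -t -y -p "$PID" -e trace=write,pwrite64 -o /tmp/duperemove.strace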

janos666 commented 10 months ago

> That is indeed strange. The only writes that should happen are related to the database. Unless you were running out of memory and it used some swap?

There was no swap attached at the time (that was the first thing I checked). [Swap is only attached by a script before running emerge --update @world, and the script detaches it again afterwards.]

> Could you reproduce it and, while the writes happen, run strace -ftyp $PID (to catch some of those write syscalls) as well as ls -alht /proc/$PID/fd?

I will see what I can do when I have time but I am not familiar with those tools (they aren't even installed).

JackSlateur commented 7 months ago

Hello @janos666. Have you had time to grab more data on this issue? Could you check the latest version and tell me whether it fixes your issue?

janos666 commented 7 months ago

I couldn't reproduce the issue. It's very possible it was something random. If I start the duperemove -rdhb 4k --dedupe-options=nosame,partial /mnt/data/archives/backup/TFJ_laptop.img command now, I see very little disk utilization on the disks that contain this file (and ~0 on the system SSD), with decent utilization on a single CPU thread, but there is seemingly no progress after 10+ minutes (no other output after the initial [1/1] (100%) csum line).

JackSlateur commented 7 months ago

How big is /mnt/data/archives/backup/TFJ_laptop.img? The scan ends with a message like "Hashfile .. written".

janos666 commented 7 months ago

238GB (a simple dd copy of an SSD with TRIM and <=50% of its space used, if I recall correctly). I didn't wait for it to finish because the low disk utilization suggested it might take "forever"; that message never popped up. The system has 64GB of RAM (usually ~60GB of it is in use as page cache or writeback).
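
(As an aside, a minimal sketch of how the page-cache figure could be checked, assuming util-linux's fincore is installed.)

    # how much of the image is currently resident in the page cache?
    fincore /mnt/data/archives/backup/TFJ_laptop.img

    # current global dirty/writeback totals
    grep -E 'Dirty|Writeback' /proc/meminfo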

JackSlateur commented 7 months ago

Hum, I do not think it was stuck

I just processed a 220GB file with a 4k block size (on a Ryzen 3600):

    Command being timed: "duperemove -hb4k --dedupe-options=nosame,partial vm-113-disk-1.original"
    User time (seconds): 360.90
    System time (seconds): 62.90
    Percent of CPU this job got: 68%
    Elapsed (wall clock) time (h:mm:ss or m:ss): 10:17.62
    Average shared text size (kbytes): 0
    Average unshared data size (kbytes): 0
    Average stack size (kbytes): 0
    Average total size (kbytes): 0
    Maximum resident set size (kbytes): 7326192
    Average resident set size (kbytes): 0
    Major (requiring I/O) page faults: 0
    Minor (reclaiming a frame) page faults: 1543936
    Voluntary context switches: 1631181
    Involuntary context switches: 34480
    Swaps: 0
    File system inputs: 449790744
    File system outputs: 6696
    Socket messages sent: 0
    Socket messages received: 0
    Signals delivered: 0
    Page size (bytes): 4096
    Exit status: 0

The progress bar is not suited to such large files; there is an issue about that which suggests some solutions.

JackSlateur commented 7 months ago

For what it's worth, without page cache, on the same file:

janos666 commented 7 months ago

I left the time duperemove ... command running for 33 minutes with no progress. time cat ... took 7 minutes (Xeon E3-1230 v5, 4 cores / 8 threads).
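
(For clarity, a sketch of the comparison being described, assuming the elided "..." refers to the same image path used earlier in the thread.)

    IMG=/mnt/data/archives/backup/TFJ_laptop.img   # path from earlier in the thread

    # sequential-read baseline: finished in about 7 minutes here
    time cat "$IMG" > /dev/null

    # partial-dedupe run: no visible progress after 33 minutes
    time duperemove -rdhb 4k --dedupe-options=nosame,partial "$IMG"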

JackSlateur commented 7 months ago

This patch will print some progress (for debug purposes only)