markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0
794 stars 78 forks source link

v0.11.2 seems to hang? #255

Closed wmertens closed 3 years ago

wmertens commented 3 years ago

When I run 0.11.2 with Linux 5.9.9, it seems to run ok but hangs after it completes:

$ sudo /nix/store/8k23yfyrz8ssganbxyls1ca08g9c3xnm-duperemove-0.11.2/bin/duperemove -rdhA -vvvv --hashfile=/duperemove-hashes.db /
[sudo] password for wmertens: 
Warning: The hash file format in Duperemove master branch is under development and may change.
If the changes are not backwards compatible, you will have to re-create your hash file.
Using 128K blocks
Using hash: murmur3
Gathering file list...
Mountpoint traversal disallowed: / 
Adding files from database for hashing.
Database record ("/home/wmertens/.config/google-chrome/Safe Browsing/UrlMalBin.store", 6917241.590) differs from disk ("/home/wmertens/.config/google-chrome/Safe Browsing/UrlMalBin.store", 6919921.590), update flagged.
Database record ("/home/wmertens/.config/google-chrome/Safe Browsing/UrlSoceng.store", 6917242.590) differs from disk ("/home/wmertens/.config/google-chrome/Safe Browsing/UrlSoceng.store", 6919922.590), update flagged.
Using 8 threads for file hashing phase
[01/35] (02.86%) csum: /home/wmertens/.config/google-chrome/Default/Storage/ext/fahmaaghhglfmonjliepjlchgpgfmobi/def/GPUCache/index
[03/35] (08.57%) csum: /home/wmertens/.config/google-chrome/ShaderCache/GPUCache/index
[02/35] (05.71%) csum: /home/wmertens/.config/google-chrome/Default/GPUCache/index
[04/35] (11.43%) csum: /home/wmertens/Projects/StratoSync/data/nacf/data.sqlite3
[05/35] (14.29%) csum: /home/wmertens/.config/google-chrome/Default/Local Extension Settings/hjdoplcnndgiblooccencgcggcoihigg/000003.log
[07/35] (20.00%) csum: /home/wmertens/.config/google-chrome/GrShaderCache/GPUCache/index
[06/35] (17.14%) csum: /home/wmertens/.config/google-chrome/Default/Local Extension Settings/immpkjjlgappgfkkfieppnmlhakdmaab/000003.log
[08/35] (22.86%) csum: /home/wmertens/.config/google-chrome/Default/Storage/ext/fahmaaghhglfmonjliepjlchgpgfmobi/def/File System/000/p/Paths/000003.log
[09/35] (25.71%) csum: /home/wmertens/.config/google-chrome/Default/Session Storage/MANIFEST-000001
[10/35] (28.57%) csum: /home/wmertens/.config/google-chrome/Default/Visited Links
[11/35] (31.43%) csum: /home/wmertens/.config/google-chrome/Default/Application Cache/Cache/index
[12/35] (34.29%) csum: /home/wmertens/.config/Code/User/globalStorage/state.vscdb
[13/35] (37.14%) csum: /home/wmertens/.config/google-chrome/Default/Web Data
[14/35] (40.00%) csum: /home/wmertens/.config/google-chrome/Default/Login Data
[15/35] (42.86%) csum: /home/wmertens/.config/google-chrome/Safe Browsing/UrlMalBin.store
[16/35] (45.71%) csum: /home/wmertens/.config/google-chrome/Default/Network Action Predictor
[17/35] (48.57%) csum: /home/wmertens/.config/google-chrome/Default/Shortcuts
[18/35] (51.43%) csum: /home/wmertens/.config/google-chrome/Safe Browsing/UrlSoceng.store
[19/35] (54.29%) csum: /home/wmertens/.config/google-chrome/Default/Storage/ext/fahmaaghhglfmonjliepjlchgpgfmobi/def/GPUCache/data_1
[20/35] (57.14%) csum: /home/wmertens/.config/google-chrome/Default/Favicons
[21/35] (60.00%) csum: /duperemove-hashes.db
[22/35] (62.86%) csum: /home/wmertens/.config/google-chrome/ShaderCache/GPUCache/data_1
[23/35] (65.71%) csum: /home/wmertens/.config/Slack/GPUCache/data_1
[24/35] (68.57%) csum: /home/wmertens/.config/Code/GPUCache/data_1
[25/35] (71.43%) csum: /home/wmertens/.config/google-chrome/Default/GPUCache/data_1
[26/35] (74.29%) csum: /home/wmertens/.config/google-chrome/Default/Application Cache/Index
[27/35] (77.14%) csum: /home/wmertens/.config/google-chrome/GrShaderCache/GPUCache/data_1
[28/35] (80.00%) csum: /home/wmertens/.cache/mesa_shader_cache/index
[29/35] (82.86%) csum: /home/wmertens/.config/Code/User/workspaceStorage/575e38ed227e5768c6243162dc51bc46/state.vscdb
[30/35] (85.71%) csum: /home/wmertens/.config/google-chrome/Default/Application Cache/Cache/data_1
[31/35] (88.57%) csum: /home/wmertens/.config/google-chrome/Default/Local Extension Settings/ghbmnnjooekpmoecnnnilnnbdlolhkhi/000003.log
[32/35] (91.43%) csum: /var/log/journal/5e0c4c668f1546ee8336b59a5c5e71c6/system.journal
[33/35] (94.29%) csum: /home/wmertens/.config/google-chrome/Default/Cookies
[34/35] (97.14%) csum: /var/log/journal/5e0c4c668f1546ee8336b59a5c5e71c6/user-1000.journal
[35/35] (100.00%) csum: /home/wmertens/.config/google-chrome/Default/History
^C

If I let it be, it takes 200% CPU for hours and I let it crawl up to 6.8GB RAM before I killed it.

strace -f shows this after a while, it looks like its looping, lots of similar numbers:

[pid  6693] ioctl(6, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6692] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6690] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6693] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6689] <... pread64 resumed>"\n\3\312\0\213\1\377\0\3\341\17\351\17\322\17\273\17\216\17w\17`\17I\5Q\0172\17\33\t\267"..., 4096, 18714624) = 4096
[pid  6693] pread64(6,  <unfinished ...>
[pid  6692] ioctl(5, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6689] pwrite64(3, "\n\5W\0\225\1\225\0\17H\0171\f\255\1\225\79\4Z\3\347\4\237\10\10\10\222\nF\7\254"..., 4096, 45572096 <unfinished ...>
[pid  6693] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6692] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6690] ioctl(4, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6692] pread64(5,  <unfinished ...>
[pid  6689] <... pwrite64 resumed>)     = 4096
[pid  6690] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6693] ioctl(6, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6692] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6689] pread64(3,  <unfinished ...>
[pid  6693] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6690] pread64(4,  <unfinished ...>
[pid  6693] pread64(6,  <unfinished ...>
[pid  6689] <... pread64 resumed>"\n\2F\0\231\2\1\0\16b\4\22\r\357\3C\4n\r\330\r\301\4@\r\253\16\220\f=\r8"..., 4096, 38539264) = 4096
[pid  6692] ioctl(5, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6690] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6693] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6692] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6689] pread64(3,  <unfinished ...>
[pid  6692] pread64(5,  <unfinished ...>
[pid  6689] <... pread64 resumed>"\2\2\274\0m\1A\5\0\0(Y\4\240\4\205\4j\4O\0044\4\31\3\376\3\343\3\311\3\256"..., 4096, 33972224) = 4096
[pid  6693] ioctl(6, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6692] <... pread64 resumed>"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 131072, 131072) = 131072
[pid  6690] ioctl(4, FS_IOC_FIEMAP, {fm_start=262144, fm_length=18446744073709551615, fm_flags=0, fm_extent_count=292} <unfinished ...>
[pid  6689] pwrite64(3, "\n\f\305\0\205\1\243\2\f\256\r\357\f\227\f\200\fi\r \fR\f$\5;\f\r\v\366\v\337"..., 4096, 11030528 <unfinished ...>
[pid  6693] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0
[pid  6690] <... ioctl resumed> => {fm_flags=0, fm_mapped_extents=0, ...}) = 0

Anything I can do to debug?

lorddoskias commented 3 years ago

Does this happen if you run duperemove on any directory or only on this particular one? Can you provide the output of duperemove when run with the --debug option. Looking at the repeated pattern of fiemap calls it seems it's locked up in a loop that constantly calls do_fiemap. Also can you provide the output of the filefrag -v /home/wmertens/.config/google-chrome/Default/History command.

wmertens commented 3 years ago

@lorddoskias debug shows that it's running on several files, I captured duperemove-debug.txt

I then closed Chrome and then it was only looping on:

fiemap_iter: filename "/home/wmertens/.config/google-chrome/ShaderCache/GPUCache/index" idx 0 return poff 158912512 loff 131072 len 131072 flags 0x2001
Fiemap file "/home/wmertens/.config/google-chrome/ShaderCache/GPUCache/index", start: 262144, count: 292
0 extents found
$ filefrag -v .config/google-chrome/ShaderCache/GPUCache/index 
Filesystem type is: 9123683e
File size of .config/google-chrome/ShaderCache/GPUCache/index is 262512 (65 blocks of 4096 bytes)
 ext:     logical_offset:        physical_offset: length:   expected: flags:
   0:        0..       0:   99448109..  99448109:      1:            
   1:       32..      63:      38797..     38828:     32:   99448141: last,shared
.config/google-chrome/ShaderCache/GPUCache/index: 2 extents found
lorddoskias commented 3 years ago

Is this an XFS or a btrfs filesystem?

wmertens commented 3 years ago

Btrfs. Fsck is fine.

lorddoskias commented 3 years ago

This is really odd, I'm not able to reproduce this:

./duperemove --debug --hashfile=ludnica.db -r mnt-test/
Detected 24 logical and 12 physical cpus (ht is on).
Warning: The hash file format in Duperemove master branch is under development and may change.
If the changes are not backwards compatible, you will have to re-create your hash file.
Using 128K blocks
Using hash: murmur3
Gathering file list...
Using 24 threads for file hashing phase
[1/2] (50.00%) csum: /home/nborisov/projects/kernel/duperemove/mnt-test/base-file
[2/2] (100.00%) csum: /home/nborisov/projects/kernel/duperemove/mnt-test/ref-file
Got 2 extent for file
Fiemap file "/home/nborisov/projects/kernel/duperemove/mnt-test/base-file", start: 0, count: 292
2 extents found
[0] logical: 0, physical: 13697024 length: 4194304
Got 2 extent for file
Fiemap file "/home/nborisov/projects/kernel/duperemove/mnt-test/ref-file", start: 0, count: 292
[1] logical: 5242880, physical: 17891328 length: 1048576
fiemap_iter: filename "/home/nborisov/projects/kernel/duperemove/mnt-test/base-file" idx 0 return poff 13697024 loff 0 len 4194304 flags 0x2000
2 extents found
[0] logical: 0, physical: 13631488 length: 4096
[1] logical: 131072, physical: 13697024 length: 131072
fiemap_iter: filename "/home/nborisov/projects/kernel/duperemove/mnt-test/ref-file" idx 0 return poff 13631488 loff 0 len 4096 flags 0x0
fiemap_iter: filename "/home/nborisov/projects/kernel/duperemove/mnt-test/ref-file" idx 1 return poff 13697024 loff 131072 len 131072 flags 0x2001
Csumed file /home/nborisov/projects/kernel/duperemove/mnt-test/ref-file(262144) in 0.000236 (1059.322034 mb/s)
db: write 2 hashes for file /home/nborisov/projects/kernel/duperemove/mnt-test/ref-file
fiemap_iter: filename "/home/nborisov/projects/kernel/duperemove/mnt-test/base-file" idx 1 return poff 17891328 loff 5242880 len 1048576 flags 0x1
Csumed file /home/nborisov/projects/kernel/duperemove/mnt-test/base-file(6291456) in 0.003643 (1646.994236 mb/s)
db: write 2 hashes for file /home/nborisov/projects/kernel/duperemove/mnt-test/base-file
Total files:  2
Total extent hashes: 4
Loading only duplicated hashes from hashfile.
Found 0 identical extents.
Simple read and compare of file data found 0 instances of extents that might benefit from deduplication.
Duperemove memory usage statistics:
struct file_block num: 0 sizeof: 88 total: 0 max: 41 max total: 3608
struct dupe_blocks_list num: 0 sizeof: 112 total: 0 max: 1 max total: 112
struct dupe_extents num: 0 sizeof: 112 total: 0 max: 0 max total: 0
struct extent num: 0 sizeof: 144 total: 0 max: 0 max total: 0
struct filerec num: 0 sizeof: 192 total: 0 max: 2 max total: 384
struct filerec_token num: 0 sizeof: 32 total: 0 max: 0 max total: 0
struct file_hash_head num: 0 sizeof: 48 total: 0 max: 2 max total: 96
struct find_dupes_cmp num: 0 sizeof: 16 total: 0 max: 0 max total: 0
Sqlite3 used: 0  highwater: 226072

And this is also on a btrfs filesystem. Also the log you provide is indeed strange. Initially duperemove finds 2 extents (which is corroborated by your filefrag output)

[2/9] (22.22%) csum: /home/wmertens/.config/google-chrome/Default/GPUCache/index
Fiemap file "/home/wmertens/.config/google-chrome/Default/GPUCache/index", start: 0, count: 292 <--- this initializes the fiemap context
fiemap_iter: filename "/home/wmertens/.config/google-chrome/Default/GPUCache/index" idx 0 return poff 361035153408 loff 0 len 4096 flags 0x0 <--- finds first extent
fiemap_iter: filename "/home/wmertens/.config/google-chrome/Default/GPUCache/index" idx 1 return poff 158912512 loff 131072 len 131072 flags 0x2001 <--- finds second extent, this one has flags 0x2001 which translates to FIEMAP_EXTENT_SHARED/FIEMAP_EXTENT_LAST , again corroborated by your filefrag. 

Because FIEMAP_EXTENT_LAST is set then the loop in csum_by_extent should break, even the looping output suggests this flag is set:

fiemap_iter: filename "/home/wmertens/.config/google-chrome/Default/GPUCache/index" idx 0 return poff 158912512 loff 131072 len 131072 flags 0x2001
Fiemap file "/home/wmertens/.config/google-chrome/Default/GPUCache/index", start: 262144, count: 292
0 extents found

At this point I'm wondering if perhahps your database file is using the old, v2 format which hashes files in blocksizes and uses slightly different logic. Can you provide the output of the following command:

sqlite3 /duperemove-hashes.db "select * from config" . Another thing, depending on how comfortable you are with gdb you can try running duperemove with single hashing thread by adding the --io-threads=1 option and break when the hang occurs and provide a calltrace with gdb's bt command and provide the output.

lorddoskias commented 3 years ago

Ok I managed to reproduce it, working on a fix.

lorddoskias commented 3 years ago

Should be fixed by fd2011333c1a55173669d9440655b045af0a96a6 , please retest.

lorddoskias commented 3 years ago

Any feedback on the fix, I'd like to close the issue.

wmertens commented 3 years ago

Yes apologies, it's a bit of work to reproduce the build locally, I didn't get around to it yet. I'll do it asap.

Wout.

On Tue, Dec 8, 2020 at 12:54 PM lorddoskias notifications@github.com wrote:

Any feedback on the fix, I'd like to close the issue.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/markfasheh/duperemove/issues/255#issuecomment-740575114, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAANNFRCIV6BUQKACLOMBRLSTYHZFANCNFSM4UCUZ4NA .

wmertens commented 3 years ago

It works, thanks! :tada:

lorddoskias commented 3 years ago

Fixed in fd2011333c1a55173669d9440655b045af0a96a6