markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Files not fully deduped? #74

Closed: matthiaskrgr closed this issue 8 years ago

matthiaskrgr commented 9 years ago

Opening a report so this isn't forgotten (we talked about this briefly on IRC in #btrfs): running duperemove twice on the same directory apparently re-dedupes some files. Some time ago I also think I saw a run dedupe files that the preceding run had not touched, but I can't confirm that right now.
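Whether a second run is really re-deduping can be checked directly: if a file pair picked again by the second run already shares its extents, the repeat request is redundant. One way to test this is the Linux `FS_IOC_FIEMAP` ioctl, which lists a file's extents and sets `FIEMAP_EXTENT_SHARED` on extents shared with another file. The Python sketch below is only an illustration and is not taken from duperemove. It assumes Linux and a filesystem that implements FIEMAP with shared-extent reporting (btrfs does). It also reads at most `max_extents` extents per file and does not page through longer extent maps.

```python
import fcntl
import os
import struct
import sys

FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap) on Linux
FIEMAP_FLAG_SYNC = 0x00000001     # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x00000001   # last extent in the file
FIEMAP_EXTENT_SHARED = 0x00002000 # extent is shared with another file

HEADER = struct.Struct("=QQIIII")     # struct fiemap header (32 bytes)
EXTENT = struct.Struct("=QQQQQIIII")  # struct fiemap_extent (56 bytes)


def parse_extents(buf):
    """Decode the fiemap_extent array that follows the struct fiemap header."""
    _, _, _, mapped, _, _ = HEADER.unpack_from(buf, 0)
    extents = []
    for i in range(mapped):
        logical, physical, length, _, _, flags, _, _, _ = EXTENT.unpack_from(
            buf, HEADER.size + i * EXTENT.size)
        extents.append((logical, physical, length, flags))
    return extents


def fiemap(path, max_extents=512):
    """Return (logical, physical, length, flags) for up to max_extents extents."""
    buf = bytearray(HEADER.size + max_extents * EXTENT.size)
    # fm_start=0, fm_length=~0 (whole file), fm_flags, fm_mapped_extents=0,
    # fm_extent_count=max_extents, fm_reserved=0
    HEADER.pack_into(buf, 0, 0, 0xFFFFFFFFFFFFFFFF, FIEMAP_FLAG_SYNC,
                     0, max_extents, 0)
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf)  # kernel fills buf in place
    finally:
        os.close(fd)
    return parse_extents(buf)


def shared_bytes(path):
    """Sum the lengths of the extents the filesystem reports as shared."""
    return sum(length for _, _, length, flags in fiemap(path)
               if flags & FIEMAP_EXTENT_SHARED)


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{shared_bytes(p):>12} shared bytes  {p}")
```

You can run it on a pair from the logs, for example both copies of `winhlp32.exe`, before and after a duperemove pass. Note that the reported dedupe range for these files starts at 128.0K, so look at the extents that cover that range rather than at the whole file.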

First run:

Search completed with no errors.             
Simple read and compare of file data found 42 instances of extents that might benefit from deduplication.
Start       Length      Filename (2 extents)
128.0K  748.0   "/lib/wine/fakedlls/winhlp32.exe"
128.0K  748.0   "/lib64/wine/fakedlls/winhlp32.exe"
Start       Length      Filename (2 extents)
128.0K  836.0   "/lib/wine/fakedlls/oledlg.dll"
128.0K  836.0   "/lib64/wine/fakedlls/oledlg.dll"
Start       Length      Filename (2 extents)
128.0K  2.6K    "/lib/wine/fakedlls/ieframe.dll"
128.0K  2.6K    "/lib64/wine/fakedlls/ieframe.dll"
Start       Length      Filename (2 extents)
128.0K  5.2K    "/lib/wine/fakedlls/winemine.exe"
128.0K  5.2K    "/lib64/wine/fakedlls/winemine.exe"
Start       Length      Filename (2 extents)
128.0K  26.2K   "/lib/wine/fakedlls/wineconsole.exe"
128.0K  26.2K   "/lib64/wine/fakedlls/wineconsole.exe"
Start       Length      Filename (2 extents)
128.0K  45.9K   "/lib/wine/fakedlls/comctl32.dll"
128.0K  45.9K   "/lib64/wine/fakedlls/comctl32.dll"
Start       Length      Filename (2 extents)
128.0K  46.1K   "/lib/wine/fakedlls/notepad.exe"
128.0K  46.1K   "/lib64/wine/fakedlls/notepad.exe"
Start       Length      Filename (2 extents)
128.0K  62.4K   "/lib/wine/fakedlls/shdoclc.dll"
128.0K  62.4K   "/lib64/wine/fakedlls/shdoclc.dll"
Start       Length      Filename (2 extents)
128.0K  69.3K   "/lib/wine/fakedlls/progman.exe"
128.0K  69.3K   "/lib64/wine/fakedlls/progman.exe"
Start       Length      Filename (2 extents)
128.0K  81.5K   "/lib/wine/fakedlls/winefile.exe"
128.0K  81.5K   "/lib64/wine/fakedlls/winefile.exe"
Start       Length      Filename (2 extents)
128.0K  94.7K   "/lib/wine/fakedlls/user32.dll"
128.0K  94.7K   "/lib64/wine/fakedlls/user32.dll"
Start       Length      Filename (2 extents)
128.0K  96.2K   "/lib/wine/fakedlls/appwiz.cpl"
128.0K  96.2K   "/lib64/wine/fakedlls/appwiz.cpl"
Start       Length      Filename (2 extents)
128.0K  115.0K  "/lib/wine/fakedlls/oleview.exe"
128.0K  115.0K  "/lib64/wine/fakedlls/oleview.exe"
Start       Length      Filename (2 extents)
128.0K  117.3K  "/lib/wine/fakedlls/crypt32.dll"
128.0K  117.3K  "/lib64/wine/fakedlls/crypt32.dll"
Start       Length      Filename (2 extents)
128.0K  117.6K  "/lib/wine/fakedlls/cards.dll"
128.0K  117.6K  "/lib64/wine/fakedlls/cards.dll"
Start       Length      Filename (6 extents)
128.0K  128.0K  "/lib/firmware/intel/fw_sst_22a8.bin"
18.8M   128.0K  "/lib64/llvm/libLLVM-3.5.so"
13.8M   128.0K  "/lib64/libgs.so.9.16"
512.0K  128.0K  "/lib64/libraw.so.10.0.0"
512.0K  128.0K  "/lib64/libraw_r.so.10.0.0"
1.6M    128.0K  "/lib64/wine/user32.dll.so"
Start       Length      Filename (2 extents)
0.0     140.3K  "/lib64/python2.7/site-packages/numpy/ma/tests/test_core.py"
0.0     140.3K  "/lib64/python3.4/site-packages/numpy/ma/tests/test_core.py"
Start       Length      Filename (2 extents)
128.0K  143.3K  "/lib/wine/fakedlls/inetcpl.cpl"
128.0K  143.3K  "/lib64/wine/fakedlls/inetcpl.cpl"
Start       Length      Filename (2 extents)
128.0K  169.1K  "/lib/wine/fakedlls/wordpad.exe"
128.0K  169.1K  "/lib64/wine/fakedlls/wordpad.exe"
Start       Length      Filename (2 extents)
0.0     171.5K  "/lib64/python2.7/site-packages/numpy/core/tests/test_multiarray.py"
0.0     171.5K  "/lib64/python3.4/site-packages/numpy/core/tests/test_multiarray.py"
Start       Length      Filename (2 extents)
0.0     213.7K  "/lib64/python2.7/site-packages/numpy/add_newdocs.py"
0.0     213.7K  "/lib64/python3.4/site-packages/numpy/add_newdocs.py"
Start       Length      Filename (2 extents)
128.0K  224.8K  "/lib/wine/fakedlls/regedit.exe"
128.0K  224.8K  "/lib64/wine/fakedlls/regedit.exe"
Start       Length      Filename (2 extents)
0.0     231.3K  "/lib64/python2.7/site-packages/numpy/ma/core.py"
0.0     231.3K  "/lib64/python3.4/site-packages/numpy/ma/core.py"
Start       Length      Filename (2 extents)
128.0K  256.0K  "/lib/wine/fakedlls/winmm.dll"
128.0K  256.0K  "/lib64/wine/fakedlls/winmm.dll"
Start       Length      Filename (2 extents)
0.0     269.7K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.dep"
0.0     269.7K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.dep"
Start       Length      Filename (2 extents)
0.0     301.2K  "/lib/python2.7/site-packages/pip/_vendor/requests/cacert.pem"
0.0     301.2K  "/lib/python3.4/site-packages/pip/_vendor/requests/cacert.pem"
Start       Length      Filename (2 extents)
0.0     307.8K  "/lib/python2.7/site-packages/pip/_vendor/certifi/cacert.pem"
0.0     307.8K  "/lib/python3.4/site-packages/pip/_vendor/certifi/cacert.pem"
Start       Length      Filename (2 extents)
0.0     318.6K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.symbols"
0.0     318.6K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.symbols"
Start       Length      Filename (2 extents)
0.0     384.7K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.dep.bin"
0.0     384.7K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.dep.bin"
Start       Length      Filename (2 extents)
0.0     394.9K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.symbols.bin"
0.0     394.9K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.symbols.bin"
Start       Length      Filename (2 extents)
128.0K  444.1K  "/lib/wine/fakedlls/taskmgr.exe"
128.0K  444.1K  "/lib64/wine/fakedlls/taskmgr.exe"
Start       Length      Filename (2 extents)
128.0K  470.3K  "/lib/wine/fakedlls/comdlg32.dll"
128.0K  470.3K  "/lib64/wine/fakedlls/comdlg32.dll"
Start       Length      Filename (2 extents)
128.0K  578.6K  "/lib/wine/fakedlls/cmd.exe"
128.0K  578.6K  "/lib64/wine/fakedlls/cmd.exe"
Start       Length      Filename (2 extents)
128.0K  659.0K  "/lib/wine/fakedlls/cryptui.dll"
128.0K  659.0K  "/lib64/wine/fakedlls/cryptui.dll"
Start       Length      Filename (2 extents)
128.0K  681.6K  "/lib/wine/fakedlls/winecfg.exe"
128.0K  681.6K  "/lib64/wine/fakedlls/winecfg.exe"
Start       Length      Filename (2 extents)
128.0K  768.0K  "/lib64/libjavascriptcoregtk-3.0.so.0.16.17"
128.0K  768.0K  "/lib64/libjavascriptcoregtk-1.0.so.0.16.17"
Start       Length      Filename (2 extents)
0.0     782.6K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.alias.bin"
0.0     782.6K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.alias.bin"
Start       Length      Filename (2 extents)
0.0     800.3K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.alias"
0.0     800.3K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.alias"
Start       Length      Filename (2 extents)
256.0K  1.4M    "/lib/wine/fakedlls/shell32.dll"
256.0K  1.4M    "/lib64/wine/fakedlls/shell32.dll"
Start       Length      Filename (2 extents)
128.0K  1.5M    "/lib/wine/fakedlls/kernel32.dll"
128.0K  1.5M    "/lib64/wine/fakedlls/kernel32.dll"
Start       Length      Filename (4 extents)
16.4M   1.5M    "/lib64/libpinyin/data/bigram.db"
640.0K  1.5M    "/lib64/libbrasero-burn3.so.1.2.6"
128.0K  1.5M    "/lib64/vlc/plugins/access/librar_plugin.so"
256.0K  1.5M    "/lib64/libQt5Sql.so.5.4.2"
Start       Length      Filename (2 extents)
1.0M    6.1M    "/lib64/libjavascriptcoregtk-3.0.so.0.16.17"
1.0M    6.1M    "/lib64/libjavascriptcoregtk-1.0.so.0.16.17"
Using 4 threads for dedupe phase
[0xb35990] Dedupe 1 extents with target: (128.0K, 748.0), "/lib/wine/fakedlls/winhlp32.exe"
[0xb359e0] Dedupe 1 extents with target: (128.0K, 836.0), "/lib/wine/fakedlls/oledlg.dll"
[0xb35940] Dedupe 1 extents with target: (128.0K, 5.2K), "/lib/wine/fakedlls/winemine.exe"
[0xb358f0] Dedupe 1 extents with target: (128.0K, 2.6K), "/lib/wine/fakedlls/ieframe.dll"
[0xb35990] Dedupe 1 extents with target: (128.0K, 45.9K), "/lib/wine/fakedlls/comctl32.dll"
[0xb359e0] Dedupe 1 extents with target: (128.0K, 46.1K), "/lib/wine/fakedlls/notepad.exe"
[0xb35940] Dedupe 1 extents with target: (128.0K, 69.3K), "/lib/wine/fakedlls/progman.exe"
[0xb358f0] Dedupe 1 extents with target: (128.0K, 62.4K), "/lib/wine/fakedlls/shdoclc.dll"
[0xb359e0] Dedupe 1 extents with target: (128.0K, 94.7K), "/lib/wine/fakedlls/user32.dll"
[0xb35990] Dedupe 1 extents with target: (128.0K, 81.5K), "/lib/wine/fakedlls/winefile.exe"
[0xb358f0] Dedupe 1 extents with target: (128.0K, 115.0K), "/lib/wine/fakedlls/oleview.exe"
[0xb35940] Dedupe 1 extents with target: (128.0K, 96.2K), "/lib/wine/fakedlls/appwiz.cpl"
[0xb359e0] Dedupe 1 extents with target: (128.0K, 117.3K), "/lib/wine/fakedlls/crypt32.dll"
[0xb35990] Dedupe 1 extents with target: (128.0K, 117.6K), "/lib/wine/fakedlls/cards.dll"
[0xb35940] Dedupe 1 extents with target: (128.0K, 128.0K), "/lib/firmware/intel/fw_sst_22a8.bin"
[0xb359e0] Dedupe 2 extents with target: (16.4M, 1.5M), "/lib64/libpinyin/data/bigram.db"
Kernel processed data (excludes target files): 3.9M
Comparison of extent info shows a net change in shared extents of: 0.0

Second run:

[########################################]
Search completed with no errors.             
Simple read and compare of file data found 42 instances of extents that might benefit from deduplication.
Start       Length      Filename (2 extents)
128.0K  748.0   "/lib/wine/fakedlls/winhlp32.exe"
128.0K  748.0   "/lib64/wine/fakedlls/winhlp32.exe"
Start       Length      Filename (2 extents)
128.0K  836.0   "/lib/wine/fakedlls/oledlg.dll"
128.0K  836.0   "/lib64/wine/fakedlls/oledlg.dll"
Start       Length      Filename (2 extents)
128.0K  2.6K    "/lib/wine/fakedlls/ieframe.dll"
128.0K  2.6K    "/lib64/wine/fakedlls/ieframe.dll"
Start       Length      Filename (2 extents)
128.0K  5.2K    "/lib/wine/fakedlls/winemine.exe"
128.0K  5.2K    "/lib64/wine/fakedlls/winemine.exe"
Start       Length      Filename (2 extents)
128.0K  26.2K   "/lib/wine/fakedlls/wineconsole.exe"
128.0K  26.2K   "/lib64/wine/fakedlls/wineconsole.exe"
Start       Length      Filename (2 extents)
128.0K  45.9K   "/lib/wine/fakedlls/comctl32.dll"
128.0K  45.9K   "/lib64/wine/fakedlls/comctl32.dll"
Start       Length      Filename (2 extents)
128.0K  46.1K   "/lib/wine/fakedlls/notepad.exe"
128.0K  46.1K   "/lib64/wine/fakedlls/notepad.exe"
Start       Length      Filename (2 extents)
128.0K  62.4K   "/lib/wine/fakedlls/shdoclc.dll"
128.0K  62.4K   "/lib64/wine/fakedlls/shdoclc.dll"
Start       Length      Filename (2 extents)
128.0K  69.3K   "/lib/wine/fakedlls/progman.exe"
128.0K  69.3K   "/lib64/wine/fakedlls/progman.exe"
Start       Length      Filename (2 extents)
128.0K  81.5K   "/lib/wine/fakedlls/winefile.exe"
128.0K  81.5K   "/lib64/wine/fakedlls/winefile.exe"
Start       Length      Filename (2 extents)
128.0K  94.7K   "/lib/wine/fakedlls/user32.dll"
128.0K  94.7K   "/lib64/wine/fakedlls/user32.dll"
Start       Length      Filename (2 extents)
128.0K  96.2K   "/lib/wine/fakedlls/appwiz.cpl"
128.0K  96.2K   "/lib64/wine/fakedlls/appwiz.cpl"
Start       Length      Filename (2 extents)
128.0K  115.0K  "/lib/wine/fakedlls/oleview.exe"
128.0K  115.0K  "/lib64/wine/fakedlls/oleview.exe"
Start       Length      Filename (2 extents)
128.0K  117.3K  "/lib/wine/fakedlls/crypt32.dll"
128.0K  117.3K  "/lib64/wine/fakedlls/crypt32.dll"
Start       Length      Filename (2 extents)
128.0K  117.6K  "/lib/wine/fakedlls/cards.dll"
128.0K  117.6K  "/lib64/wine/fakedlls/cards.dll"
Start       Length      Filename (6 extents)
128.0K  128.0K  "/lib/firmware/intel/fw_sst_22a8.bin"
18.8M   128.0K  "/lib64/llvm/libLLVM-3.5.so"
13.8M   128.0K  "/lib64/libgs.so.9.16"
512.0K  128.0K  "/lib64/libraw_r.so.10.0.0"
512.0K  128.0K  "/lib64/libraw.so.10.0.0"
1.6M    128.0K  "/lib64/wine/user32.dll.so"
Start       Length      Filename (2 extents)
0.0     140.3K  "/lib64/python2.7/site-packages/numpy/ma/tests/test_core.py"
0.0     140.3K  "/lib64/python3.4/site-packages/numpy/ma/tests/test_core.py"
Start       Length      Filename (2 extents)
128.0K  143.3K  "/lib/wine/fakedlls/inetcpl.cpl"
128.0K  143.3K  "/lib64/wine/fakedlls/inetcpl.cpl"
Start       Length      Filename (2 extents)
128.0K  169.1K  "/lib/wine/fakedlls/wordpad.exe"
128.0K  169.1K  "/lib64/wine/fakedlls/wordpad.exe"
Start       Length      Filename (2 extents)
0.0     171.5K  "/lib64/python2.7/site-packages/numpy/core/tests/test_multiarray.py"
0.0     171.5K  "/lib64/python3.4/site-packages/numpy/core/tests/test_multiarray.py"
Start       Length      Filename (2 extents)
0.0     213.7K  "/lib64/python2.7/site-packages/numpy/add_newdocs.py"
0.0     213.7K  "/lib64/python3.4/site-packages/numpy/add_newdocs.py"
Start       Length      Filename (2 extents)
128.0K  224.8K  "/lib/wine/fakedlls/regedit.exe"
128.0K  224.8K  "/lib64/wine/fakedlls/regedit.exe"
Start       Length      Filename (2 extents)
0.0     231.3K  "/lib64/python2.7/site-packages/numpy/ma/core.py"
0.0     231.3K  "/lib64/python3.4/site-packages/numpy/ma/core.py"
Start       Length      Filename (2 extents)
128.0K  256.0K  "/lib/wine/fakedlls/winmm.dll"
128.0K  256.0K  "/lib64/wine/fakedlls/winmm.dll"
Start       Length      Filename (2 extents)
0.0     269.7K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.dep"
0.0     269.7K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.dep"
Start       Length      Filename (2 extents)
0.0     301.2K  "/lib/python2.7/site-packages/pip/_vendor/requests/cacert.pem"
0.0     301.2K  "/lib/python3.4/site-packages/pip/_vendor/requests/cacert.pem"
Start       Length      Filename (2 extents)
0.0     307.8K  "/lib/python2.7/site-packages/pip/_vendor/certifi/cacert.pem"
0.0     307.8K  "/lib/python3.4/site-packages/pip/_vendor/certifi/cacert.pem"
Start       Length      Filename (2 extents)
0.0     318.6K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.symbols"
0.0     318.6K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.symbols"
Start       Length      Filename (2 extents)
0.0     384.7K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.dep.bin"
0.0     384.7K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.dep.bin"
Start       Length      Filename (2 extents)
0.0     394.9K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.symbols.bin"
0.0     394.9K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.symbols.bin"
Start       Length      Filename (2 extents)
128.0K  444.1K  "/lib/wine/fakedlls/taskmgr.exe"
128.0K  444.1K  "/lib64/wine/fakedlls/taskmgr.exe"
Start       Length      Filename (2 extents)
128.0K  470.3K  "/lib/wine/fakedlls/comdlg32.dll"
128.0K  470.3K  "/lib64/wine/fakedlls/comdlg32.dll"
Start       Length      Filename (2 extents)
128.0K  578.6K  "/lib/wine/fakedlls/cmd.exe"
128.0K  578.6K  "/lib64/wine/fakedlls/cmd.exe"
Start       Length      Filename (2 extents)
128.0K  659.0K  "/lib/wine/fakedlls/cryptui.dll"
128.0K  659.0K  "/lib64/wine/fakedlls/cryptui.dll"
Start       Length      Filename (2 extents)
128.0K  681.6K  "/lib/wine/fakedlls/winecfg.exe"
128.0K  681.6K  "/lib64/wine/fakedlls/winecfg.exe"
Start       Length      Filename (2 extents)
128.0K  768.0K  "/lib64/libjavascriptcoregtk-3.0.so.0.16.17"
128.0K  768.0K  "/lib64/libjavascriptcoregtk-1.0.so.0.16.17"
Start       Length      Filename (2 extents)
0.0     782.6K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.alias.bin"
0.0     782.6K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.alias.bin"
Start       Length      Filename (2 extents)
0.0     800.3K  "/lib/modules/4.0.4-301.fc22.x86_64/modules.alias"
0.0     800.3K  "/lib/modules/4.0.4-303.fc22.x86_64/modules.alias"
Start       Length      Filename (2 extents)
256.0K  1.4M    "/lib/wine/fakedlls/shell32.dll"
256.0K  1.4M    "/lib64/wine/fakedlls/shell32.dll"
Start       Length      Filename (2 extents)
128.0K  1.5M    "/lib/wine/fakedlls/kernel32.dll"
128.0K  1.5M    "/lib64/wine/fakedlls/kernel32.dll"
Start       Length      Filename (4 extents)
16.4M   1.5M    "/lib64/libpinyin/data/bigram.db"
640.0K  1.5M    "/lib64/libbrasero-burn3.so.1.2.6"
128.0K  1.5M    "/lib64/vlc/plugins/access/librar_plugin.so"
256.0K  1.5M    "/lib64/libQt5Sql.so.5.4.2"
Start       Length      Filename (2 extents)
1.0M    6.1M    "/lib64/libjavascriptcoregtk-3.0.so.0.16.17"
1.0M    6.1M    "/lib64/libjavascriptcoregtk-1.0.so.0.16.17"
Using 4 threads for dedupe phase
[0x10ef9e0] Dedupe 1 extents with target: (128.0K, 836.0), "/lib/wine/fakedlls/oledlg.dll"
[0x10ef850] Dedupe 1 extents with target: (128.0K, 748.0), "/lib/wine/fakedlls/winhlp32.exe"
[0x10ef990] Dedupe 1 extents with target: (128.0K, 2.6K), "/lib/wine/fakedlls/ieframe.dll"
[0x10ef990] Dedupe 1 extents with target: (128.0K, 128.0K), "/lib/firmware/intel/fw_sst_22a8.bin"
[0x10ef990] Dedupe 1 extents with target: (16.4M, 1.5M), "/lib64/libpinyin/data/bigram.db"
Kernel processed data (excludes target files): 1.6M
Comparison of extent info shows a net change in shared extents of: 0.0
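Both runs list the same 42 candidates. However, the first run submits 16 dedupe requests and the second run submits 5, and each of those 5 targets (oledlg.dll, winhlp32.exe, ieframe.dll, fw_sst_22a8.bin, bigram.db) was already submitted in the first run. The sketch below makes that comparison mechanical. It parses the `Dedupe N extents with target: (start, length), "path"` lines in the format shown above and intersects the targets of two runs. It also totals extents × length per run. My assumption, not confirmed by duperemove's source, is that this total is what the "Kernel processed data (excludes target files)" line reports. Because the log sizes are rounded, the total can only be approximate. All function names here are hypothetical.

```python
import re
from collections import namedtuple

# Matches dedupe-phase lines as printed in the logs above, e.g.
# [0x10ef990] Dedupe 1 extents with target: (128.0K, 2.6K), "/lib/wine/fakedlls/ieframe.dll"
DEDUPE_RE = re.compile(
    r'Dedupe (\d+) extents with target: \(([^,]+), ([^)]+)\), "(.*)"')
UNITS = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}

Request = namedtuple("Request", "target start length extents")


def parse_size(text):
    """Turn a size such as '748.0', '2.6K' or '1.5M' into a byte count."""
    text = text.strip()
    suffix = text[-1] if text[-1] in "KMGT" else ""
    number = text[:-1] if suffix else text
    return float(number) * UNITS[suffix]


def parse_dedupe_log(lines):
    """Collect one Request per 'Dedupe N extents with target' line."""
    requests = []
    for line in lines:
        m = DEDUPE_RE.search(line)
        if m:
            n, start, length, path = m.groups()
            requests.append(
                Request(path, parse_size(start), parse_size(length), int(n)))
    return requests


def submitted_bytes(requests):
    """Approximate non-target bytes handed to the kernel: extents x length."""
    return sum(r.extents * r.length for r in requests)


def repeated_targets(run1, run2):
    """Targets submitted in both runs, i.e. candidates for a re-dedupe."""
    return sorted({r.target for r in run1} & {r.target for r in run2})
```

For the second run, summing the five requests gives about 1.63 MiB, close to the logged 1.6M. Note also that `bigram.db` was submitted with 2 extents the first time and 1 extent the second time.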

markfasheh commented 8 years ago

Going to close this. We changed the way we select blocks and extents to make sure we don't miss files: the find-dupes algorithm got better at that, and we also have block dedupe now, which shouldn't miss files either.
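The idea behind block-level dedupe is to hash every fixed-size block of every file and treat blocks with equal digests as candidates. A file can then be matched even when its extent boundaries differ from those of its duplicate. The following is a minimal illustrative sketch of that idea, not duperemove's find-dupes code. The 128 KiB block size and the SHA-256 hash are assumptions chosen for the example.

```python
import hashlib
from collections import defaultdict

BLOCK_SIZE = 128 * 1024  # assumed block size for this illustration


def hash_blocks(paths, block_size=BLOCK_SIZE):
    """Map each block digest to every (path, offset) where that block occurs."""
    index = defaultdict(list)
    for path in paths:
        with open(path, "rb") as f:
            offset = 0
            while True:
                block = f.read(block_size)
                if not block:
                    break
                # A short final block is hashed too, so identical tails can match.
                index[hashlib.sha256(block).digest()].append((path, offset))
                offset += len(block)
    return index


def duplicate_blocks(paths, block_size=BLOCK_SIZE):
    """Keep only the digests that occur at more than one location."""
    return {h: locs for h, locs in hash_blocks(paths, block_size).items()
            if len(locs) > 1}
```

Matching digests are only candidates. The kernel's dedupe ioctl compares the actual data before it shares any extents, so a hash collision cannot merge different blocks.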