markfasheh / duperemove

Tools for deduping file systems
GNU General Public License v2.0

Infinite loop after hash calculations if `-d` is not specified (with removed files?) #321

Closed: hhyyrylainen closed this issue 8 months ago

hhyyrylainen commented 8 months ago

Sorry if this has already been fixed in the latest version, but with duperemove 0.12 I noticed that if I don't pass the `-d` flag, then after the file hashes are calculated the same duplicates (or no duplicates) seem to be detected in an infinite loop, and duperemove never exits until I hit CTRL-C.

/home/hhyyrylainen/.local/share/NuGet/http-cache/670c1461c29885f9aa22c281d8b7da90845b38e4$ps:_api.nuget.org_v3_index.json/nupkg_microsoft.playwright.1.38.0.dat: Skipping dedupe.
Error 2: No such file or directory while opening "/home/hhyyrylainen/.nuget/packages/microsoft.playwright/1.38.0/microsoft.playwright.1.38.0.nupkg" (write=1)
/home/hhyyrylainen/.nuget/packages/microsoft.playwright/1.38.0/microsoft.playwright.1.38.0.nupkg: Skipping dedupe.
Error 2: No such file or directory while opening "/home/hhyyrylainen/.local/share/NuGet/http-cache/670c1461c29885f9aa22c281d8b7da90845b38e4$ps:_api.nuget.org_v3_index.json/nupkg_microsoft.playwright.1.38.0.dat" (write=1)
[0x55663adc33a0] (282188/282189) Try to dedupe extents with id c4df2bed
Error 2: No such file or directory while opening "/home/hhyyrylainen/.nuget/packages/microsoft.playwright/1.39.0/microsoft.playwright.1.39.0.nupkg" (write=1)
/home/hhyyrylainen/.nuget/packages/microsoft.playwright/1.39.0/microsoft.playwright.1.39.0.nupkg: Skipping dedupe.
Error 2: No such file or directory while opening "/home/hhyyrylainen/.nuget/packages/microsoft.playwright/1.39.0/microsoft.playwright.1.39.0.nupkg" (write=1)
[0x55663adc33a0] (282189/282189) Try to dedupe extents with id e7fc0bab
Error 2: No such file or directory while opening "/home/hhyyrylainen/.local/share/NuGet/http-cache/670c1461c29885f9aa22c281d8b7da90845b38e4$ps:_api.nuget.org_v3_index.json/nupkg_microsoft.playwright.1.37.1.dat" (write=1)
/home/hhyyrylainen/.local/share/NuGet/http-cache/670c1461c29885f9aa22c281d8b7da90845b38e4$ps:_api.nuget.org_v3_index.json/nupkg_microsoft.playwright.1.37.1.dat: Skipping dedupe.
Error 2: No such file or directory while opening "/home/hhyyrylainen/.nuget/packages/microsoft.playwright/1.37.1/microsoft.playwright.1.37.1.nupkg" (write=1)
/home/hhyyrylainen/.nuget/packages/microsoft.playwright/1.37.1/microsoft.playwright.1.37.1.nupkg: Skipping dedupe.
Error 2: No such file or directory while opening "/home/hhyyrylainen/.local/share/NuGet/http-cache/670c1461c29885f9aa22c281d8b7da90845b38e4$ps:_api.nuget.org_v3_index.json/nupkg_microsoft.playwright.1.37.1.dat" (write=1)
Error 2: No such file or directory while opening "/home/hhyyrylainen/.nuget/packages/microsoft.playwright/1.39.0/.playwright/node/win32_x64/node.exe" (write=1)
Comparison of extent info shows a net change in shared extents of: 546252469
Using 32 threads for file hashing phase
Loading only duplicated hashes from hashfile.
Found 0 identical extents.
Simple read and compare of file data found 0 instances of extents that might benefit from deduplication.
Nothing to dedupe.
Using 32 threads for file hashing phase
Loading only duplicated hashes from hashfile.
Found 0 identical extents.
Simple read and compare of file data found 0 instances of extents that might benefit from deduplication.
Nothing to dedupe.
Using 32 threads for file hashing phase
Loading only duplicated hashes from hashfile.
Found 0 identical extents.
Simple read and compare of file data found 0 instances of extents that might benefit from deduplication.
Nothing to dedupe.

Then that final block of lines repeats indefinitely. I may have caused the problem myself: I noticed files in the duplicate output that I definitely didn't need and deleted them before the run had finished. On a previous run I noticed that the tool printed a bunch of duplicates, which it then kept re-printing after a minute or so. I'm opening this issue because I didn't find an existing one about the tool getting into an infinite loop at the end. I have one more big initial check running without `-d`; I'll report back here when it finishes on whether it also gets into an infinite loop.

Update: the other duperemove run, on a disk where I didn't delete anything, completed just fine after printing the duplicates it found just once.

JackSlateur commented 8 months ago

Hello @hhyyrylainen

I believe this bug has been fixed by commits 114c84bf40dca06c1bd257517d583fa9d9ab7f95 and bbc22672bef3c6a3739a4c3158ab7a069d372955.

Files that could not be scanned for some reason (like yours, which were deleted) used to cause an infinite loop, as you described.

hhyyrylainen commented 8 months ago

So this should be fixed in version 0.13? That's great to hear. I see Fedora 38 is not going to get the update (https://bodhi.fedoraproject.org/updates/FEDORA-2023-16541f0464), but I should get the latest version once Fedora 39 comes out, which should be soon. In the meantime I'll hold off on adding a cron job to run duperemove, so that I can manually keep an eye on whether things get stuck.
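
For anyone who wants to schedule runs on an affected version anyway, one option is to wrap the command in a time limit so that a stuck run gets killed instead of looping forever. The following is only a rough sketch, not something from the duperemove project: the helper name `run_with_timeout`, the script name `guard.py`, and the six-hour budget are all assumptions made for illustration.

```python
import subprocess
import sys

# Assumed time budget (6 hours); pick a value that suits your data set.
TIME_BUDGET_S = 6 * 60 * 60


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return ("ok", returncode) or ("timed_out", None)."""
    try:
        # On timeout, subprocess.run kills the child, waits for it,
        # and then raises TimeoutExpired.
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return ("timed_out", None)
    return ("ok", result.returncode)


def main(argv):
    # argv is the full command to guard, e.g. ["duperemove", ...].
    status, code = run_with_timeout(argv, TIME_BUDGET_S)
    if status == "timed_out":
        print("command exceeded the time budget and was killed", file=sys.stderr)
        return 1
    return code


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Saved as `guard.py` (a hypothetical name), a cron entry could then call something like `python3 guard.py duperemove -r /path/to/data`, where the path is a placeholder. A non-zero exit status tells cron that the run either failed or was cut off.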