qarmin / czkawka

Multi functional app to find duplicates, empty folders, similar images etc.

8.0.0 / 46 and 410 don't find duplicates properly Windows 10 #1403

Open rramstad833 opened 3 days ago

rramstad833 commented 3 days ago

Bug Description

When deduplicating, czkawka fails to find some bit-identical files across two file systems on Windows 10.

This regression seems to have been introduced in the last year or so. I'm a long-time user of the program and recently "updated" from 5.1.0, which works fine.

I suspect something is wrong in the caching code; judging by the changelogs, that code has been reworked several times recently.

Steps to reproduce:

1. Take a folder with a lot of files in it and make a copy of that folder.
2. Start czkawka and point it at the two folders, with the original folder marked as reference.
3. Deduplicate and delete all files found, then remove all empty directories.
4. Review and confirm that many files are left in the copy, though there should be none, as everything was an exact duplicate.
5. Run czkawka again to deduplicate. It finds a few more duplicates, which is clearly wrong. Delete those files.
6. Review and confirm there are still files left in the copy.
7. Use a terminal window and file comparison software to recursively compare the original folder with the copy folder, and verify that all of the remaining files in the copy are in fact exactly the same as the original.

Note that the final step rules out file corruption: the copy of the original folder is bit-perfect, and another tool such as Cygwin + diff confirms that the remaining files are exact copies of the source files, so czkawka should have picked them up.
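For anyone who wants to repeat that last verification step without Cygwin, here is a minimal Python sketch of a recursive byte-for-byte comparison. It is only an illustration of the check described above, not the exact command I ran; the function name `compare_trees` and its return shape are made up for this example.

```python
import filecmp
import os


def compare_trees(original, copy):
    """Classify every file under `copy` against the same relative path in `original`.

    Returns three sorted lists of relative paths:
    identical (same bytes), different (bytes differ), missing (no such file in original).
    """
    identical, different, missing = [], [], []
    for dirpath, _dirnames, filenames in os.walk(copy):
        for name in filenames:
            copy_path = os.path.join(dirpath, name)
            rel = os.path.relpath(copy_path, copy)
            orig_path = os.path.join(original, rel)
            if not os.path.isfile(orig_path):
                missing.append(rel)
                continue
            # shallow=False forces a comparison of file contents instead of
            # trusting size and modification time alone.
            if filecmp.cmp(orig_path, copy_path, shallow=False):
                identical.append(rel)
            else:
                different.append(rel)
    return sorted(identical), sorted(different), sorted(missing)
```

Called as `compare_trees(original_folder, copy_folder)` after the deduplication runs, every path in the first list is a leftover file that is an exact duplicate and should have been reported.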

Terminal output (optional):


System

rramstad833 commented 3 days ago

If you can tell me how to generate logs or whatnot, I'm happy to help with debugging.
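The issue template suggests running czkawka with the `RUST_LOG` environment variable set to `debug`. A rough sketch of how that could be done from Python on any platform, with the output captured to a file, is below. The helper name `run_with_debug_log` and the log file name are my own placeholders, and whether the Windows GUI build writes its log to stdout/stderr when launched this way is an assumption I have not verified.

```python
import os
import subprocess


def run_with_debug_log(command, log_path):
    """Run `command` with RUST_LOG=debug and write stdout and stderr to `log_path`.

    `command` is a list such as [path_to_czkawka_executable]; the actual
    executable path depends on the install and is not specified here.
    """
    # Copy the current environment and add the variable named in the issue template.
    env = dict(os.environ, RUST_LOG="debug")
    with open(log_path, "wb") as log:
        result = subprocess.run(
            command,
            env=env,
            stdout=log,
            stderr=subprocess.STDOUT,  # merge stderr into the same log file
        )
    return result.returncode
```

The resulting file could then be pasted into the "Debug log" section of the report.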

For reference, we're talking about 100,000+ files and 300 GB or so in the original folder, so it's a decent amount of data.

I recognize that mine is a somewhat degenerate case, i.e. the two folders are supposed to be exactly identical, and I'm simply proving that before I delete the folder copy. I'd say the program finds about two-thirds of the files to be identical, and the remaining third can be proven identical using system utilities for comparison.
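To put a number on the expected result, an independent cross-check can group files by size and then by content hash, which gives the set of duplicates any tool should report for the two folders. The sketch below is a generic size-then-SHA-256 approach written for illustration; it is not czkawka's implementation, and the names `file_digest` and `find_duplicate_groups` are hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicate_groups(roots):
    """Return sorted groups of paths under `roots` whose contents hash identically."""
    # Pass 1: bucket by size, since files of different sizes cannot be duplicates.
    by_size = defaultdict(list)
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share a size with at least one other file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        groups.extend(sorted(g) for g in by_hash.values() if len(g) > 1)
    return sorted(groups)
```

For two folders that are exact copies of each other, `find_duplicate_groups([original, copy])` should yield at least one group per file, so its output can be compared against what czkawka reports.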