jeremitu / finddupe

Port of finddupe duplicate file detector for Windows by Matthias Wandel http://www.sentex.net/~mwandel/finddupe/

"-ref <filepat>" gets ignored and files in <filepat> get deleted anyway. #4

Open l1m3r opened 4 years ago

l1m3r commented 4 years ago

Stumbled over a serious bug in your/this finddupe version. The -ref feature does not work as expected and the supplied <filepat> isn't used as reference only. Instead duplicate files get deleted/hardlinked as if it were just another normal <filepat> without the -ref.

The original works as it's supposed to. (See the attached files for details. I restored the two directories to identical conditions before each run.)

bug_in_64bit_finddupe.log original.log same_bug_in_32bit.log

l1m3r commented 4 years ago

this bug seems even more nefarious than I thought.

When I run finddupe without -ref on just one of the fritzing folders it finds 340 dupes instead of 10 and roughly +7000kBytes in data. -> the result differs from the one above and it looks like your -ref behaves somewhere in between the original one and no -ref at all.

see attached log for details.

dif_resultwithout-ref.txt

m-brandl commented 4 years ago

I noticed the same problem. Screwed me big time today...

thomas694 commented 4 years ago

> this bug seems even more nefarious than I thought.
>
> When I run finddupe without -ref on just one of the fritzing folders it finds 340 dupes instead of 10 and roughly +7000kBytes in data. -> the result differs from the one above and it looks like your -ref behaves somewhere in between the original one and no -ref at all.
>
> see attached log for details.
>
> dif_resultwithout-ref.txt

Can you elaborate on what exactly is wrong with this test's result? Why do you expect only 10 files to be duplicates? As far as I can see, you ran it on a source code archive where the same files are stored in different locations/subfolders. I also get around 330 duplicates, which is correct.

If you had additionally used the option "-hardlink" (of course only possible on NTFS file systems), hardlinks would have been created and the total number of files would have remained the same. You would have kept the same directory structure as before but saved some disk space. Alternatively, use the option "-bat" (without del/hardlink), which creates a file with the commands. Afterwards you can compare the found matches with any file compare program and verify their correctness.
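The verification step suggested above (comparing the reported matches with a file compare tool) can be sketched in a few lines of Python. This is only an illustration: `verify_pairs` is a hypothetical helper, and the list of `(kept, duplicate)` path pairs is assumed to have been collected by hand from finddupe's output, since the exact format of the "-bat" file is not shown in this thread.

```python
import filecmp


def verify_pairs(pairs):
    """Return the (kept, duplicate) pairs whose files are NOT byte-identical.

    `pairs` is a hypothetical list of path tuples taken from the
    duplicate report; an empty result means every match checked out.
    """
    mismatches = []
    for kept, duplicate in pairs:
        # shallow=False forces a content comparison instead of
        # trusting matching size and modification time.
        if not filecmp.cmp(kept, duplicate, shallow=False):
            mismatches.append((kept, duplicate))
    return mismatches
```

Any pair returned by `verify_pairs` would be a file that was reported as a duplicate but differs in content.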

l1m3r commented 3 years ago

> > see attached log for details. dif_resultwithout-ref.txt
>
> Can you elaborate on what exactly is wrong with this test's result? Why do you expect only 10 files to be duplicates? As far as I can see, you ran it on a source code archive where the same files are stored in different locations/subfolders. I also get around 330 duplicates, which is correct.

That specific "test" result may in itself be completely fine. The problem is that

    finddupe64 -del -rdonly -ref "E:\Temporal\_bla\fritzing.0.9.3b.64.pc\**" "S:\TEMP\fritzing.0.9.3b.64.pc\**"

deletes 10 files in E:\... That is neither 0 (which it always MUST be with -ref) nor 340, which would be wrong but at least the expected count if -ref were ignored entirely. So 10 deleted files is somewhere between the correct number (0) and the number without -ref (340).
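The invariant stated here (a -ref run must delete 0 files in the reference tree) can be checked independently of finddupe by listing the reference folder before and after a run. The following is a minimal sketch; `list_files` and `missing_after` are hypothetical helper names, and the script does not call finddupe itself.

```python
import os


def list_files(root):
    """Return the set of file paths under `root`, relative to `root`."""
    files = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            files.add(os.path.relpath(full, root))
    return files


def missing_after(before, after):
    """Return the sorted relative paths present before the run but gone after."""
    return sorted(before - after)
```

Take one listing of the reference folder, run the finddupe command, take a second listing, and pass both to `missing_after`. With a working -ref the result should be an empty list. This only detects deleted files, not files replaced by hardlinks.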

> If you had additionally used the option "-hardlink" (of course only possible on NTFS file systems), hardlinks would have been created and the total number of files would have remained the same. You would have kept the same directory structure as before but saved some disk space. Alternatively, use the option "-bat" (without del/hardlink), which creates a file with the commands. Afterwards you can compare the found matches with any file compare program and verify their correctness.

Those were only test runs on completely identical folders. I did those test runs after losing files that finddupe should never have deleted. I don't care about hardlinks or the -bat option in this case.

I basically have the same "Windows Tools" folder on several computers and didn't keep them in sync over the years. I therefore wanted to remove all files from the older versions of this folder that are still unchanged in my current one, so that I'd only need to manually check the remaining files/sub-folders in the old folders. I ran

    finddupe64 -del -rdonly -ref "<CURRENT>\**" "<OLD>\**"

but your finddupe deleted files in <CURRENT> too.
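For this use case, the intended "reference only" behavior can be written down as a dry run. The sketch below is not how finddupe works internally (the thread does not describe that); it swaps in a plain SHA-256 comparison, and `removable_in_old`, `file_digest` and `iter_files` are hypothetical names. It only returns a list and never deletes anything. Candidates are drawn exclusively from the old tree, so no file under the current tree can appear in the result, assuming the two trees do not overlap.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def iter_files(root):
    """Yield the full path of every file under `root`."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            yield os.path.join(dirpath, name)


def removable_in_old(current_root, old_root):
    """List files under old_root whose content also exists under current_root.

    current_root plays the role of the -ref tree: it is only read,
    and none of its files are ever returned.
    """
    reference = {
        (os.path.getsize(p), file_digest(p)) for p in iter_files(current_root)
    }
    return sorted(
        p
        for p in iter_files(old_root)
        if (os.path.getsize(p), file_digest(p)) in reference
    )
```

Comparing this list with what a real run deleted would show directly whether any deleted file lies outside the old folder.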

thomas694 commented 3 years ago

Thanks for your answers and your use case example.

I think you took the executable from my fork, which was actually much the same as the one here, so I removed the original links to prevent future confusion. Please follow the link at the top of my fork's readme. I just ran two tests (absolute and relative paths) with my own version:

    c:\###> finddupe -del -rdonly -ref "c:\###\originals\**" "c:\###\copies\**"
    c:\###> finddupe -del -rdonly -ref "originals\**" "copies\**"

In both cases my version only deleted files from the copies folder.