idrassi / DirHash

Windows command line utility to compute hash of directories and files
BSD 3-Clause "New" or "Revised" License

Hash value mismatch #9

Closed (bluelayer closed this issue 3 years ago)

bluelayer commented 3 years ago

hello world! 😄

Since yesterday I'm getting some mismatches in my hash verification: about 8 out of a list of roughly 6900 files. "Perfect, you found some problems," you might say, but no. I found that the hash sometimes changes, but only in DirHash 😮 I have some "old" hash files (BLAKE3) from another app, and according to them the files are as good as they were in the past... Any idea?

idrassi commented 3 years ago

Does this issue happen only with "-threads" switch or does it happen also without it? I want to be sure if the issue is caused by multithreading or if it is general.

Also, if you hash the file on its own with DirHash, does it return the correct value? This will tell us whether the issue is in the Blake3 implementation in DirHash or in the implementation of the "-sum" or "-verify" switches.

Is the incorrect value always the same or does it change?

I personally don't think there is an issue in the Blake3 implementation in DirHash, but it is possible that multithreading has an issue, although I tested it a lot.

Thank you for your help.

bluelayer commented 3 years ago

-threads with -sum gives incorrect hashes; -threads with -verify seems OK.

bluelayer commented 3 years ago

The problematic hash values produced with -threads and -sum change every time I run it, both in which files are affected and in how many.

idrassi commented 3 years ago

So since -verify works, the issue is not in the Blake3 computation but rather that DirHash is writing the wrong hash value to the SUM text file. Probably you will notice that the wrong hash value is the same as the hash of another file (i.e. two different files have the same hash value).
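To check a SUM file for this symptom, one can group entries by hash value and list every hash that appears for more than one file. The sketch below is hypothetical (`find_shared_hashes` is an illustrative name, not part of DirHash) and assumes each line has the form `<hash><whitespace><path>`, as in common *sum-style checksum files; the exact DirHash SUM layout may differ. Files with identical content legitimately share a hash, so a hit is a lead to investigate, not proof of the bug.

```python
from collections import defaultdict


def find_shared_hashes(sum_lines):
    """Return {hash: [paths]} for every hash value listed for 2+ files.

    Assumption: each non-empty line is '<hex hash><whitespace><path>',
    optionally with a '*' binary-mode marker before the path.
    """
    by_hash = defaultdict(list)
    for line in sum_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        digest, _, path = line.partition(" ")
        # Drop the extra separator space and any '*' marker.
        by_hash[digest.lower()].append(path.strip().lstrip("*"))
    # Keep only hash values shared by more than one file.
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}
```

Passing an open file object works directly, since iterating over it yields lines.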

I will check the code of the thread that is responsible for writing the SUM file. There must be a subtle bug there that I didn't encounter in my tests.

Thank you for your tests and your patience. I will let you know my findings later today.

bluelayer commented 3 years ago

Probably you will notice that the wrong hash value is the same as the hash of another file

😮 You're right: almost every wrong hash is the hash of the next file in the list.

idrassi commented 3 years ago

Thank you for the confirmation. I have fixed the issue (it was caused by the use of a global variable for converting the hash to a hexadecimal string) and I published version 1.16, which contains the fix. Thanks to your description, I immediately knew what the problem was, and this helped me a lot.
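The failure mode described above can be illustrated with a deterministic, single-threaded replay. This is a hypothetical Python sketch, not DirHash's actual code: `to_hex_shared` writes every hex string into one module-level buffer and hands out a reference to it, while `to_hex_local` returns a fresh string per call. `write_sum` replays one unlucky interleaving in which the SUM writer lags one entry behind the hasher. BLAKE2s from `hashlib` stands in for BLAKE3, which the Python standard library does not provide.

```python
import hashlib

# Hypothetical stand-in for a global hex-conversion buffer (not DirHash code).
_shared_hex = bytearray(64)  # 32-byte digest -> 64 hex characters


def to_hex_shared(digest: bytes) -> bytearray:
    """Buggy pattern: every call overwrites the same global buffer."""
    _shared_hex[:] = digest.hex().encode("ascii")
    return _shared_hex  # a reference to the shared buffer, not a copy


def to_hex_local(digest: bytes) -> str:
    """Fixed pattern: every call produces its own independent string."""
    return digest.hex()


def as_text(h) -> str:
    """Read the hex value at the moment the writer finally uses it."""
    return h.decode("ascii") if isinstance(h, bytearray) else h


def write_sum(files, to_hex):
    """Replay an interleaving where the hasher converts file i+1
    before the writer has written the line for file i."""
    pending, lines = [], []
    for name, data in files:
        digest = hashlib.blake2s(data).digest()  # BLAKE2s stands in for BLAKE3
        pending.append((name, to_hex(digest)))
        if len(pending) > 1:  # writer runs one step behind the hasher
            n, h = pending.pop(0)
            lines.append((n, as_text(h)))
    for n, h in pending:  # flush whatever is left at the end
        lines.append((n, as_text(h)))
    return lines
```

With `to_hex_shared`, every file except the last is written with the hash of the file that follows it, which matches the symptom reported in this thread. Giving each conversion its own output buffer (or copying the value before handing it to another thread) removes the dependency on timing.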

Can you please confirm that the fix works for you?

bluelayer commented 3 years ago

12.6GiB in 5 seconds 🥇 0 fails 😆

idrassi commented 3 years ago

Thank you for the confirmation.

These are amazing speed numbers! It would be interesting to compare the performance of DirHash with other checksum applications. The implementation of multithreading in DirHash tries to be as optimal as possible but maybe there are better implementations out there.

bluelayer commented 3 years ago

OpenHashTab is currently the second fastest here; it was the fastest before -threads.

idrassi commented 3 years ago

Thanks. I didn't know about OpenHashTab. Maybe I should also implement a shell extension in DirHash to make it more user friendly.