idrassi / DirHash

Windows command line utility to compute hash of directories and files
BSD 3-Clause "New" or "Revised" License

Add support to unicode in file names #6

Closed: regystro closed this issue 3 years ago

regystro commented 3 years ago

Hi. I'm using the latest version (1.12.0), and I've noticed that some file names containing Unicode characters are encoded incorrectly and fail during verification.

Command line: dirhash "testdir" -progress -sum -t testdir.blk3 -overwrite

For example, a file called "Music001 – Live.mp3", which contains a Unicode en dash (U+2013), is encoded as "Music001 ? Live.mp3", and the verification then fails:

Using Blake3 to verify hash of "testdir" ...
Error: file "testdir\Music001 ? Live.mp3" not found in checksum file.

The same happens to this file name: "Půjdem spolu do Betléma - Czech Children's Songs.mp3" which uses this symbol.

Could you please add support for unicode characters?

Thank you.

idrassi commented 3 years ago

Thank you for reporting this issue.

The problem is that the checksum file created by DirHash used ASCII encoding, so it could not store Unicode file names and paths correctly. I'm in the process of fixing this by using UTF-8 encoding for the checksum file. I will also fix the display of Unicode characters in the Windows console, because the Windows console is not compatible with Unicode by default.
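The encoding problem described above can be reproduced outside DirHash. The following is only a minimal Python sketch, not DirHash's actual code (DirHash is a native Windows tool): the `digest *name` line layout, the helper names, and the use of a SHA-256 of the name as a stand-in digest are all assumptions made for illustration. It writes the two file names from this report to a checksum file, once as ASCII with lossy replacement and once as UTF-8, then checks which names can still be found afterwards.

```python
import hashlib
import os
import tempfile


def write_checksums(path, entries, encoding):
    # errors="replace" mimics a lossy conversion: any character the target
    # encoding cannot represent is written as "?".
    with open(path, "w", encoding=encoding, errors="replace", newline="\n") as f:
        for name, digest in entries.items():
            # Hypothetical line layout: "<hex digest> *<file name>"
            f.write(f"{digest} *{name}\n")


def read_checksums(path, encoding):
    stored = {}
    with open(path, "r", encoding=encoding, newline="\n") as f:
        for line in f:
            digest, name = line.rstrip("\n").split(" *", 1)
            stored[name] = digest
    return stored


def round_trip(names, encoding):
    """Return the names that can no longer be found after a write/read cycle."""
    # Placeholder digests: a hash of the name stands in for a real file hash.
    entries = {n: hashlib.sha256(n.encode("utf-8")).hexdigest() for n in names}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "checksums.txt")
        write_checksums(path, entries, encoding)
        stored = read_checksums(path, encoding)
    return [n for n in names if n not in stored]


NAMES = [
    "Music001 \u2013 Live.mp3",  # contains an en dash (U+2013)
    "Půjdem spolu do Betléma - Czech Children's Songs.mp3",
]

MISSING_ASCII = round_trip(NAMES, "ascii")
MISSING_UTF8 = round_trip(NAMES, "utf-8")

print("missing with ASCII:", MISSING_ASCII)
print("missing with UTF-8:", MISSING_UTF8)
```

With the ASCII file, both names come back with "?" in place of the non-ASCII characters, so neither original name is found, which matches the "not found in checksum file" error reported above. With the UTF-8 file, both names survive the round trip unchanged.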

idrassi commented 3 years ago

I have published version 1.13 that fixes this issue. Please feel free to reopen this issue if you still have problems.

regystro commented 3 years ago

Everything working flawlessly with 1.13. Thanks for the quick fix!