mingkun868 / crypto-js

Automatically exported from code.google.com/p/crypto-js

Most of the hashes won't work depending on file encoding #139

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
What steps will reproduce the problem?
1. Create a text file containing aÿþa
2. Convert it to ANSI/UTF8/UTF8-NOBOM/UTF16/.....
3. Check the hash returned and compare it with a checksum application.

What is the expected output? What do you see instead?
The CryptoJS lib forces a conversion to UTF8 on the input, so it returns the 
wrong hash...

What version of the product are you using? On what operating system?
Firefox 31 / 3.1.2

Please provide any additional information below.
How to fix it? Simply don't UTF8 encode the input, because you really don't 
need to. You would need UTF8 encoding if you wanted to "SHOW" the content, but 
you don't need to UTF8 encode the file to get a hash of it. Otherwise you will 
get a very wrong hash.

I attached a screenshot showing how to fix it for the SHA3.js file; however, 
you will have the same issue in almost all, if not all, of the other hash 
implementations. I got the same issue with SHA-256.

For SHA3   : q to e
For SHA256 : l to k

Easy to see if you look at my screenshot.

Note that this is a "temp-fix"; you may need the UTF8 encoding somewhere else 
in the file for whatever reason.

Original issue reported on code.google.com by contact@sundark.eu on 3 Aug 2014 at 2:52


GoogleCodeExporter commented 8 years ago
I used JS Beautifier to get the code like that btw, in case you wonder why my 
file is not minified.

http://jsbeautifier.org/

Original comment by contact@sundark.eu on 3 Aug 2014 at 2:54

GoogleCodeExporter commented 8 years ago
Also, I use the JS FileReader with reader.readAsBinaryString to open my file.

Original comment by contact@sundark.eu on 3 Aug 2014 at 2:58

GoogleCodeExporter commented 8 years ago
So, the issue is that JavaScript strings are UTF-16, always. When you 
readAsBinaryString, of course, only a small subset of JavaScript's possible 
characters are used, but CryptoJS has no way to know that. In hindsight, I 
probably should have required the library user to always specify the character 
encoding of the input. Instead, the current behavior is that if you don't 
specify the character encoding (by first converting to bytes), then UTF-8 is 
picked as the default.
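
To illustrate the point above, here is a minimal sketch that uses Node's 
built-in crypto module rather than CryptoJS itself, so that it runs standalone. 
It assumes a binary string of the kind readAsBinaryString produces, where each 
character stands for one byte, and compares hashing the original bytes with 
hashing the UTF-8 encoding of that string. The helper name sha256Hex is made up 
for this example.

```javascript
const crypto = require('crypto');

// A "binary string" like the one readAsBinaryString returns for the
// reporter's test file: each character code (0-255) stands for one byte.
const binaryString = 'a\u00ff\u00fea'; // file bytes: 61 FF FE 61

// Reading the string as Latin-1 recovers the original file bytes.
const fileBytes = Buffer.from(binaryString, 'latin1');

// Encoding the same string as UTF-8 produces a different byte sequence.
const utf8Bytes = Buffer.from(binaryString, 'utf8');

// Hypothetical helper: SHA-256 of a byte buffer, as a hex string.
function sha256Hex(bytes) {
  return crypto.createHash('sha256').update(bytes).digest('hex');
}

console.log(fileBytes.toString('hex')); // 61fffe61
console.log(utf8Bytes.toString('hex')); // 61c3bfc3be61
console.log(sha256Hex(fileBytes) === sha256Hex(utf8Bytes)); // false
```

In CryptoJS, "first converting to bytes" for such a binary string would 
presumably look like CryptoJS.SHA256(CryptoJS.enc.Latin1.parse(binaryString)), 
which passes a WordArray instead of a string and so avoids the UTF-8 default.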

Original comment by Jeff.Mott.OR on 3 Aug 2014 at 3:51

GoogleCodeExporter commented 8 years ago
We may not know the character encoding; it's hard to deal with character 
encoding when it comes to a file.

For example, a UTF8-NoBom encoded file gives you no hint about its current 
encoding; you need to parse the first character and determine what encoding it 
is. While this is done automatically when running a web server like apache2, 
this task adds a lot of code on the side of the developer who would use the API.
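
As a rough idea of what that detection involves, here is a minimal sketch of 
checking the first bytes of a file for a byte order mark (BOM). The function 
name detectBom is made up for this example, and it only covers the UTF-8 and 
UTF-16 BOMs. A file without a BOM, like the UTF8-NoBom case, yields null, so 
its encoding would still have to be guessed some other way.

```javascript
// Hypothetical helper: guess an encoding from a leading byte order mark.
// Takes an array-like of byte values (e.g. a Uint8Array of the file content).
function detectBom(bytes) {
  if (bytes.length >= 3 &&
      bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return 'utf-8';
  }
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return 'utf-16le';
  }
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return 'utf-16be';
  }
  return null; // no BOM: the encoding cannot be determined this way
}

console.log(detectBom(Uint8Array.of(0xef, 0xbb, 0xbf, 0x61))); // utf-8
console.log(detectBom(Uint8Array.of(0xff, 0xfe, 0x61, 0x00))); // utf-16le
console.log(detectBom(Uint8Array.of(0x61, 0x62)));             // null
```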

I don't know about UTF-16 and JavaScript strings, but I can guarantee that with 
any non-ANSI encoding there is a mismatch between my checksum tool and the 
output the website gave me.

I don't know how much it affects users who use this library only for normal 
strings (instead of output from a file); it probably doesn't affect them at 
all...? I don't see a case where you could have a hash mismatch, since the 
string will be encoded INSIDE the file anyway.

Original comment by contact@sundark.eu on 3 Aug 2014 at 5:54