Closed msmuenchen closed 8 years ago
Found out the reason: JS strings are stored as UTF-16, while PHP treats strings as plain byte sequences, which in my case are UTF-8 encoded. Fix is easy with the library at http://www.onicos.com/staff/iz/amuse/javascript/expert/utf.txt; I described the usage in http://stackoverflow.com/questions/19835609/differing-sha1-hashes-for-identical-values-on-the-server-and-the-client/21341088#21341088 where someone had a similar issue.
Might be worth incorporating this conversion into the digest function?
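To make the mismatch concrete, here is a minimal sketch for the character ä (U+00E4). It uses only built-in JavaScript functions; the variable names are purely illustrative.

```javascript
// "ä" is a single UTF-16 code unit in a JavaScript string...
var ch = "\u00e4";
var codeUnit = ch.charCodeAt(0); // 0xE4

// ...but UTF-8 encodes it as two bytes (0xC3 0xA4),
// made visible here through percent-encoding.
var percentEncoded = encodeURIComponent(ch); // "%C3%A4"
```

So a hash computed over the JavaScript code units sees one value, while a hash computed over the UTF-8 bytes sees two.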
Yes, it might be worth adding an encoding parameter to the digest method, which would be evaluated in the conversion function.
Would you like to make the change and submit a PR?
I'm not that deep into JS, can you please do it?
I'm a bit short on time at the moment, but I'll see if I can get around to it sometime next week.
Anyway, thanks for pointing that out!
I'll submit a patch that runs unescape(encodeURIComponent(str)) on the string before interpreting it (this converts the string to its equivalent UTF-8 character codes).
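A minimal sketch of what that conversion does, assuming a hypothetical helper name (toUtf8BinaryString is not part of Rusha). Note that unescape is deprecated, although engines still ship it, and that encodeURIComponent throws a URIError on lone surrogates.

```javascript
// Hypothetical helper name; the conversion itself uses only built-ins.
function toUtf8BinaryString(str) {
  // encodeURIComponent writes every non-ASCII character as the
  // percent-encoded bytes of its UTF-8 form; unescape then turns each
  // %XX back into one character whose char code is that byte.
  return unescape(encodeURIComponent(str));
}

var binary = toUtf8BinaryString("\u00e4");
var codes = [binary.charCodeAt(0), binary.charCodeAt(1)]; // [0xC3, 0xA4]
```

The result is a "binary string": every character has a char code below 256, one per UTF-8 byte.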
Where exactly would I insert that? https://github.com/srijs/rusha/blob/master/rusha.js#L164 looks like a good candidate.
Hi.
Please modify rusha.sweet.js. A good candidate would be the rawDigest method. It could take an optional options parameter, where you can opt in to the unescape(encodeURIComponent(str)) conversion.
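As a rough illustration of such an opt-in, here is a sketch written as a standalone wrapper rather than as a change to Rusha itself. Everything named here is an assumption: digestWithOptions, the utf8 flag, and the fakeDigest stand-in are all hypothetical and are not Rusha's API.

```javascript
// Hypothetical wrapper, not Rusha's actual API: `digestFn` is any function
// that hashes a binary string, and `options.utf8` is an invented flag name.
function digestWithOptions(digestFn, str, options) {
  var input = str;
  if (options && options.utf8) {
    // Opt-in: convert to a UTF-8 binary string before hashing.
    input = unescape(encodeURIComponent(str));
  }
  return digestFn(input);
}

// Stand-in for a real hash so the sketch runs on its own:
// it simply reports the char codes it was given.
function fakeDigest(binaryString) {
  var codes = [];
  for (var i = 0; i < binaryString.length; i++) {
    codes.push(binaryString.charCodeAt(i));
  }
  return codes.join(",");
}

var withoutOption = digestWithOptions(fakeDigest, "\u00e4");              // "228"
var withOption = digestWithOptions(fakeDigest, "\u00e4", { utf8: true }); // "195,164"
```

Keeping the conversion opt-in means existing callers that already pass binary strings would see no change in behavior.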
@sergeevabc in case you still need it: the documentation (README) says "Create a hex digest from a binary String. A binary string is expected to only contain characters whose charCode < 256".
So the library will not work on arbitrary strings. The workaround I found for your case is to first convert your string to a UTF-8 byte array and then pass that array to rusha. See the code below:
function toUTF8Array(str) {
    var utf8 = [];
    for (var i = 0; i < str.length; i++) {
        var charcode = str.charCodeAt(i);
        if (charcode < 0x80) utf8.push(charcode);
        else if (charcode < 0x800) {
            utf8.push(0xc0 | (charcode >> 6),
                      0x80 | (charcode & 0x3f));
        }
        else if (charcode < 0xd800 || charcode >= 0xe000) {
            utf8.push(0xe0 | (charcode >> 12),
                      0x80 | ((charcode >> 6) & 0x3f),
                      0x80 | (charcode & 0x3f));
        }
        // surrogate pair
        else {
            i++;
            // UTF-16 encodes 0x10000-0x10FFFF by
            // subtracting 0x10000 and splitting the
            // 20 bits of 0x0-0xFFFFF into two halves
            charcode = 0x10000 + (((charcode & 0x3ff) << 10)
                       | (str.charCodeAt(i) & 0x3ff));
            utf8.push(0xf0 | (charcode >> 18),
                      0x80 | ((charcode >> 12) & 0x3f),
                      0x80 | ((charcode >> 6) & 0x3f),
                      0x80 | (charcode & 0x3f));
        }
    }
    return utf8;
}

var r = new Rusha();
var s = "любовь";
var a = toUTF8Array(s);
console.log(r.digest(a)); // will give you the correct sha1 af48c12732ffdbd4299b792c2b6da6f77a0898d7
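In environments that provide the standard TextEncoder API (current browsers and recent Node.js versions), the same UTF-8 bytes can be obtained without a hand-written encoder. This is only a sketch of that alternative; whether a given hashing library accepts a Uint8Array directly is an assumption that needs checking, so the result is copied into a plain array here.

```javascript
// "любовь", written with escapes so the file encoding cannot interfere.
var word = "\u043b\u044e\u0431\u043e\u0432\u044c";

// TextEncoder always encodes to UTF-8 and returns a Uint8Array.
var bytes = new TextEncoder().encode(word);

// Copy into a plain array in case the consumer expects one (an assumption).
var byteArray = Array.from(bytes);
```

Each of the six Cyrillic letters takes two bytes in UTF-8, so byteArray holds twelve values.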
Thanks for your input, @szydan. At that time I chose Fast SHA256.
Closing this as wontfix -- Rusha is not meant to be used directly on encoded strings with code-points above 255. If you want to hash strings like these, please be sure to convert them into the desired binary encoding beforehand.
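For callers who want to detect the unsupported case up front, a small guard along these lines may help. It is a sketch only; isBinaryString is a hypothetical helper, not something Rusha provides.

```javascript
// Hypothetical guard: returns true only if every char code is below 256,
// which is the README's definition of a binary string.
function isBinaryString(str) {
  for (var i = 0; i < str.length; i++) {
    if (str.charCodeAt(i) > 255) return false;
  }
  return true;
}

var latin = isBinaryString("abc\u00e4"); // true: 0xE4 is below 256
var cyrillic = isBinaryString("\u043b"); // false: 0x43B is above 255
```

Note that a string such as "ä" passes this check, since its char code is 0xE4. It would then presumably be hashed as that single byte rather than as its two-byte UTF-8 form, so the guard does not replace an explicit conversion when the other side hashes UTF-8.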
Hi,
I'm having problems using rusha to compare the hash of a string in JavaScript with the hash of the same string computed in PHP.
In Javascript, I use
and in PHP (once using a literal ä, once a json-decode'd ä to rule out a bug in PHP or my file encoding)
which gives me the output
Why are the SHA1 hashes different? After all, using the \u00e4 notation should result in the same byte sequence both in a PHP string and a Javascript string, right?