Hash collisions for UTF-8 passwords

GoogleCodeExporter commented 9 years ago

javascript-bcrypt
jsBCrypt 
I am using v2.2 on Linux.

To reproduce the problem, run the following code:

var b1 = new bCrypt(),
    b2 = new bCrypt();
b1.hashpw('\u6e2f', '$2a$05$0000000000000000000000', function(hash1) {
    console.log(hash1);
    b2.hashpw('\u6f2f', '$2a$05$0000000000000000000000', function(hash2) {
        console.log(hash2);
        if (hash1 === hash2) {
            console.log('Hash collision !!!!');
        } else {
            console.log('Hashes are different, as expected.');
        }
    });
});

The result of the code above is:
   $2a$05$000000000000000000000uZFTs0iC2rTIcGXz5VM9Rg6ZA/slcl8i
   $2a$05$000000000000000000000uZFTs0iC2rTIcGXz5VM9Rg6ZA/slcl8i
   Hash collision !!!!

The two passwords give the same hash. I think this is because of the following 
loop appearing in method bCrypt.prototype.hashpw():
for (var r = 0; r < password.length; r++) {
    passwordb.push(this.getByte(password.charAt(r)));
}

This means an enormous number of hash collisions since only one of the bytes of 
any multi-byte utf8 characters in a password is considered.

Original issue reported on code.google.com by nico...@nicolaspelletier.org on 14 Dec 2012 at 7:46

GoogleCodeExporter commented 9 years ago

I just looked at the code and it does not handle non-ascii passwords correctly. 
To be compliant with the Java version, you must convert the password to UTF-8 
before the encryption. Here is a simple change I made to the hashpw function to 
make it work correctly:
    password = password + (minor >= 'a' ? "\000" : "");
    for (var n = 0; n < password.length; n++) {
        var c = password.charCodeAt(n);
        if (c < 128) {
            passwordb.push(c);
        }
        else if((c > 127) && (c < 2048)) {
            passwordb.push((c >> 6) | 192);
            passwordb.push((c & 63) | 128);
        }
        else {
            passwordb.push((c >> 12) | 224);
            passwordb.push(((c >> 6) & 63) | 128);
            passwordb.push((c & 63) | 128);
        }
    }
    saltb = this.decode_base64(real_salt, this.BCRYPT_SALT_LEN);

So, simply converting the password to UTF-8 and placing the values in the 
passwordb array.

Original comment by slavik.m...@gmail.com on 10 Aug 2014 at 4:18

GoogleCodeExporter commented 9 years ago

@slavik.m: After a year and a half, you are the only one in the community who 
seem to have taken an interest in this. I bow deeply and thanks you kindly for 
your work.

I originally made the necessary fix myself in the NodeJS version of this code ( 
see https://github.com/shaneGirish/bcrypt-nodejs ), because that's what I 
needed. Since the creator of the node module said he took his original code 
here, I came here, saw the problem was here also and opened this ticket. I did 
not need this code, nor did I know how to transform a string in UTF8 in pure 
javascript so I simply opened the issue and hoped that somebody who needed the 
pure javascript implementation would take notice.

In any case, thank you. Hopefully, the maintainer of this code will be able to 
use your fix.

Cheers.

Original comment by nico...@nicolaspelletier.org on 11 Aug 2014 at 11:57

GoogleCodeExporter commented 9 years ago

Please note that slavik.m's implementation has a subtle potential bug; although 
it's far better than the original code which simply didn't handle non-ASCII 
characters at all, Unicode characters outside the BMP (with a Unicode code 
point greater than 0xffff) will still not make it to UTF-8 correctly. You can 
test this with an example chracter outside the BMP such as code point 0x29df6; 
the UTF-16 encoding is 0xd867 0xddf6, and the UTF-8 encoding is 0xf0a9 0xb7b6. 
If you try the code above, this character will be incorrectly encoded as 0xeda1 
0xa7ed 0xb7b6. Here is a modified version that corrects this issue:

    password = password + (minor >= 'a' ? "\000" : "");
    for (var n = 0; n < password.length; n++) {
        var c = password.charCodeAt(n);
        if (c < 128) {
            passwordb.push(c);
        }
        else if((c > 127) && (c < 2048)) {
            passwordb.push((c >> 6) | 192);
            passwordb.push((c & 63) | 128);
        }
        else if ((c > 55296) && (c < 56319)) {
            n++;
            if (n > password.length) {
                throw "utf-16 Decoding error: lead surrogate found without trail surrogate";
            }
            c = password.charCodeAt(n);
            if (c < 56320 || c > 57343) {
                throw "utf-16 Decoding error: trail surrogate not in the range of 0xdc00 through 0xdfff";
            }
            c = ((password.charCodeAt(n - 1) - 55296) << 10) + (c - 56320) + 65536;
            passwordb.push((c >> 18) | 240);
            passwordb.push(((c >> 12) & 63) | 128);
            passwordb.push(((c >> 6) & 63) | 128);
            passwordb.push((c & 63) | 128);
        }
        else {
            passwordb.push((c >> 12) | 224);
            passwordb.push(((c >> 6) & 63) | 128);
            passwordb.push((c & 63) | 128);
        }
    }
    saltb = this.decode_base64(real_salt, this.BCRYPT_SALT_LEN);

If you are going to be checking your hashes against another platform's 
implementation of bCrypt (such as Java, PHP, etc.), it would be advisable to 
check a test string against that platform to make sure it comes out the same in 
JavaScript as well. If you want to generate a non-BMP string, you'll need to 
specify it as a surrogate pair in JavaScript. You can do this in a string 
literal like this:

var password = "\ud867\uddf6";

This will be similar in C# or Java. I don't offhand see a way to do the exact 
same thing in PHP, however you should be able to specify a string exactly in 
hex by doing something like:

$password = hex2bin('f0a9b7b6');

Note that I specified the string in UTF-8 in PHP, as that is its native 
encoding.

I hope that helps - if anyone has questions, post back here; I'm subscribed on 
this thread.

Original comment by xer...@gmail.com on 12 Aug 2014 at 1:40

GoogleCodeExporter commented 9 years ago

Please correct me if I am wrong, but I think the first byte of a surrogate pair 
can take values from 0xD800 ( which is 55296 ) to 0xDBFF ( which is 56319 ). If 
we want to cover all the possibilities of surrogate pairs, the condition should 
be inclusive of these values.

Thus, the condition should be:
  else if ((c >= 55296) && (c <= 56319)) {
instead of
  else if ((c >  55296) && (c <  56319)) {

For the rest, I trust you on the calculations. It's been too long since I 
worked in the unicode formats and conversions between them to remember all the 
details.

I quite agree with you that the first thing that should have been done when 
implementing any bcrypt encoding would have been to compare with the reference 
implementation ( probably somewhere in the unix code ) with as large as 
possible sampling of characters from the unicode set.

Thanks for the update.

Original comment by nico...@nicolaspelletier.org on 12 Aug 2014 at 11:39

GoogleCodeExporter commented 9 years ago

Indeed, you are correct. Not only is that bug in what I posted, I actually have 
some of my own code to deal with Unicode in JavaScript with that same bug (I 
just converted the hex to decimal and pasted it here). And that kids, is why 
you always include edge cases in your unit tests. Thanks for the catch! Here's 
the updated fix:

password = password + (minor >= 'a' ? "\000" : "");
for (var n = 0; n < password.length; n++) {
    var c = password.charCodeAt(n);
    if (c < 128) {
        passwordb.push(c);
    }
    else if((c > 127) && (c < 2048)) {
        passwordb.push((c >> 6) | 192);
        passwordb.push((c & 63) | 128);
    }
    else if ((c >= 55296) && (c <= 56319)) {
        n++;
        if (n > password.length) {
            throw "utf-16 Decoding error: lead surrogate found without trail surrogate";
        }
        c = password.charCodeAt(n);
        if (c < 56320 || c > 57343) {
            throw "utf-16 Decoding error: trail surrogate not in the range of 0xdc00 through 0xdfff";
        }
        c = ((password.charCodeAt(n - 1) - 55296) << 10) + (c - 56320) + 65536;
        passwordb.push((c >> 18) | 240);
        passwordb.push(((c >> 12) & 63) | 128);
        passwordb.push(((c >> 6) & 63) | 128);
        passwordb.push((c & 63) | 128);
    }
    else {
        passwordb.push((c >> 12) | 224);
        passwordb.push(((c >> 6) & 63) | 128);
        passwordb.push((c & 63) | 128);
    }
}
saltb = this.decode_base64(real_salt, this.BCRYPT_SALT_LEN);

Original comment by xer...@gmail.com on 19 Sep 2014 at 2:41

nevins-b / javascript-bcrypt

Hash collisions for UTF-8 passwords #8