mathiasbynens / punycode.js

A robust Punycode converter that fully complies to RFC 3492 and RFC 5891.
https://mths.be/punycode
MIT License
1.6k stars 159 forks source link

Overflow check in encoder differs from RFC3492? #38

Open adon-at-work opened 9 years ago

adon-at-work commented 9 years ago

Hi,

I find that the JS implementation of the toASCII() encoder here closely resembles the C sample implementation in the RFC3492 specification, except the following:

In https://github.com/bestiejs/punycode.js/blob/702cc2daf10a5b97454666069e146d04745af728/punycode.js#L409,

if (currentValue < n && ++delta > maxInt) {
  error('overflow');
}

In https://tools.ietf.org/html/rfc3492#page-27,

if (input[j] < n /* || basic(input[j]) */ ) {
  if (++delta == 0) return punycode_overflow;
}

In particular, the JS version is checking ++delta against whether it's larger than maxInt, whereas the C version is checking against whether it's equal to 0. Is this intended, or have I missed anything when reading your code? Any clues?

Thanks!

mathiasbynens commented 9 years ago

Good catch!

This is not covered by the current tests. Changing ++delta > maxInt into ++delta == 0 still passes all tests in the suite. I’m struggling to remember why I wrote it this way. It may have been based on one of the other C implementations, but not sure.

adon-at-work commented 9 years ago

I took a quick look at the 5 implementations that your README has mentioned. Here's a quick summary after my very superficial lookup on whether ++delta is checked for overflow error.

The following skipped the check (might have made some assumptions, which I've not drilled into):

The following maintained the check (++delta == 0 then overflow):

The remaining JS implementations reached a "consensus" to check ++delta == maxInt (It seems we lack a JS version that aligns with the spec??? I'm not sure...):

Would you mind sharing any insights on how this will be followed-up? deep dive into the punnycode? draw in insights from my colleagues or more reviewers? Pls Let me know if I can be of any help may be... :)

mathiasbynens commented 9 years ago

@adon-at-work I really appreciate the work you’re doing here. Nice job!

It would really help to have a test case that fails with the one behavior but not the other.

adon-at-work commented 8 years ago

Here's a sample string (this particular example may not make sense for DNS :) ), which will trigger your ++delta > maxInt condition to throw exception.

var str, arr = [], i;
for (i = 0; i < 1950; i++) { arr[i] = 0x80; }
punycode.toASCII(String.fromCodePoint.apply(null, arr.concat([13965,0x10FFFF])));

Yet, this string would not trigger the C version to overflow. It's because the ++delta == 0 there in C is testing for unsigned integer to overflow, i.e., checking if the incremented delta is larger than the maximum value of what an unsigned 4-byte long integer can hold (i.e., 0xffffffff or 4294967295).

I looked deeper into your encoder implementation. While the number representation in JS is a bit "different/weird" in the sense that it can evaluate 4294967295 + 1 === 4294967296 as true, so that ++delta === maxInt can be considered equiv. to ++delta == 0 in C.

But then, may be the question now becomes why maxInt was set as maxInt = 2147483647, // aka. 0x7FFFFFFF or 2^31-1 (https://github.com/bestiejs/punycode.js/blob/master/punycode.js#L26) but not 0xffffffff?

RokerHRO commented 6 years ago

There are no "Integers" in JavaScript, only (floating point) "Numbers".

Only for bit operations these floating point numbers are converted into signed(!) 32 bit integers and converted back into floating point after the calculation. The one exception is the >>> operator, that returns an "unsigned 32 bit integer", that's why >>> 0 is a common idiom to "convert" a negative signed number into its unsigned counterpart.