jsdom / tr46

An implementation of the Unicode UTS #46: Unicode IDNA Compatibility Processing.
MIT License

Work around V8's normalize failures #3

Closed by domenic 8 years ago

domenic commented 8 years ago

Per https://bugs.chromium.org/p/v8/issues/detail?id=4654 we need a version of normalize that doesn't suck. This blocks https://github.com/jsdom/whatwg-url/pull/36 from passing its tests.

Sebmaster commented 8 years ago

What if we did some replacement + escaping logic, like

  1. replace all \ with \\
  2. replace all (character reserved for undecodable characters) with \(character reserved for undecodable characters)
  3. replace all \000 with (character reserved for undecodable characters)
  4. replace all (character reserved for undecodable characters) preceded by an even number of \ with \000
  5. replace all \(character reserved for undecodable characters) preceded by an even number of \ with (character reserved for undecodable characters)
  6. replace all \\ with \

We could also do that only if string.contains('\000'), as an optimization for the common case.

domenic commented 8 years ago

Hmm, my thought was to split on \u0000, normalize each segment, then join the segments back together with \u0000 between them.
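
A minimal sketch of that split/normalize/join idea, for illustration only. The function name `normalizeAroundNul` and the default form `"NFC"` are assumptions made here, not names taken from tr46 or from the eventual fix. It also includes the fast path suggested earlier in the thread for strings with no \u0000 at all.

```javascript
"use strict";

// Hypothetical helper (name and default form are assumptions, not tr46's API).
// Normalizes a string without ever passing U+0000 to String.prototype.normalize.
function normalizeAroundNul(input, form = "NFC") {
  // Fast path for the common case: no NUL present, so normalize directly.
  if (!input.includes("\u0000")) {
    return input.normalize(form);
  }

  // Split on NUL, normalize each NUL-free segment, then restore the NULs.
  return input
    .split("\u0000")
    .map(segment => segment.normalize(form))
    .join("\u0000");
}
```

For example, `normalizeAroundNul("e\u0301\u0000e\u0301")` composes each `e` + U+0301 pair into U+00E9 and leaves the \u0000 between them in place. As far as I can tell this should give the same result as a correct whole-string normalize, since U+0000 has no decomposition and does not combine with its neighbours, but treat that as an assumption to check against the whatwg-url tests.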

Sebmaster commented 8 years ago

That actually sounds much better, let's do that.

Sebmaster commented 8 years ago

Fixed in 3ac1a2b. Will do a release when I've tested it with whatwg-url... tomorrow.

domenic commented 8 years ago

<3