Wrong answer on "\u001fx", apparently

domenic commented 7 years ago

Continued from https://github.com/w3c/web-platform-tests/pull/4504#issuecomment-270771165 . Repro:

"use strict";
const tr46 = require(".");
const util = require("util");

const input = "\u001fx";
const output = tr46.toASCII(input, false, tr46.PROCESSING_OPTIONS.TRANSITIONAL, false);
console.log(util.inspect(output));

outputs 'xn--\u001fx-' whereas it should output just '\u001fx' according to @annevk's reading in https://github.com/w3c/web-platform-tests/pull/4504#issuecomment-270848858 . Updates to follow...

domenic commented 7 years ago

The problem is this line in the spec:

Convert each label with non-ASCII characters into Punycode [RFC3492], and prefix by “xn--”. This may record an error.

which we translate as:

  labels = labels.map(function(l) {
    try {
      return punycode.toASCII(l);
    } catch(e) {
      result.error = true;
      return l;
    }
  });

It appears we are making an assumption that punycode.toASCII will be a no-op for ASCII characters (including \u001F). But it's not; the result of punycode.toASCII("\u001fx") is "xn--\u001fx-". It's not clear to me yet whether this is a punycode bug or if we should be doing the non-ASCII test ourselves. That is, it's not clear to me what the intended semantics of punycode.toASCII are.

domenic commented 7 years ago

Same bug in another tr46 library, at least according to source inspection: https://github.com/jcranmer/idna-uts46/blob/master/uts46.js#L92

It at least seems people are assuming that punycode.toASCII is a no-op on ASCII input.

jsdom / tr46

Wrong answer on "\u001fx", apparently #5