mathiasbynens / punycode.js

A robust Punycode converter that fully complies with RFC 3492 and RFC 5891.
https://mths.be/punycode
MIT License

`mapDomain` doesn’t follow IDNA separator requirements #11

Closed: mathiasbynens closed this issue 12 years ago

mathiasbynens commented 12 years ago

As reported by @annevk, `mapDomain` does not seem to follow the IDNA label separator requirements.

http://logbot.glob.com.au/?c=freenode%23whatwg&s=18%20Sep%202012&e=18%20Sep%202012#c722062

From http://tools.ietf.org/html/rfc3490#section-3.1 (IDNA 2003):

3.1 Requirements

   IDNA conformance means adherence to the following four requirements:

   1) Whenever dots are used as label separators, the following
      characters MUST be recognized as dots: U+002E (full stop), U+3002
      (ideographic full stop), U+FF0E (fullwidth full stop), U+FF61
      (halfwidth ideographic full stop).

   2) Whenever a domain name is put into an IDN-unaware domain name slot
      (see section 2), it MUST contain only ASCII characters.  Given an
      internationalized domain name (IDN), an equivalent domain name
      satisfying this requirement can be obtained by applying the
      ToASCII operation (see section 4) to each label and, if dots are
      used as label separators, changing all the label separators to
      U+002E.

   3) ACE labels obtained from domain name slots SHOULD be hidden from
      users when it is known that the environment can handle the non-ACE
      form, except when the ACE form is explicitly requested.  When it
      is not known whether or not the environment can handle the non-ACE
      form, the application MAY use the non-ACE form (which might fail,
      such as by not being displayed properly), or it MAY use the ACE
      form (which will look unintelligle to the user).  Given an
      internationalized domain name, an equivalent domain name
      containing no ACE labels can be obtained by applying the ToUnicode
      operation (see section 4) to each label.  When requirements 2 and
      3 both apply, requirement 2 takes precedence.

   4) Whenever two labels are compared, they MUST be considered to match
      if and only if they are equivalent, that is, their ASCII forms
      (obtained by applying ToASCII) match using a case-insensitive
      ASCII comparison.  Whenever two names are compared, they MUST be
      considered to match if and only if their corresponding labels
      match, regardless of whether the names use the same forms of label
      separators.
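
Requirements 1 and 4 quoted above can be illustrated with a minimal sketch. The names `splitLabels` and `namesMatch` are hypothetical and not part of punycode.js, and `toASCII` is assumed to be a caller-supplied function that converts a single label to its ASCII form.

```javascript
// The four characters RFC 3490 section 3.1 requires to be recognised as dots.
const SEPARATORS = /[\u002E\u3002\uFF0E\uFF61]/;

// Hypothetical helper: split a domain name into labels on any of the four dots.
function splitLabels(domain) {
  return domain.split(SEPARATORS);
}

// Hypothetical helper for requirement 4: two names match when their labels
// match after ToASCII, compared case-insensitively, regardless of which
// separator characters each name uses.
// Assumption: `toASCII` takes one label and returns an ASCII-only string.
function namesMatch(a, b, toASCII) {
  const left = splitLabels(a);
  const right = splitLabels(b);
  if (left.length !== right.length) return false;
  return left.every(
    (label, i) => toASCII(label).toLowerCase() === toASCII(right[i]).toLowerCase()
  );
}
```

For names that are already ASCII apart from their separators, an identity function can stand in for `toASCII`, for example `namesMatch('example\u3002com', 'EXAMPLE.COM', (label) => label)`.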

annevk commented 12 years ago

I'm not entirely sure whether IDNA2008 requires this too, but it seems highly unlikely that browsers will ever drop support for these additional label separators, since existing content relies on them working.

mathiasbynens commented 12 years ago

Hrm, turns out RFC 5895 (the character mapping document for IDNA2008) is rather vague on this subject:

   4.  [IDNA2008protocol] is specified such that the protocol acts on
       the individual labels of the domain name.  If an implementation
       of this mapping is also performing the step of separation of the
       parts of a domain name into labels by using the FULL STOP
       character (U+002E), the IDEOGRAPHIC FULL STOP character (U+3002)
       can be mapped to the FULL STOP before label separation occurs.
       There are other characters that are used as "full stops" that one
       could consider mapping as label separators, but their use as such
       has not been investigated thoroughly.  This step was chosen
       because some input mechanisms do not allow the user to easily
       enter proper label separators.  Only the IDEOGRAPHIC FULL STOP
       character (U+3002) is added in this mapping because the authors
       have not fully investigated the applicability of other characters
       and the environments where they should and should not be
       considered domain name label separators.
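
For contrast with the IDNA2003 list, the step quoted above could be sketched as follows. Both function names are hypothetical, and the sketch covers only this one step of the RFC 5895 mapping, not the other mapping steps that RFC describes.

```javascript
// Hypothetical helper: map IDEOGRAPHIC FULL STOP (U+3002) to FULL STOP (U+002E).
// On its own this function leaves every other dot-like character untouched.
function mapIdeographicFullStop(domain) {
  return domain.replace(/\u3002/g, '\u002E');
}

// Hypothetical helper: apply the mapping first, then separate the labels
// on U+002E only, in the order the quoted step describes.
function splitAfterMapping(domain) {
  return mapIdeographicFullStop(domain).split('\u002E');
}
```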

annevk commented 12 years ago

I think we want to support the same characters as IDNA2003, but we should probably figure out what the browser vendors are going to implement, and then standardize that in either UTS #46 or the URL Standard.

mathiasbynens commented 12 years ago

I have a patch ready that adds support for IDNA2003 separators.
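
A minimal sketch of what IDNA2003 separator support in a `mapDomain`-style helper could look like is shown here. It is an illustration under assumptions, not the committed patch: `mapDomainSketch` is a hypothetical name, and `fn` is assumed to be a per-label callback.

```javascript
// The four IDNA2003 label separators from RFC 3490 section 3.1.
const SEPARATORS = /[\u002E\u3002\uFF0E\uFF61]/;

// Hypothetical stand-in for mapDomain: split on any recognised separator,
// apply `fn` to each label, and rejoin with U+002E so the result uses
// only the ASCII full stop as its label separator.
function mapDomainSketch(domain, fn) {
  return domain
    .split(SEPARATORS)
    .map((label) => fn(label))
    .join('.');
}
```

Rejoining with `'.'` rather than the original separators follows requirement 2 of the RFC text quoted earlier, which asks for all label separators to be changed to U+002E.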

mathiasbynens commented 12 years ago

Committed: 131260bd2f0f658ae395bc467c2395a71d7a3c3b. Thanks again!