unicode-org / unicodetools

home of unicodetools and https://util.unicode.org JSPs

idnatest error if xn-- empty or all-ASCII #827

Closed markusicu closed 6 months ago

markusicu commented 6 months ago

[165-A48] Action Item for Markus Scherer, Editorial Committee: Update UTS#46 to validate ACE label edge cases, see L2/20-240 item F7. For Unicode 14.

Corresponds to spec change

L2/20-240 item F7

The IDNA2008 ToUnicode operation validates ACE labels ("xn--" plus Punycode) by decoding them, then re-encoding via ToASCII, and verifying that the round-trip output is the same as the input (case-insensitive).
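To make the round-trip check concrete, here is a minimal Python sketch. It is not the IDNA2008 algorithm in full: `to_ascii_label` and `round_trip_ok` are hypothetical helper names, `to_ascii_label` is a heavily simplified stand-in for ToASCII (no mapping, normalization, or length checks), and Python's built-in `punycode` codec is assumed to be an adequate Punycode implementation for illustration.

```python
ACE_PREFIX = "xn--"


def to_ascii_label(label: str) -> str:
    """Simplified stand-in for ToASCII (assumption: no mapping or validation).

    All-ASCII labels (including the empty string) pass through unchanged;
    anything else becomes "xn--" plus its Punycode encoding.
    """
    if label.isascii():
        return label
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def round_trip_ok(ace_label: str) -> bool:
    """Decode an ACE label, re-encode it, and compare case-insensitively."""
    if not ace_label.lower().startswith(ACE_PREFIX):
        raise ValueError("not an ACE label")
    try:
        decoded = ace_label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        # Malformed Punycode cannot round-trip.
        return False
    return to_ascii_label(decoded).lower() == ace_label.lower()


print(round_trip_ok("xn--bcher-kva"))  # decodes to "bücher", re-encodes identically
print(round_trip_ok("xn--"))           # decodes to "", which re-encodes to ""
print(round_trip_ok("xn--ASCII-"))     # decodes to "ASCII", which re-encodes to "ASCII"
```

Under these assumptions the first call returns `True` and the other two return `False`, because an empty or all-ASCII result is never re-encoded with the "xn--" prefix.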

The UTS#46 ToUnicode operation and its Processing step use a cheaper Convert/Validate step, which is intended to be equivalent.

However, it misses two edge cases which pass the Convert/Validate step but which IDNA2008 catches with its round-trip verification:

  1. "xn--" decodes to an empty string
  2. "xn--ASCII-" decodes to just "ASCII"

I propose that we modify https://www.unicode.org/reports/tr46/#ProcessingStepPunycode (section 4 Processing > step 4 "Convert/Validate" > If the label starts with “xn--”) so that it catches these cases.
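One possible shape for such a check, sketched in Python. This is an illustration only, not the spec wording: `convert_validate_ace` is a hypothetical name, the error messages are made up, Python's built-in `punycode` codec stands in for the Punycode decoder, and the remaining UTS#46 validity criteria are deliberately left out.

```python
def convert_validate_ace(label: str) -> str:
    """Decode an "xn--" label and reject the two edge cases directly.

    Hypothetical sketch: only the empty and all-ASCII checks are shown;
    the other UTS#46 validity criteria are omitted.
    """
    prefix = "xn--"
    if not label.lower().startswith(prefix):
        raise ValueError("not an ACE label")
    # Raises UnicodeError (a ValueError subclass) on malformed Punycode.
    decoded = label[len(prefix):].encode("ascii").decode("punycode")
    if decoded == "":
        raise ValueError("ACE label decodes to an empty string")
    if decoded.isascii():
        raise ValueError("ACE label decodes to an all-ASCII string")
    return decoded


for candidate in ("xn--bcher-kva", "xn--", "xn--ASCII-"):
    try:
        print(candidate, "->", convert_validate_ace(candidate))
    except ValueError as err:
        print(candidate, "-> error:", err)
```

The two explicit checks avoid the cost of re-encoding while rejecting the same two inputs that the round-trip comparison would reject.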

...