stevenatkin closed this issue 2 years ago
I'm not sure what this refers to? #char_surrogate refers to disallowing unpaired surrogate code points. I'm not sure what an "invalid surrogate pair" is in this context (other than unpaired).
I was referring to unpaired surrogates.
actually, http://w3c.github.io/bp-i18n-specdev/#char_surrogate is saying exactly that.
if you follow the more link to the character model, it says:
Unicode contains some code points for internal use (such as noncharacters) or special functions (such as surrogate code points).
we could, of course, make the wording clearer, if needed, rather than just using the charmod wording.
Maybe we can simply add a few words to make it clearer.
[from Addison]
The requested rule already existed, but there was no text provided to explain it. I added the note [4] shown here:
Suggestion:
A "surrogate code point" refers here to the use of code points in the range U+D800 through U+DFFF, inclusive. These code points only exist to allow the UTF-16 encoding to address supplementary characters, and are always used in pairs. A single surrogate code point is referred to as an "unpaired surrogate" and should never be used.
I'm not sure it needs to be in a note. It's just an explanation like many others of a piece of mustard.
I think it would also improve understanding (since the explanation is not always alongside the mustard) to change the guideline to say:
Specifications MUST NOT allow the use of unpaired surrogate code points.
I removed the "note" marker.
I think your edits make the text better, but I wanted to clarify the code points vs. code units thing here (i.e. we don't mean to ban UTF-16). Perhaps:
A "surrogate code point" refers here to the use of character values in the range U+D800 through U+DFFF, inclusive. These code points are reserved to allow the UTF-16 character encoding to address supplementary characters. Surrogates are always used in pairs and only appear when the UTF-16 encoding is being used. A single surrogate code point is referred to as an "unpaired surrogate" and should never be used.
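To make the code unit vs. code point distinction concrete, here is a minimal Python sketch of how one might look for unpaired surrogates in a sequence of UTF-16 code units. The function name `find_unpaired_surrogates` is hypothetical and not part of any spec text. It relies on the Unicode split of the surrogate range into high surrogates (U+D800 through U+DBFF) and low surrogates (U+DC00 through U+DFFF), which the proposed wording does not spell out.

```python
def find_unpaired_surrogates(code_units):
    """Return the indexes of unpaired surrogates in a list of UTF-16 code units.

    Hypothetical helper for illustration only. A well-formed pair is a high
    surrogate (0xD800-0xDBFF) immediately followed by a low surrogate
    (0xDC00-0xDFFF); any other surrogate is unpaired.
    """
    unpaired = []
    i = 0
    n = len(code_units)
    while i < n:
        unit = code_units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows directly.
            if i + 1 < n and 0xDC00 <= code_units[i + 1] <= 0xDFFF:
                i += 2  # skip the whole pair
                continue
            unpaired.append(i)
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            unpaired.append(i)
        i += 1
    return unpaired
```

A well-formed pair such as `[0xD83D, 0xDE00]` yields an empty list, while `[0x0041, 0xD83D]` reports index 1. This reflects the point above: the pairs themselves are fine in UTF-16, and only the lone halves are a problem.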
works for me
Fixed
I did not seem to find any rule about rejecting invalid surrogate pairs. We have a rule that says you should accept surrogates, but nothing that talks about whether one should reject malformed surrogates.
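As a rough illustration of what "rejecting" could mean at the level of code points (as opposed to UTF-16 code units), here is a small Python sketch. The name `reject_surrogates` is hypothetical, and raising an error is only one possible policy. It assumes the input is a Python `str`, where a supplementary character is a single code point, so any value in U+D800 through U+DFFF is a surrogate that should not be there.

```python
def reject_surrogates(text):
    """Return text unchanged, or raise ValueError if it contains a surrogate.

    Hypothetical helper for illustration only. In a Python str, supplementary
    characters are single code points, so any surrogate code point found here
    is not part of a valid UTF-16 pair.
    """
    for index, ch in enumerate(text):
        if 0xD800 <= ord(ch) <= 0xDFFF:
            raise ValueError(
                f"surrogate code point U+{ord(ch):04X} at index {index}"
            )
    return text
```

A string containing a supplementary character such as `"\U0001F600"` passes through unchanged, while a string containing `"\ud800"` raises `ValueError`.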