mathiasbynens opened this issue 1 year ago
Ambiguity seems to have occurred about the handling of the "Unknown (Zzzz)" value.
I don't think there's ambiguity. This value is listed in PropertyValueAliases.txt; therefore, it is a valid value for Script.
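The Unknown value can be exercised directly in JavaScript. The sketch below is a minimal example that assumes an engine with ES2018 Unicode property escapes (for example, a recent Node.js). It compiles both the long name and the Zzzz alias, then tests a few sample characters. The characters were chosen purely for illustration.

```javascript
// Unknown (alias Zzzz) is a Script value listed in PropertyValueAliases.txt,
// so both spellings compile under the u flag without a SyntaxError.
const unknownLong = /\p{Script=Unknown}/u;
const unknownShort = /\p{sc=Zzzz}/u;

// Illustrative sample characters (not taken from the original discussion).
const samples = {
  unassigned: "\u0378", // U+0378: unassigned code point in the Greek block
  privateUse: "\uE000", // U+E000: first private-use code point, not listed in Scripts.txt
  latin: "A",           // U+0041: Script=Latin
};

// Print, for each sample, whether the long and short spellings match it.
for (const [label, ch] of Object.entries(samples)) {
  console.log(label, unknownLong.test(ch), unknownShort.test(ch));
}
```

Scripts.txt states that code points it does not list have the value Unknown (Zzzz). That is why the unassigned and private-use samples are expected to match, while "A" is not.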
Incidentally, using only PropertyValueAliases.txt for enumerating script names is probably a dangerous option. This file lists Katakana_Or_Hiragana (Hrkt), which is likely to mean /[\p{sc=Katakana}\p{sc=Hiragana}]/, but if so, using this as a Script value seems to violate the "every Unicode code point is assigned a single Script property value" rule.
This issue should be taken up with the Unicode Consortium, not us. But given their alias stability policy (which I personally advocated for on behalf of TC39), this alias will never be removed, so I don't see anything that could be done about this, even if they wanted to.
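To see what Hrkt actually does in a given engine, one can probe it next to the explicit union it "is likely to mean". This is a hedged sketch. The helper `compileOrNull` is hypothetical. The code does not assume whether the engine accepts `\p{sc=Hrkt}`; it reports either outcome. Because Scripts.txt assigns no code points to Katakana_Or_Hiragana, an engine that accepts the escape would be expected to match nothing with it.

```javascript
// Hypothetical helper: compile a Unicode-mode regex, or return null if the
// engine rejects the pattern (for example, an unsupported property value).
function compileOrNull(source) {
  try {
    return new RegExp(source, "u"); // \p{...} escapes require the u (or v) flag
  } catch (e) {
    if (e instanceof SyntaxError) return null;
    throw e;
  }
}

// Katakana_Or_Hiragana (alias Hrkt) as a Script value.
const hrkt = compileOrNull("\\p{sc=Hrkt}");

// The union that Hrkt "is likely to mean", written out explicitly.
const kanaUnion = /[\p{sc=Katakana}\p{sc=Hiragana}]/u;

const hiragana = "\u3042"; // HIRAGANA LETTER A, Script=Hiragana
const katakana = "\u30A2"; // KATAKANA LETTER A, Script=Katakana

console.log("Hrkt accepted:", hrkt !== null);
if (hrkt !== null) {
  // Each code point has exactly one Script value, and none is Hrkt.
  console.log("Hrkt matches:", hrkt.test(hiragana), hrkt.test(katakana));
}
console.log("union matches:", kanaUnion.test(hiragana), kanaUnion.test(katakana));
```

The union form is the portable way to express "Katakana or Hiragana", whatever an engine does with the Hrkt alias.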
According to my own check, as of Unicode 15.0.0, PropertyValueAliases.txt enumerates 165 script names, while Scripts.txt lists code point ranges for only 163. The two missing names are Unknown and Katakana_Or_Hiragana.
Theoretically, we could get away with switching to supporting only aliases that have code points assigned in Scripts.txt (in effect dropping Unknown and Katakana_Or_Hiragana), but someone would have to do the web compatibility research and convince implementations that it's worth the risk just to prohibit an unwanted feature. Is that what's being proposed here?
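A check like the one described above can be sketched by diffing the Script aliases against the names that have ranges in Scripts.txt. The result is also the set of names that the restriction discussed in the reply would drop. This is a minimal sketch under stated assumptions:

- the helper names are hypothetical and not part of any library;
- the input follows the semicolon-separated UCD line format with `#` comments;
- the caller has already read both files into strings.

The usage example runs on tiny excerpts rather than the full files.

```javascript
// Split a UCD data line into trimmed fields, dropping any trailing "# ..." comment.
function fields(line) {
  const data = line.split("#")[0].trim();
  return data === "" ? [] : data.split(";").map((f) => f.trim());
}

// Long Script names from PropertyValueAliases.txt lines such as
// "sc ; Hrkt ; Katakana_Or_Hiragana" (any extra aliases after the long name are ignored).
function aliasScriptNames(aliasesText) {
  const names = new Set();
  for (const line of aliasesText.split("\n")) {
    const f = fields(line);
    if (f[0] === "sc" && f.length >= 3) names.add(f[2]);
  }
  return names;
}

// Script names that have at least one range in Scripts.txt, from lines such as
// "3041..3096 ; Hiragana # Lo [86] ...". "# @missing" lines are comments and are skipped.
function assignedScriptNames(scriptsText) {
  const names = new Set();
  for (const line of scriptsText.split("\n")) {
    const f = fields(line);
    if (f.length === 2) names.add(f[1]);
  }
  return names;
}

// Alias names with no assigned code points, sorted for stable output.
function unassignedAliases(aliasesText, scriptsText) {
  const assigned = assignedScriptNames(scriptsText);
  return [...aliasScriptNames(aliasesText)].filter((n) => !assigned.has(n)).sort();
}

// Usage on small excerpts (illustrative only, not the full files):
const aliasesSample = [
  "# Script (sc)",
  "sc ; Hira ; Hiragana",
  "sc ; Hrkt ; Katakana_Or_Hiragana",
  "sc ; Kana ; Katakana",
  "sc ; Zzzz ; Unknown",
].join("\n");
const scriptsSample = [
  "# @missing: 0000..10FFFF; Unknown",
  "3041..3096    ; Hiragana # Lo  [86] HIRAGANA LETTER SMALL A..HIRAGANA LETTER SMALL KE",
  "30A1..30FA    ; Katakana # Lo  [90] KATAKANA LETTER SMALL A..KATAKANA LETTER VU",
].join("\n");
console.log(unassignedAliases(aliasesSample, scriptsSample));
// → ["Katakana_Or_Hiragana", "Unknown"]
```

Using sets keeps the comparison independent of line order and of how many separate ranges each script has in Scripts.txt.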
The following was reported to me by Nozomu Katō via email. I’m reposting it here with permission: