Open NSoiffer opened 1 year ago
Characters in U+0320–U+03FF are not part of the operator dictionary, so they must return the default category. But as I previously mentioned, item 2 of https://w3c.github.io/mathml-core/#dfn-algorithm-to-determine-the-category-of-an-operator also remaps characters from Operators_2_ascii_chars into this range (so they can be handled by the compact dictionary), and consequently this early return of the Default category is necessary. I'll add a WPT test to verify that, so that implementers do not forget that step.
@fred-wang I think it's reasonable to ask, though, why that range was chosen, especially as it uses all the standard Greek code points. Why isn't a range from the Private Use Area used here, given that it's just an internal mapping of the tables?
AFAIK, it is still possible to use PUA characters in <mo>, and they should have default spacing, so I'm not sure how that would help... And note that these values are transformed in step 3 to produce a key (code point + form) encoded on 14 bits.
Ah, 14 bits and 03FF, which explains the range and, I guess, answers @NSoiffer's question. Maybe we should say that so it doesn't look like we are ignoring Greek. I agree it makes no difference in practice, as single-letter Greek, like single-letter Latin, is never going to need an opdict entry, so the slots are "free".
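To make the discussion above concrete, here is a minimal sketch of how the early return, the remapping into U+0320–U+03FF, and the 14-bit key could fit together. It is not the spec's text: the contents of `OPERATORS_2_ASCII_CHARS` are a hypothetical subset, and the folding of U+2000–U+2BFF and the form offsets (0x0000, 0x1000, 0x2000) are assumptions about step 3 that this thread does not spell out. The function name `operator_key` is also made up for illustration.

```python
# Hypothetical subset of the spec's Operators_2_ascii_chars list (assumption:
# the real list is longer, and its order determines the remapped code point).
OPERATORS_2_ASCII_CHARS = ["!=", "<=", ">=", "->", "&&", "||"]

# Assumed form offsets; with a 12-bit code point part they give a 14-bit key.
FORM_BITS = {"infix": 0x0000, "prefix": 0x1000, "postfix": 0x2000}


def operator_key(content: str, form: str):
    """Return a 14-bit dictionary key, or None when the Default category applies."""
    # Early return: a literal character in U+0320-U+03FF must not be looked
    # up, because these slots are reused for remapped two-character operators.
    if len(content) == 1 and 0x0320 <= ord(content) <= 0x03FF:
        return None

    # Remap a two-character ASCII operator to a single code point in the range.
    if content in OPERATORS_2_ASCII_CHARS:
        content = chr(0x0320 + OPERATORS_2_ASCII_CHARS.index(content))

    if len(content) != 1:
        return None

    cp = ord(content)
    if cp <= 0x03FF:
        low = cp                 # fits in 10 bits as is
    elif 0x2000 <= cp <= 0x2BFF:
        low = cp - 0x1C00        # assumed folding into 0x0400-0x0FFF
    else:
        return None

    return low | FORM_BITS[form]
```

Under these assumptions a Greek alpha in `<mo>` gets the default category, while `!=` occupies the slot that U+0320 would otherwise have had. Without the early return, a literal U+0320 would collide with the remapped `!=`.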
I'm not sure what the next actionable step is. AFAIK the text in the spec is correct and covered by tests.
Choosing this range is clever but "random" (there are plenty of ranges from other alphabets that I think could be used). I think an informative note in the spec (just one or two sentences similar to your comment) explaining why this is done is appropriate. Specs should not have mysteries buried in them.
This is separated out from #167 since the other issues are settled and it should be closed for CR.
Core says:
That range makes no sense to me. It covers part of the combining chars and also the Greek/Coptic chars. I think maybe it is trying to capture the combining chars, but the combining chars range is U+0300–U+036F. There are additional combining chars at U+1AB0–U+1AFF and U+1DC0–U+1DFF that maybe should be included.
And from a later comment:
I still don't see why U+0320–U+03FF makes sense. Why are some combining chars included in the range and not others? Why is a Greek alpha treated differently than a Latin a? Although you (@fred-wang) don't need to include text in the spec explaining why this is so, it seems like a bug to me, so you should explain why it isn't a bug.