w3c / mathml-core

MathML Core draft
https://w3c.github.io/mathml-core

Why the range U+0320–U+03FF when computing spacing? #169

Open NSoiffer opened 1 year ago

NSoiffer commented 1 year ago

This is separated out from #167 since the other issues are settled and it should be closed for CR.

Core says:

If Content is a single character in the range U+0320–U+03FF then exit with category Default.

That range makes no sense to me. It covers part of the combining chars and also the Greek/Coptic chars. I think maybe it is trying to capture the combining chars, but the combining chars range is U+0300–U+036F. There are additional combining chars, U+1AB0–U+1AFF and U+1DC0–U+1DFF, that maybe should be included.

And from a later comment:

I still don't see why U+0320–U+03FF makes sense. Why are some combining chars included in the range and not others? Why is a Greek alpha treated differently than a Latin a? Although you (@fred-wang) don't need to include text in the spec explaining why this is so, it seems like a bug to me, so you should explain why it isn't a bug.

fred-wang commented 1 year ago

The characters U+0320–U+03FF are not part of the operator dictionary, so they must return the default category. But as I previously mentioned, item 2 of https://w3c.github.io/mathml-core/#dfn-algorithm-to-determine-the-category-of-an-operator also remaps the entries of Operators_2_ascii_chars into this range (so they can be handled by the compact dictionary), and consequently this early return of the Default category is necessary. I'll add a WPT test to verify that, so that implementers do not forget that step.
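
To illustrate the collision described above, here is a minimal sketch and not the spec's actual algorithm. The three-entry `OPERATORS_2_ASCII_CHARS` list, its ordering, and the function names are assumptions made up for the example; the real table is defined in MathML Core. The only things taken from the discussion are the range U+0320–U+03FF and the idea that two-character ASCII operators are remapped into it.

```python
from typing import Optional

# Hypothetical stand-in for the spec's Operators_2_ascii_chars table.
# The real list and its order are defined in MathML Core, not here.
OPERATORS_2_ASCII_CHARS = ["!=", "<=", "->"]

REMAP_START = 0x0320  # first code point of the range discussed in this issue
REMAP_END = 0x03FF    # last code point of that range


def remap_without_guard(content: str) -> Optional[int]:
    """Remap content to a code point, with NO early return for the range."""
    if len(content) == 1:
        return ord(content)
    if content in OPERATORS_2_ASCII_CHARS:
        return REMAP_START + OPERATORS_2_ASCII_CHARS.index(content)
    return None  # None stands for "exit with category Default"


def remap_with_guard(content: str) -> Optional[int]:
    """Same remapping, but literal characters in the range exit early."""
    if len(content) == 1 and REMAP_START <= ord(content) <= REMAP_END:
        return None  # early return: category Default
    return remap_without_guard(content)


# Without the guard, a literal U+0320 and the first two-character operator
# produce the same value, so the literal character would inherit the
# operator's dictionary entry.
collides = remap_without_guard("\u0320") == remap_without_guard("!=")

# With the guard, the literal character falls back to the default category.
guarded = remap_with_guard("\u0320")
```

In this sketch `collides` is true and `guarded` is `None`, which is the situation the early return is meant to prevent. A Greek alpha (U+03B1) also lies in the range, so it takes the same early exit.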

davidcarlisle commented 1 year ago

@fred-wang I think it's reasonable to ask, though, why that range, especially as it uses all the standard Greek code points. Why isn't a range from the Private Use Area used here, as it's just an internal mapping of the tables?

fred-wang commented 1 year ago

AFAIK, it is still possible to use PUA characters in <mo> and they should have default spacing, so I'm not sure how that would help... And note that these values are transformed in step 3 to produce a key (code point + form) encoded on 14 bits.
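
As a rough illustration of a "code point + form" key that fits in 14 bits, here is a sketch under stated assumptions. The 12-bit/2-bit split, the numeric values assigned to each form, and the name `make_key` are hypothetical choices for the example; the normative encoding is whatever step 3 of the spec's algorithm says.

```python
# Assumed layout (not confirmed by this thread): the low 12 bits hold the
# possibly remapped code point, and the next 2 bits hold the form.
CODE_POINT_BITS = 12
FORM_VALUES = {"infix": 0, "prefix": 1, "postfix": 2}  # assumed encoding


def make_key(code_point: int, form: str) -> int:
    """Pack a small code point and a form into a single integer key."""
    if not 0 <= code_point < (1 << CODE_POINT_BITS):
        raise ValueError("code point does not fit in the assumed 12 bits")
    return (FORM_VALUES[form] << CODE_POINT_BITS) | code_point


# The last code point of the range in question, in postfix form.
key = make_key(0x03FF, "postfix")

# Every key produced under these assumptions stays below 2**14.
largest = make_key((1 << CODE_POINT_BITS) - 1, "postfix")
```

Under these assumptions any key stays below `1 << 14`, which is why a remapped value has to land in a small range such as U+0320–U+03FF, whereas a Private Use Area code point such as U+E000 would not fit without a further transformation.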

davidcarlisle commented 1 year ago

Ah, 14 bits, 03FF, which explains the range, and which I guess answers @NSoiffer's question. Maybe we should say that so it doesn't look like we are ignoring Greek. I agree it makes no difference in practice, as single-letter Greek, like single-letter Latin, is never going to need an opdict entry, so the slots are "free".

fred-wang commented 1 year ago

I'm not sure what the next actionable step is. AFAIK the text in the spec is correct and covered by tests.

NSoiffer commented 1 year ago

Choosing this range is clever but "random" (there are plenty of other ranges from other alphabets that I think could be used). I think an informative note (just one or two sentences similar to your comment) in the spec explaining why this is done is appropriate. Specs should not have mysteries buried in them.