w3c / mathml-core

MathML Core draft
https://w3c.github.io/mathml-core

Clarify single character of mi as italic #231

Closed bert-github closed 1 week ago

bert-github commented 7 months ago

(This is part of the I18n WG review.)

Example https://www.w3.org/TR/2023/WD-mathml-core-20231127/#mi-example

Note that identifiers containing a single letter are italic by default.

A minor issue is that the example uses the word ‘letter’, suggesting that <mi>a9</mi> would also be italic, because it only contains one letter. But that is maybe not worth fixing, as it is only an example and the normative text below talks about ‘character’.

The transformation to italic is via text-transform: math-auto (section 4.2) https://www.w3.org/TR/2023/WD-mathml-core-20231127/#new-text-transform-values

On text nodes containing a single character, if the computed value is math-auto then the transformed text is obtained by performing conversion of each character according to the italic table.

This does not mention that white space is collapsed before the characters are counted. E.g., <mi> a </mi> counts as a single character. The MathML3 spec defines that (in 2.1.7 Collapsing Whitespace in Input), but it should probably be recalled here.

Also, the word ‘each’ when we know there is only one could be confusing.
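As a rough illustration of the quoted rule, the following Python sketch models `text-transform: math-auto` on a text node. The names `ITALIC_EXCERPT` and `math_auto` are hypothetical, and the dictionary is only a two-entry excerpt, not the full italic table. It assumes that "character" means one code point and that the text node is counted as-is.

```python
# Hypothetical two-entry excerpt of the italic table (not the full table).
ITALIC_EXCERPT = {
    "a": "\U0001D44E",  # MATHEMATICAL ITALIC SMALL A
    "c": "\U0001D450",  # MATHEMATICAL ITALIC SMALL C
}


def math_auto(text: str) -> str:
    """Model math-auto: transform only text made of exactly one code point."""
    if len(text) != 1:  # len() counts code points, not letters
        return text
    # Characters missing from the table are returned unchanged.
    return ITALIC_EXCERPT.get(text, text)


print(math_auto("a") == "\U0001D44E")  # single character: mapped
print(math_auto("a9") == "a9")         # one letter, two characters: unchanged
```

Under these assumptions, `a9` is left alone because the condition counts characters rather than letters.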

fred-wang commented 7 months ago

> A minor issue is that the example uses the word ‘letter’

Right, the non-normative text should probably use "character" too.

> This does not mention that white space is collapsed before the characters are counted

This is on purpose, since the current version of MathML Core does not perform the whitespace collapsing from MathML 3 (there are other issues about that).
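To make the difference concrete, here is a small Python sketch of the MathML 3 collapsing rule (section 2.1.7) applied to the content of `<mi> a </mi>`. The function name `collapse_mathml3` is hypothetical. The sketch only compares character counts; it makes no claim about how a particular renderer counts them.

```python
import re

# Whitespace per MathML 3 section 2.1.7: space, tab, newline, carriage return.
_WS = " \t\n\r"


def collapse_mathml3(text: str) -> str:
    """Trim leading/trailing whitespace and collapse internal runs to one space."""
    return re.sub(f"[{_WS}]+", " ", text.strip(_WS))


raw = " a "  # text content of <mi> a </mi>
print(len(raw))                    # 3
print(len(collapse_mathml3(raw)))  # 1
```

With collapsing, the content counts as a single character; without it, the raw text node has three.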

> Also, the word ‘each’ when we know there is only one could be confusing.

This is probably a legacy of the time when other mathvariant values were supported, e.g. <mi mathvariant="bold">sin</mi>, where we had to convert each character one by one.
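For context, a per-character conversion of that kind might look like the Python sketch below. The function name `to_math_bold` is hypothetical, and the sketch covers only ASCII letters, which is an assumption made for brevity rather than the extent of any real bold mapping.

```python
def to_math_bold(text: str) -> str:
    """Shift ASCII letters into the Unicode Mathematical Bold range, one by one."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr(0x1D41A + ord(ch) - ord("a")))  # bold small a..z
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D400 + ord(ch) - ord("A")))  # bold capital A..Z
        else:
            out.append(ch)  # not handled by this sketch: left unchanged
    return "".join(out)


print([f"U+{ord(c):04X}" for c in to_math_bold("sin")])
```

The loop is where the word "each" comes from: every character of `sin` is converted separately.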

SmashManiac commented 2 months ago

I'm not sure if replacing "letter" with "character" is the way to go here. I am not aware of any existing MathML renderer where the infinity symbol or an ellipsis is rendered in italic inside a <mi> by default, nor would I intuitively expect them to based on ISO 80000-2 rules.

dginev commented 2 months ago

Related: Unicode combining characters can also be used to modify a letter variable name, as with circumflex, while traditionally expecting an italic rendering, matching the unmodified letter.
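A short Python sketch of why combining characters complicate the "single character" condition: the same visible letter can be one or two code points depending on normalization. Whether a renderer counts code points or something else, and whether the precomposed letter is covered by the italic table, are not settled by this sketch.

```python
import unicodedata

decomposed = "a\u0302"  # LATIN SMALL LETTER A + COMBINING CIRCUMFLEX ACCENT
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed))           # 2
print(len(composed))             # 1
print(f"U+{ord(composed):04X}")  # U+00E2
```

In either form, the text is no longer the plain letter `a` that an italic mapping for U+0061 would match.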


I agree MathML Core should be as clear as possible about what is and isn't meant to be covered by the italic treatment.

Possibly 4.2 New text-transform value does that best:

On text nodes containing a single character, if the computed value is math-auto then the transformed text is obtained by performing conversion of each character according to the italic table.

Maybe a few extra words on the intended coverage of the italic table would help? Reading through it today reveals that "letter" currently means what is more traditional in math: "Latin or Greek letter".

SmashManiac commented 1 month ago

I just realized that MathML 3 currently specifies that all mi elements containing a single character default to a mathvariant of italic. So while the original suggestion would align with full MathML 3, it doesn't seem to match what's currently being rendered on the web right now for some reason.

I would speculate that this may be because non-letter characters generally don't have corresponding italic characters in Unicode so existing renderers don't know what to do?

In any case, it would be nice to clarify this, especially when considering that there are many cases where one would NOT want a one-letter mi element in italic, such as mathematical constants, well-known function names and abstract identifiers.

fred-wang commented 1 month ago

So just to repeat, the text mentioning "single letter" is a non-normative example; it's just meant to say that the two <mi>c</mi> are italic by default and that mathvariant="normal" overrides that.

The normative text is the one quoted by Delan: the UA stylesheet has mi { text-transform: math-auto; } by default, and mathvariant="normal" is treated as a presentational hint for text-transform: none. The exact behavior is then defined in the text-transform and "italic mappings" sections.
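The two rules just described can be modeled in a few lines of Python. This is a sketch, not the specification's algorithm: the function name `mi_text_transform` is hypothetical, the case-insensitive comparison is an assumption, and author CSS that could override either rule is ignored.

```python
def mi_text_transform(attributes: dict[str, str]) -> str:
    """Return the modeled computed text-transform for an <mi> element."""
    # Assumption: the attribute value is compared case-insensitively.
    if attributes.get("mathvariant", "").lower() == "normal":
        return "none"   # presentational hint from mathvariant="normal"
    return "math-auto"  # default from the UA stylesheet rule for mi


print(mi_text_transform({}))                         # math-auto
print(mi_text_transform({"mathvariant": "normal"}))  # none
```

Only the first result leads to the italic mapping; with `none`, the character is rendered as written.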

fred-wang commented 1 month ago

I pushed a commit that does not change the behavior and just tries to be more explicit. Hopefully that addresses all the issues reported here.

For the record, #149 is the issue about trimming whitespace characters.

davidcarlisle commented 1 month ago

> I just realized that MathML 3 currently specifies that all mi elements containing a single character default to a mathvariant of italic. So while the original suggestion would align with full MathML 3, it doesn't seem to match what's currently being rendered on the web right now for some reason.
>
> I would speculate that this may be because non-letter characters generally don't have corresponding italic characters in Unicode so existing renderers don't know what to do?
>
> In any case, it would be nice to clarify this, especially when considering that there are many cases where one would NOT want a one-letter mi element in italic, such as mathematical constants, well-known function names and abstract identifiers.

The MathML 3 status is:

A single-character mi (after space stripping) defaults to mathvariant=italic.

By default, mathvariant=italic has no effect (it does not make text italic) on any characters other than those listed at

https://www.w3.org/TR/xml-entity-names/italic.html

mathvariant is not a font change; it's a code point shift to the Unicode math italic block, so it only has an effect on characters that map into that block.
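The code point shift can be sketched in Python as follows. The function name `math_italic` is hypothetical, and the sketch handles only ASCII letters; the linked table also lists Greek letters and a few other characters, which are omitted here for brevity.

```python
def math_italic(ch: str) -> str:
    """Map one ASCII letter to its Mathematical Italic code point, else return it."""
    if ch == "h":
        return "\u210E"  # PLANCK CONSTANT stands in for the reserved U+1D455
    if "a" <= ch <= "z":
        return chr(0x1D44E + ord(ch) - ord("a"))
    if "A" <= ch <= "Z":
        return chr(0x1D434 + ord(ch) - ord("A"))
    return ch  # no italic counterpart in this sketch: unchanged


for ch in ["x", "h", "\u221E", "9"]:
    print(f"U+{ord(ch):04X} -> U+{ord(math_italic(ch)):04X}")
```

The infinity sign and the digit come back unchanged, which matches the point that the shift only affects characters with an italic counterpart.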

There are some words hinting that systems may use CSS or font changes to style other characters, but this is not guaranteed (and I think in Core this should not be automatic, although a document may of course supply its own CSS).

MathML 3, section 3.2.1:

Renderers should support those combinations of character data and mathvariant values that correspond to Unicode characters, and that they can visually distinguish using available font characters. Renderers may ignore or support those combinations of character data and mathvariant values that do not correspond to an assigned Unicode code point, and authors should recognize that support for mathematical symbols that do not correspond to assigned Unicode code points may vary widely from one renderer to another.

omentic commented 1 month ago

FYI @fred-wang the latest commit failed to deploy, and https://w3c.github.io/mathml-core/ currently results in a 404.

davidcarlisle commented 1 month ago

@omentic Oh, that's odd: there was no error from ReSpec, but one stage of the GitHub Action timed out for some reason. I just forced a rebuild and it's there now, thanks.

SmashManiac commented 1 month ago

Thank you very much for the clarifications! I had not previously realized that the italics mapping table was normative as all other mapping tables in section C aren't, nor that the italics mappings covered all possible CSS character substitutions. I personally find that @fred-wang's commit eliminated that particular "letter vs character" confusion for me as an external MathML Core user.

Note that I'm not currently in a position to comment on whether additional clarifications should be made or not.

bkardell commented 1 week ago

Given fred's push, I think the issue is resolved and am closing it.