w3c / csswg-drafts

CSS Working Group Editor Drafts
https://drafts.csswg.org/

[css-text] text-transform values for MathML #5386

Closed bkardell closed 5 months ago

bkardell commented 4 years ago

(Part of #5384 - MathML Core related CSS)

Note: This was first raised last year. Given the noise and confusion it generated, I am opening a new issue to explain it and answer questions more directly. You don't need the background of the earlier issues (they are incomplete and would take some time to digest), but for reference they are #3745 and, for the larger topic of text-transforms, design and theory, #3775.


Background

The rendering of mathematical text employs common conventions that allow authors to express, and readers to understand, meaning. Sometimes this involves layout, but often it lies in the particulars of how a character is rendered. To this end, Unicode defines a range of Mathematical Alphanumeric Symbols with particular stylistic variants that carry additional local context and meaning. As one example, an equation can employ two variants of the same character to represent different things, as in the example below, taken from Unicode, of a well-known equation that requires this. At the top is how it is intended to be understood, with distinctly rendered H's; below is how it would be (incorrectly/confusingly) understood without this distinction.

Hamiltonian equation requiring treatment distinction to retain meaning

This is quite a complex topic for a wide range of reasons, and the various distinctions can be more or less important at different levels. Because of this, Unicode has added support for thousands of numerical 'symbols', including invisibles, allowing decent encoding of mathematics as text with its own 'alphabet', even where there is look-alike overlap. For example, mathematical sigmas are not the same characters as the related Greek letters despite their recognizable appearance, in much the same way that in everyday text O and 0, or 1 and l, are distinct characters with their own applications and various uses/treatments (see https://en.wikipedia.org/wiki/Sigma#Science_and_mathematics and https://en.wikipedia.org/wiki/Sigma#Mathematical_Sigma).

Why transforms?

MathML offered structure and markup oriented solutions which didn't require access to characters in Plane 1 of Unicode. In a very general way, the structures provided by MathML are useful in that they fit the DOM/markup model, allow styling, and you can thus lay good default rules upon them: Single character identifiers (<mi>) for example are rendered as italics by default according to normal conventions. Thus, the following MathML...

<math display="block">
  <mrow>
    <msup>
      <mi>x</mi>
      <mn>2</mn>
    </msup>
    <msup>
      <mi>y</mi>
      <mn>2</mn>
    </msup>
  </mrow>
</math>

Would render with the identifiers x and y as italic mathematical identifiers as:

rendering with x and y italicized

MathML allows you to override these defaults as necessary or to provide the additional distinctions. Because of where it fits in history, MathML co-evolved a solution for overriding: an attribute called mathvariant whose allowed values are what are now the Unicode variant names. For example, if we want to express an equation about a real number such as

illustration of an equation involving real number symbolic fraktur R

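Most Fraktur letters are in Plane 1 (the Fraktur capital R itself is encoded at U+211C, in the Letterlike Symbols block). In addition to allowing authors to eventually use the Unicode character (&#x211C;) directly, MathML historically provided the mathvariant attribute, allowing authors (and tools) to express this as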

<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mi mathvariant="fraktur">R</mi>
  <mo stretchy="false">(</mo>
  <mn>2</mn>
  <mo>+</mo>
  <mn>3</mn>
  <mi>i</mi>
  <mo stretchy="false">)</mo>
  <mo>=</mo>
  <mn>2</mn>
</math>

As in the example from Unicode (provided in Background above), this may or may not be critically significant for a given case, but it is important that this distinction is maintained as text as much as possible for copy/paste operations and for speech subsystems. While the spec recommends that authors use the Unicode characters directly where it is important, a large corpus of millions of MathML equations exists, and numerous tools for creating it currently rely on understanding the mathvariant attribute, so it must be maintained and supported.
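To make the mathvariant mapping concrete, here is a minimal sketch of what mathvariant="fraktur" asks for at the character level. The function name to_fraktur is hypothetical (it is not part of MathML or any browser API), and the sketch covers basic Latin letters only. The code points come from Unicode: Fraktur letters start at U+1D504 (capitals) and U+1D51E (small letters), except for five capitals that were already encoded in the Letterlike Symbols block.

```python
# Hypothetical helper for illustration; not an API of MathML or CSS.
# Capitals whose Fraktur form is encoded in Letterlike Symbols
# rather than in the Plane 1 Mathematical Alphanumeric Symbols block.
FRAKTUR_EXCEPTIONS = {
    "C": "\u212D",
    "H": "\u210C",
    "I": "\u2111",
    "R": "\u211C",
    "Z": "\u2128",
}


def to_fraktur(text: str) -> str:
    """Map basic Latin letters to Fraktur; leave everything else alone."""
    out = []
    for ch in text:
        if ch in FRAKTUR_EXCEPTIONS:
            out.append(FRAKTUR_EXCEPTIONS[ch])
        elif "A" <= ch <= "Z":
            out.append(chr(0x1D504 + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(0x1D51E + ord(ch) - ord("a")))
        else:
            out.append(ch)  # digits, operators, etc. are unchanged
    return "".join(out)


# The <mi mathvariant="fraktur">R</mi> from the example above:
print(to_fraktur("R"))
```

The rendered character differs from the text in the markup, which is why copy/paste and speech output need to agree on which of the two they expose.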

Based on the experience of three implementations, MathML-Core accomplishes this largely by mapping this legacy attribute to CSS's infrastructure and text-transform values.

The specifics

MathML-Core largely codifies the mapping and seeks to expose it to authors and increase interoperability. It extends CSS Text L3 with new values for text-transform that play this role, mapping alphanumeric text to the equivalent Mathematical Alphanumeric Symbols:

[none | [capitalize | uppercase | lowercase ] || full-width || full-size-kana | [capitalize | uppercase | lowercase ] || [ math-auto | math-bold | math-italic | math-bold-italic | math-double-struck | math-bold-fraktur | math-script | math-bold-script | math-fraktur | math-sans-serif | math-bold-sans-serif | math-sans-serif-italic | math-sans-serif-bold-italic | math-monospace | math-initial | math-tailed | math-looped | math-stretched ]

math-auto is the default value for text-transform of MathML elements. It follows convention and automatically renders single letter identifiers with mathematical italics, and others normally.
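As a rough illustration of the math-auto behavior described above, the following sketch italicizes a text run only when it is a single character. The function name math_auto is hypothetical, and the sketch handles basic Latin letters only; the actual proposal defines the full mapping. The code points are from Unicode: mathematical italic capitals start at U+1D434 and small letters at U+1D44E, with italic small h encoded separately at U+210E.

```python
# Hypothetical sketch of math-auto; basic Latin letters only.
def math_auto(text: str) -> str:
    """Return the mathematical italic form of a single letter.

    Anything that is not exactly one character (e.g. "sin") is
    returned unchanged, matching the "others normally" rule.
    """
    if len(text) != 1:
        return text
    ch = text
    if ch == "h":
        return "\u210E"  # italic h lives outside Plane 1
    if "A" <= ch <= "Z":
        return chr(0x1D434 + ord(ch) - ord("A"))
    if "a" <= ch <= "z":
        return chr(0x1D44E + ord(ch) - ord("a"))
    return ch


print(math_auto("x"))    # single-letter identifier: italicized
print(math_auto("sin"))  # multi-letter identifier: unchanged
```

Note that this is a character substitution rather than a font style change, which is why it is expressed as a text-transform value and not through font-style.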

Proposal details at https://mathml-refresh.github.io/mathml-core/#new-text-transform-values Tentative tests are at https://github.com/web-platform-tests/wpt/pull/16922

fred-wang commented 4 years ago

I wonder whether it would be possible to modify the current proposal to also say

"If the specified value of text-transform is math-auto and the inherited is not none then computed value is the inherited value"

that is

<div><div style="text-transform: math-auto">sin</div></div>

<div><div style="text-transform: math-auto">x</div></div>

<div style="text-transform: math-fraktur">
  <div style="text-transform: math-auto">x</div>
</div>

would respectively produce upright sin, slanted x and fraktur x. This would make it possible to implement MathML's behavior

<math><mi>sin</mi></math>
<math><mi>x</mi></math>
<math mathvariant="fraktur"><mi>x</mi></math>

without tweaking the UA stylesheet too much.
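For readers who want the proposed rule spelled out, here is a minimal sketch of the computed-value step quoted above. The function name computed_text_transform is hypothetical and models only the special case for math-auto, treating values as plain strings; it is not how any engine represents styles.

```python
# Hypothetical model of the proposed computed-value rule.
def computed_text_transform(specified: str, inherited: str = "none") -> str:
    """Apply the proposed special case for math-auto.

    If the specified value is math-auto and the inherited value is
    not none, the computed value is the inherited value. Otherwise
    the specified value is used as-is.
    """
    if specified == "math-auto" and inherited != "none":
        return inherited
    return specified


# First two <div> examples: nothing but the initial value is inherited.
print(computed_text_transform("math-auto", "none"))
# Third <div> example: the parent sets math-fraktur.
print(computed_text_transform("math-auto", "math-fraktur"))
```

Under this rule the first two examples keep math-auto (so the single-letter check decides between upright and italic), while the third picks up math-fraktur from its parent.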

cc @faceless2

rwlbuis commented 4 years ago

This would be easily doable with a custom TextTransform::ApplyValue (chromium):

void TextTransform::ApplyValue(StyleResolverState& state,
                               const CSSValue& value) const {
  auto text_transform =
      To<CSSIdentifierValue>(value).ConvertTo<ETextTransform>();
  // math-auto defers to any inherited transform other than none.
  if (text_transform == ETextTransform::kMathAuto &&
      state.ParentStyle()->TextTransform() != ETextTransform::kNone) {
    ApplyInherit(state);
    return;
  }
  state.Style()->SetTextTransform(text_transform);
  state.Style()->SetTextTransformIsInherited(false);
}

faceless2 commented 4 years ago

This is following on from https://github.com/mathml-refresh/mathml/issues/204. The solution proposed there is to have the following in the UA stylesheet:

mi {
  text-transform: math-auto;
}
*[mathvariant] mi {
  text-transform: none;
}

But that only works if the "mathvariant" attribute is used to set the transform (the "mathvariant" attribute is a presentational hint that sets text-transform with a specificity of [author,0]). If the transform is set directly in CSS using text-transform, as you've done with the <div> in the first block of examples, it won't work, which I presume is why you're proposing this approach?

Well, we could certainly implement it, it would fix the problem and I don't think it would break anything else - so it works for me, although we're the smallest fish in this particular pond.

fred-wang commented 4 years ago

@rwlbuis thanks! @faceless2 Yes, I'd prefer to have a purely CSS approach which works in other context if possible, rather than introducing a MathML-specific rule in the UA sheet.

I haven't checked, but I think this is also how Gecko's internal -moz-mathvariant CSS property implements the auto italic.

fred-wang commented 4 years ago

OK, I tentatively added the special case for math-auto to the proposal so that it can be discussed in the CSSWG meeting.

css-meeting-bot commented 4 years ago

The CSS Working Group just discussed text-transform values for MathML, and agreed to the following:

RESOLVED: Add math-auto to text-transform

The full IRC log of that discussion
<fantasai> Topic: text-transform values for MathML
<fantasai> github: https://github.com/w3c/csswg-drafts/issues/5386
<fantasai> bkardell_: MathML created and exists with lots of tools/systems that don't have full access to Unicode
<fantasai> bkardell_: So legacy documents and even things written before that available
<fantasai> bkardell_: so number of case where the markup itself contains the information that you need in order to understand that this character that we want to render isn't the literal text value of the element
<fantasai> bkardell_: text-transforms were the solutions that we used
<fantasai> bkardell_: because that's what needs to happen
<fantasai> bkardell_: didn't see any reason to make that specifically hidden or unavailable to users
<fantasai> bkardell_: I know fantasai and Florian had raised some concerns last time
<fantasai> bkardell_: we've talked a bunch since then
<fantasai> bkardell_: fantasai has updated the meta-advice in css-text-3 to provide some nuance
<fantasai> bkardell_: the meaning *is* in the document
<fantasai> bkardell_: I don't know if ppl still object to these or what
<fantasai> fredw: 2 separate cases
<fantasai> fredw: case of math-auto, which is automatic italic
<fantasai> fredw: and this is the most important one
<fantasai> fredw: Not adding any semantics
<fantasai> fredw: default var rendered as italic
<fantasai> fredw: the other was strings for tools/documents not using Unicode
<fantasai> fredw: we are using text-transform to do the transformation even if MathML says to preserve the semantic
<fantasai> fredw: maybe a bit controversial
<fantasai> fredw: Florian was saying it's OK as long as we have the mathvariant attribute in the markup
<fantasai> fredw: if ppl really don't like, can only add math-auto one
<fantasai> fredw: might break some back compat, might need a polyfill, but...
<fantasai> NeilS: could be done internally and not break anything
<fantasai> NeilS: My concern is a11y, will changed character be in the a11y tree
<fantasai> bkardell_: had several ppl who implement screenreaders saying that the transform value is exposed on existing ones, and that was a sticking point because we don't always want that
<fantasai> bkardell_: certainly we can go either way here
<fantasai> bkardell_: either it will be, or it won't be, exposed to screenreaders
<fantasai> NeilS: As long as exposed to screenreaders, then no a11y issue
<faceless2> +1 to Neils comment
<fantasai> bkardell_: There's a specific example in the issue itself
<fantasai> bkardell_: to non-Math ppl like myself, not intuitive
<fantasai> emilio: Proposal of math-auto
<fantasai> emilio: is it like user-select, like auto behaves as inherit or something?
<fantasai> emilio: it's not clear to me
<fantasai> emilio: seems like pseudo-code that Rob posted would be computed value time which is a bit odd
<fantasai> fredw: basically transformation, whether italic or not [...]
<fantasai> fredw: it doesn't compute to italic
<fantasai> iank_: would it be fair to say that you'd apply to ...
<fantasai> fredw: only to mi element
<fantasai> fredw: mi { text-transform: math-auto }
<fantasai> fredw: Takes effect when only one letter
<fantasai> fredw: don't think it can be computed
<fantasai> iank_: the specific variant is based on the attribute on the mi element?
<fantasai> bkardell_: in the example or in general?
<fantasai> bkardell_: mi is special, because it has this idea of a single-letter identifier
<fantasai> bkardell_: those are treated stylistically a certain way
<fantasai> bkardell_: but that's only stylistic, no meaning
<fantasai> bkardell_: but math-variant is where you provide additional semantics missing from your lack of character support
<fantasai> iank_: so if you have mathvariant specified, it turns off that auto text-transform behavior?
<fantasai> faceless2: mathvariant is acting as a preshint
<fantasai> faceless2: but math-auto, if no other math-transform is set
<fantasai> faceless2: it would be italicized if it was one letter
<fantasai> iank_: so also have all the other math transform values
<fantasai> fredw: can override the default behavior
<fantasai> iank_: so it's for this leaf to do this slight magical behavior
<fantasai> NeilS: I think math-auto is really presentation
<fantasai> NeilS: Others are there for legacy issue, and not presentation, should actually map to a different character
<fantasai> NeilS: that's why may not be appropriate for CSS
<bkardell_> fantasai: I guess I have 1 question and 1 concern...
<bkardell_> fantasai: does the auto italic thing be a text transform really, or does it really just want font styling?
<bkardell_> fredw: the way math fonts are designed, you do
<bkardell_> fantasai: the other ones you do want to be a semantic effect. I am a little uncomfortable with this.
<bkardell_> fantasai: whatever screen readers do it's intention was clear and I don't love changing that
<fantasai> astearns: if this is only way to get semantics for legacy stuff, do we really want to expose it to CSS so that it can be used on new things?
<fantasai> bkardell_: MathML Core 1 is a pretty minimal subset, there are lots of things that use more elements than we're including
<fantasai> bkardell_: and the intent is for mathml to have a healthy future with additional levels
<fantasai> bkardell_: so will be unknown elements
<fantasai> bkardell_: so weird to say you don't have access to the magic to make other elements work like L1
<fantasai> iank_: broadly agree with that
<fantasai> iank_: also from what we've heard from screenreader developers
<fantasai> iank_: this sort of text-transform is only presentation
<fantasai> iank_: that ship sailed a long time ago
<fantasai> bkardell_: it's not that they couldn't be that
<fantasai> bkardell_: the new one has to be
<fantasai> iank_: text-transform: uppercase is definitely exposed to screenreaders
<fantasai> NeilS: Another case we haven't resolved yet
<fantasai> NeilS: is hyphen-minus
<fantasai> NeilS: vs minus
<fantasai> NeilS: They're defined to be equivalents
<fantasai> NeilS: it should map hyphen-minus to U+2212 MINUS SIGN
<fantasai> NeilS: Could imagine that text-transform would be the way to do this as well
<bkardell_> fantasai: you could go either way with that one
<fantasai> faceless2: You'd also struggle to do that with text-transform
<fantasai> faceless2: apply to whole document, then declaration for e.g. fraktur would override and disable
<fantasai> bkardell_: I suggest we either resolve or move on
<fantasai> fredw: maybe just resolve on math-auto, and unsure for rest
<fantasai> astearns: I think I heard you say that math-auto is the only one you have implemented so far?
<fantasai> bkardell_: upstream chromium
<fantasai> fredw: we have the others implemented in a separate branch
<fantasai> astearns: so let's resolve on math-auto
<fantasai> astearns: any objections to adding math-auto to text-transform?
<fantasai> emilio: Not quite objection, but want to clarify how it behaves exactly
<fantasai> astearns: resolve to add, then work on details
<fantasai> RESOLVED: Add math-auto to text-transform
<fantasai> astearns: still seems like there's concerns around the rest of math-values
<fantasai> faceless2: The only concerns are wrt exposing to screenreaders?
<fantasai> astearns: there seem to be a lot of them, also, that's my concern
<fantasai> fantasai: I don't like adding things that are supposed to alter semantics to CSS.
<fantasai> astearns: let's hold off on these for now
<fantasai> fantasai: ...
<fantasai> iank_: would it be possible to add a new HTML-level attribute with mathvariant and then, if you set `text-transform` to auto that'll read the attribute and apply it?
<fantasai> iank_: Semantics would still be in the document, just whether you apply it would be in CSS
<fantasai> NeilS: I don't think it makes sense to add something new
<fantasai> NeilS: this is all for legacy support
<fantasai> bkardell_: maybe talk about that, maybe at HTML layer we can do something
<fantasai> NeilS: I have to drop off for another MathML meeting

fred-wang commented 4 years ago

Similar to what @fantasai said in https://github.com/w3c/csswg-drafts/issues/5389#issuecomment-694361232, I've also been wondering whether the word "italic" should be included in the math-auto name.

emilio commented 1 year ago

Can we make sure this gets edited into the spec @fred-wang / @fantasai? It's weird to implement things that aren't yet on a draft...

fred-wang commented 1 year ago

@emilio My reply is the same as https://github.com/w3c/csswg-drafts/issues/5389#issuecomment-1015306703

frivoal commented 1 year ago

This is now defined in https://w3c.github.io/mathml-core/#new-text-transform-values. Once that gets republished to TR, we probably should just add a reference to it from css-text. Given the specialized use-case, it seems best for this to live in the MathML spec.

frivoal commented 5 months ago

Alright, so the math-auto value is now defined in MathML, while css-text covers the grammar and has a cross reference to MathML for the definition of the value. We can always revisit this later if we'd prefer a different arrangement, but for now, this seems solved.