w3c / mathml

MathML4 editors draft
https://w3c.github.io/mathml/

include (or not) a sample-set of default conversion from plain-MathML to MathML-with-intent #433

Open polx opened 1 year ago

polx commented 1 year ago

We should discuss whether the charter of the next WG should promise a sample set that spells out a default conversion from MathML (without intent) to MathML (with intent), so that legacy MathML expressions can be enriched (at least partially).

In the group's last call:

@dginev indicated that we should not make this a visible deliverable, because we will not be able to make it anywhere near complete.

@polx suggested that we should promise it, since such a promise implies nothing about completeness, and it would be helpful as a hint for a first intent-enrichment process.

Let's discuss this on this issue.

dginev commented 1 year ago

My position on the call was focused on the claim:

A limited set of examples will be misleading to spec adopters.

Such examples will create an unrealistic (and underspecified) expectation of what may or may not be possible with simple defaulting rules. In reality, very little is inferred reliably with simple rules, and even for K12 materials one needs to consider the full presentation tree, matching on both XML structure and text content of each node. (+ in harder cases, surrounding expression context)

Since we do not have the resources to develop this kind of mechanism in full, my preference is to defer further work to MathML 5. For the current iteration, I would be more interested in investigating domain-specific rule-sets anchored around the "isa" ( #426 ) capability. Simple defaulting examples may be realistic within very concrete "isa" values, such as chemical-formula, arithmetic-expression, system-of-equations, diagonal-matrix, ...


For a specific illustration of my point, consider the intuition:

"Most uses of <msup> have intent power."

If one applies such a rule over a larger text, it is bound to mispronounce a wide variety of different scripted constructs.

Here is an excerpt from my survey of Khan Academy's K12 materials: 15 notations that all rely on a simple <msup> superscript notation, only one of which is "power".

| intent | MathML example |
| --- | --- |
| power | `x 2` |
| foot | `5 '` |
| inch | `10 ''` |
| ordinal-mark | `10 th` |
| degrees | `10 °` |
| inverse-function | `sin -1` |
| embellished-name | `A '` |
| direction-of-approach | `0 +` |
| conjugate | `A *` |
| transpose | `A T` |
| inverse-matrix | `A -1` |
| absolute-complement | `A C` |
| first-derivative | `f` |
| nth-derivative | `f (n)` |
| positive-ion | `Na +` |
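To make the point concrete, here is a minimal sketch that applies the one-line intuition "every `<msup>` is a power" to a few rows of the table above. The function name `naive_intent` and the tuple layout are illustrative assumptions, not part of any proposal.

```python
# A few rows copied from the table above: (expected intent, base, superscript).
ROWS = [
    ("power", "x", "2"),
    ("foot", "5", "'"),
    ("degrees", "10", "°"),
    ("inverse-function", "sin", "-1"),
    ("transpose", "A", "T"),
    ("inverse-matrix", "A", "-1"),
    ("positive-ion", "Na", "+"),
]


def naive_intent(base: str, script: str) -> str:
    """Hypothetical default rule: treat every <msup> as a power."""
    return "power"


# Collect every row whose expected intent differs from the naive guess.
wrong = [intent for intent, base, script in ROWS
         if naive_intent(base, script) != intent]
print(f"{len(wrong)} of {len(ROWS)} rows mislabeled")  # 6 of 7 rows mislabeled
```

Note that `sin -1` and `A -1` share the same superscript but carry different intents, so a rule that looks only at the superscript cannot separate them.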

NSoiffer commented 1 year ago

I think the issue is not whether the charter says we should have a sample set of mappings of defaults to intent, but:

  1. Are there defaults, so that authors know what AT will do without intent, or should authors always use intent if they care about how something is spoken? As a basic example, if no intent is given on mfrac, should authors expect that it will be spoken as a fraction?
  2. If there are defaults, how basic or complicated should they be?

Background

I tried to answer that as part of a position paper a few months back. See that paper for more details. Here I will just list the defaults so they can be discussed individually or as a group via comments.

First off though, note that I strongly feel there needs to be a math level (or higher) attribute that controls defaulting behavior so that legacy documents can be interpreted with the knowledge that the authors were not able to make use of intent (hence, AT can infer whatever it feels is appropriate). I proposed an attribute intent-default with the following values:

Note: in all cases, if intent is given, it should be used (if appropriate). Even for legacy, it is possible remediation may have added an intent value.

Proposed defaults:

AT should have a specified default interpretation for every MathML Element. That doesn't mean that the exact words are specified, only that AT chooses words that convey the default meaning. For example: msup is spoken as "super" or "superscript" if intent-default = "structure" and is spoken as a power ("x squared", "x raised to the n minus 1 power", etc) if intent-default = "common". The exact words may depend upon both the audience and the arguments.
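A minimal sketch of that precedence, assuming the proposed `intent-default` attribute sits on the `<math>` element and that an explicit `intent` always wins. The function name `msup_reading` and the returned labels are hypothetical; real AT would pick actual words depending on the audience and the arguments.

```python
import xml.etree.ElementTree as ET


def msup_reading(math_xml: str) -> str:
    """Return a label for how the first <msup> would be read (hypothetical labels)."""
    math = ET.fromstring(math_xml)
    msup = math.find(".//msup")
    # An explicit intent on the element takes precedence over any default.
    if msup.get("intent"):
        return msup.get("intent")
    mode = math.get("intent-default")
    if mode == "common":
        return "power"        # e.g. "x squared"
    if mode == "structure":
        return "superscript"  # e.g. "x superscript 2"
    return "unspecified"      # no intent-default: AT infers whatever it feels is appropriate


SAMPLE = '<math intent-default="{}"><msup><mi>x</mi><mn>2</mn></msup></math>'
print(msup_reading(SAMPLE.format("structure")))  # superscript
print(msup_reading(SAMPLE.format("common")))     # power
```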

"structure"

The goal of this default is to avoid any inference of semantics. The meanings and special cases for all the MathML elements are:

* leaf tags speak their contents. Exceptions are:
  * ms likely indicates it is a string or speaks its open/close delimiters in addition to its contents.
  * mglyph speaks the alt text
  * mspace, malignmark, maligngroup, and none are either silent or generate pauses
  * msline indicates that it is a line
* mrow -- speaks the children
* mfrac -- arg1 "over" arg2 (might need bracketing words -- start over/end over?)
* msqrt -- "radical symbol" contents???
* mroot -- "radical symbol" with index and contents???
* merror -- indicates there is an error and speaks the contents
* mfenced -- should speak the same as the equivalent mrow notation
* menclose -- should indicate the notation attributes along with the contents.
* msup -- should speak that it is a superscript, although maybe there should be exceptions for the pseudo-script characters, in which case the superscript is not spoken (e.g., $x'$ is spoken "x prime")
* msub -- indicates a subscript
* msubsup -- indicates a subscripted variable raised to a power with the same special cases as msup
* mover -- indicates that the second argument is over the first, although the words need to clearly distinguish this from mfrac which is proposed to use the word "over". Maybe "base with 'over' above"?
  * Special cases: bar, hat, caret, tilde, dot (1-4 of them). Maybe acute and grave. Probably not overbrace and overparen because those likely need grouping words.
* munder -- indicates that the second argument is under the first ("base with under below"?)
* munderover -- indicates there is content above and below the base ("base with under below and over above"?). Uses the special cases of mover.
* mmultiscripts -- indicates the scripts and their position in some way. E.g., "start-scripted ... pre-subscript ... pre-superscript ... base ... post-subscript ... post-superscript ... end-scripted"
* mtable/mtr/mlabeledtr/mtd -- say something appropriate for tables (no recognition of determinants, matrices, vectors, etc)
* elementary math elements (mstack/mlongdiv/msgroup/msrow/mscarries/mscarry) -- say something about the layout, but not that it is addition, long division, repeated decimals, etc.
* maction -- speaks the selected child with maybe some indication of the action
* semantics -- speaks the presentation child
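As a rough illustration of the "structure" list, the sketch below walks a presentation tree and describes layout only. It covers just a handful of elements, the spoken words are placeholders (the list above leaves several of them open), and the function name `speak_structure` is made up for this example.

```python
import xml.etree.ElementTree as ET

LEAVES = {"mi", "mn", "mo", "mtext"}


def speak_structure(node: ET.Element) -> str:
    """Describe the layout of a MathML subtree without inferring any meaning."""
    if node.tag in LEAVES:
        return (node.text or "").strip()  # leaf tags speak their contents
    kids = [speak_structure(child) for child in node]
    if node.tag == "mfrac":
        return f"{kids[0]} over {kids[1]}"
    if node.tag == "msqrt":
        return "radical symbol " + " ".join(kids)
    if node.tag == "msup":
        return f"{kids[0]} superscript {kids[1]}"
    if node.tag == "msub":
        return f"{kids[0]} subscript {kids[1]}"
    return " ".join(kids)  # mrow, math, and anything not handled here


tree = ET.fromstring(
    "<math><mfrac>"
    "<msup><mi>x</mi><mn>2</mn></msup>"
    "<msqrt><mi>y</mi></msqrt>"
    "</mfrac></math>"
)
print(speak_structure(tree))  # x superscript 2 over radical symbol y
```

The output is ambiguous without bracketing words, which is the concern raised for mfrac in the list.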

"common"

The goal is to use common "K-14" meanings so the need to use intent is minimized. The default meanings and special cases for all the MathML elements are:

* leaf tags speak their contents. Exceptions are:
  * ms likely indicates it is a string or speaks its open/close delimiters in addition to its contents.
  * mglyph speaks the alt text
  * mspace, malignmark, maligngroup, and none are either silent or generate pauses
  * msline indicates that it is a line
* mrow -- speaks the children
* mfrac -- indicates it is division, but might have a number of special case rules depending on the arguments
* msqrt -- indicates it is a square root
* mroot -- indicates it is a root with an index. There should be special cases for at least '2' and '3' as the index
* merror -- indicates there is an error and speaks the contents
* mfenced -- should speak the same as the equivalent mrow notation
* menclose -- should indicate the notation attributes along with the contents. Special case speech might be appropriate when menclose looks like a similar notation that has special cases (e.g., notation="top" looks the same as mover with a "_" (or equivalent) second child).
* msup -- should assume that the notation is a power with the following special cases
  * the power is '2' or '3'
  * the power is '-1' and this is a trig function ([see below](https://github.com/w3c/mathml-docs/blob/main/minimal-intent-core.md#trig-and-log))
  * the power is one of the pseudo-script characters, in which case the superscript is not spoken (e.g., $x'$ is spoken "x prime")
  * the power is an `mo` (and not one of the pseudo-script characters), use "superscript" or maybe "embellished with" instead of "power"
  * the base is one of the named sets ([see below](https://github.com/w3c/mathml-docs/blob/main/minimal-intent-core.md#namedsets-%E2%84%82-%E2%84%95-%E2%84%9A-%E2%84%9D-and-%E2%84%A4))
* msub -- indicates a subscript. Special cases:
  * the base is "log"
  * the base is one of the named sets ([see here](https://github.com/w3c/mathml-docs/blob/main/minimal-intent-core.md#namedsets-%E2%84%82-%E2%84%95-%E2%84%9A-%E2%84%9D-and-%E2%84%A4))
  * the base is a large operator
  * others???
* msubsup -- indicates a subscripted variable raised to a power with the same special cases as msup and msub. This includes (read the same as for munderover)
  * the base is a large operator
* mover -- indicates that the second argument is over the first. Special cases:
  * bar, hat, caret, tilde, dot (1-4 of them). Maybe acute and grave. Probably not overbrace and overparen because those likely need grouping words.
  * the base is a large operator
* munder -- indicates that the second argument is under the first. Special cases:
  * the base is a large operator
  * the base is "lim" or "limit" (FIX: does this need to be language agnostic?)
* munderover -- indicates there is content above and below the base. Special cases:
  * those listed for mover
  * the base is a large operator (speak using "from" and "to" -- [see here](https://github.com/w3c/mathml-docs/blob/main/minimal-intent-core.md#large-operators))
* mmultiscripts -- indicates the scripts. Special cases???
* mtable/mtr/mlabeledtr/mtd -- say something appropriate for tables. Special cases:
  * row and column tables might have specialized speech
  * small tables with simple entries might have specialized speech
* elementary math elements (mstack/mlongdiv/msgroup/msrow/mscarries/mscarry) -- say something appropriate
* maction -- speaks the selected child with maybe some indication of the action
* semantics -- speaks the presentation child
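The msup entry of the "common" list could be sketched roughly as below. This assumes the base and the superscript are single token elements, skips the named-sets case, and uses an assumed trig list and pseudo-script table; the wording of each reading is a placeholder.

```python
import xml.etree.ElementTree as ET

TRIG = {"sin", "cos", "tan", "sec", "csc", "cot"}                   # assumed list
PSEUDO_SCRIPTS = {"'": "prime", "′": "prime", "″": "double prime"}  # assumed subset


def speak_msup_common(msup_xml: str) -> str:
    """Read an <msup> as a power, applying the special cases listed above."""
    base, script = ET.fromstring(msup_xml)  # an <msup> has exactly two children
    b = (base.text or "").strip()
    s = (script.text or "").strip()
    if s in PSEUDO_SCRIPTS:                 # the superscript itself is not spoken
        return f"{b} {PSEUDO_SCRIPTS[s]}"
    if script.tag == "mo":                  # operator that is not a pseudo-script
        return f"{b} superscript {s}"
    if s == "-1" and b in TRIG:
        return f"inverse {b}"
    if s == "2":
        return f"{b} squared"
    if s == "3":
        return f"{b} cubed"
    return f"{b} raised to the {s} power"


print(speak_msup_common("<msup><mi>x</mi><mn>2</mn></msup>"))     # x squared
print(speak_msup_common("<msup><mi>sin</mi><mn>-1</mn></msup>"))  # inverse sin
print(speak_msup_common("<msup><mi>x</mi><mo>′</mo></msup>"))     # x prime
print(speak_msup_common("<msup><mi>A</mi><mi>T</mi></msup>"))     # A raised to the T power
```

The last call shows the kind of miss these rules accept: a transpose is read as a power unless the author supplies an intent.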

Summary

As stated at the start, we need to answer the question of whether there should be defaults (given my long post, it should be clear what my position is). If others agree, then we need to come to an agreement on what the defaults should be. The above list is a first cut.

There is a trade-off that should be considered. If the rules/special cases become too numerous, then AT is less likely to implement them. On the other hand, if special cases aren't listed, then authors/authoring software need to do extra work to generate intent, which makes them less likely to use it because it becomes burdensome to do so. I think the above list is implementable by AT because it is not that much larger than what more simplistic AT does now. I also think it probably captures a large majority of what authors want said by default, although I may have missed a few special cases.

If you feel a default/special case is missing or if some default is wrong/has too many special cases, that's what comments are for...

NSoiffer commented 1 year ago

Responding to @dginev's comment...

While I completely agree that simple rules will fail to capture a significant number of special cases, I disagree that they are not useful. I would love to gather some data, but my guess based on looking at a lot of math textbooks and tutorials over the years is that they will capture over 95% of the cases, maybe even over 99%. These numbers don't reflect "good" speech, just "not wrong" speech. I know that is pretty bold, but simply put, there are a lot of mfrac, msqrt, and mroots out there and they are almost always what their names imply in K12 math. Furthermore, when munder, mover, and munderover have a large op as the base, the rules are going to be correct 99+% of the time for K12, and the rules for them when they aren't large ops will never be wrong since they just describe the structure (which is what has to happen without defaults anyway).

Moving on to mrow, since it just speaks the children, it is structural. It may not be optimal (e.g., it will miss absolute value and lots of other notation), but it won't be wrong, just ugly -- no less ugly than if the rule wasn't there. So again, 100% correct.

The least accurate rule is the one for msup. There will likely be cases that no special case covers and where "power" is not the correct reading. However, if you flip through most K12 textbooks, power is very commonly what is meant, except for the pseudo-scripts I listed. So maybe msup is only right 90% of the time (still just a guess). The one place where it would fail more often is a chemistry book, where scripts mean something else. Given the specialized notation (including non-italic element names), I would hope that the macros used to produce it add intent and isa so that the defaults don't get used.

Based on your examples, I updated the msup rule to add a special case that uses "superscript"/"embellished with" for an mo that is not a pseudo-script. That still leaves cases that it will get wrong (' is foot or minute, and not prime; $(n)$ is nth derivative). In fact, half of the cases you listed would be spoken wrong, but I think those examples probably come up less than 1% of the time.

I'll be the first to admit that I don't have statistics to back up my claims. It would be great to go through a dozen textbooks and do counts, but I don't think that anyone has the time/stamina to do that. At best, we could find the number of msubs and the ones among them that match a set of special cases that are potentially spoken wrong (they would have to be examined to determine that). But even without looking at those cases, it would give a rough lower bound on how many are spoken wrong (especially if we included all the cases you found earlier, not just the ones listed above, which I think you edited for brevity).
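The counting idea could be sketched along these lines: count every scripted element of a given tag in a corpus, and separately count those whose script matches a list of suspect patterns. The function name `survey`, the suspect set, and the tiny inline corpus are assumptions for illustration; the matches would still need to be examined by hand.

```python
import xml.etree.ElementTree as ET

# Scripts that earlier examples in this thread suggest are often not powers (assumed set).
SUSPECT_SCRIPTS = {"'", "''", "°", "T", "C", "*", "+", "th", "(n)"}


def survey(corpus_xml: str, tag: str, suspects: set) -> tuple:
    """Return (total, suspect) counts for scripted elements named `tag`."""
    total = suspect = 0
    for elem in ET.fromstring(corpus_xml).iter(tag):
        total += 1
        # The second child is the script; join its text in case it is nested.
        script = "".join(elem[1].itertext()).strip()
        if script in suspects:
            suspect += 1
    return total, suspect


CORPUS = (
    "<doc>"
    "<msup><mi>x</mi><mn>2</mn></msup>"
    "<msup><mi>A</mi><mi>T</mi></msup>"
    "<msup><mi>y</mi><mn>3</mn></msup>"
    "<msub><mi>a</mi><mn>1</mn></msub>"
    "</doc>"
)
print(survey(CORPUS, "msup", SUSPECT_SCRIPTS))  # (3, 1)
```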

brucemiller commented 1 year ago

My first comment regards the "default default", i.e., which defaulting rule set (if any) applies when there is no intent-default (although possibly explicit intent): the "legacy" case. I think the specification should not require superscripts to be treated as powers. I'd probably prefer the structure rule set as a default-default, but not common. However, I could live with the behavior in your current description: the AT is free to do as it wants.

brucemiller commented 1 year ago

My second comment is that I believe we should express the defaulting rule sets in terms of intent, rather than free text. For example, in the common set, msup has intent="power". This will have several benefits:

And the fact that it would force us to put some non-semantic items (e.g., superscript) in the core dictionary can be seen as a benefit. It would allow authors to override bad assumptions made by the defaults (e.g., common) on specific sub-expressions.
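A minimal sketch of what "rule sets expressed as intent" might look like: each rule set maps an element name to the intent it implies, and an author-supplied intent on a sub-expression is left alone. Only the msup entries come from this discussion; the table layout and the function name `apply_defaults` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule sets: element name -> intent assumed when the author gave none.
RULE_SETS = {
    "common": {"msup": "power"},
    "structure": {"msup": "superscript"},
}


def apply_defaults(math_xml: str, rule_set: str) -> str:
    """Add the default intent to every element that does not already carry one."""
    root = ET.fromstring(math_xml)
    rules = RULE_SETS[rule_set]
    for elem in root.iter():
        if elem.tag in rules and "intent" not in elem.attrib:
            elem.set("intent", rules[elem.tag])
    return ET.tostring(root, encoding="unicode")


SOURCE = (
    "<math>"
    "<msup><mi>x</mi><mn>2</mn></msup>"
    '<msup intent="superscript"><mn>0</mn><mo>+</mo></msup>'  # author override
    "</math>"
)
print(apply_defaults(SOURCE, "common"))
```

Under "common" the first msup gains `intent="power"`, while the second keeps the author's `superscript`, which is the override described above.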

polx commented 1 year ago

> are there defaults so that authors know what AT will do without intent or should authors always use intent if they care about how something is spoken.

I think we agreed that this is likely too big an enterprise: any default set we can recommend will be frustrating. Leave this to brave, experimenting implementors.

Deyan: We should remove: A limited set of examples will be misleading to spec adopters.

I agree. Let us not promise such a set of rules.

I strongly suggest starting with a vocabulary clarification:

The easy bits in there are the function names at the bottom of David F's list (map word to intents then to pronunciation) and the Unicode characters (map character codes to intents and pronunciation possibly differing from Unicode). Both imply translation as well.

dginev commented 1 year ago

@NSoiffer the original reason to open this issue, the way I remember it, was more specific than the general topic of "default rule sets for intent", which may fit better in a new dedicated issue (especially if we are closer to consensus to add some recommended markup).

To summarize my comments from the discussion in today's meeting:

So a first suggestion would be:

<math intent=":default-common">...</math>
<math intent=":default-structure">...</math>
<math intent=":default-legacy">...</math> <!-- identical to <math>...</math> -->
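On the consuming side, picking the rule set from such markup could look like the sketch below. It assumes the three property names shown above and treats a missing property as legacy, matching the comment on the third line; `default_mode` is a made-up helper, not an existing API.

```python
import re
import xml.etree.ElementTree as ET


def default_mode(math_xml: str) -> str:
    """Return 'common', 'structure' or 'legacy' for a <math> element."""
    intent = ET.fromstring(math_xml).get("intent", "")
    # Look for one of the suggested :default-* properties in the intent value.
    match = re.search(r":default-(common|structure|legacy)\b", intent)
    return match.group(1) if match else "legacy"


print(default_mode('<math intent=":default-common"><mi>x</mi></math>'))  # common
print(default_mode('<math><mi>x</mi></math>'))                           # legacy
```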

Separately, @davidfarmer expressed his hope that we won't have a prolonged discussion on what exact behaviors go into the "common" defaults. I certainly understand the sentiment. However, unless we specify a clear and fixed set of rules, it is reasonable to expect that different AT systems will implement different behaviors. Maybe that is an acceptable outcome, but we should be aware that we are making that choice.

Moreover, it is good if we can now bundle this discussion together with discussing properties, as I can make this "AT alignment" point in general:

Unless we clearly enumerate the exact effects each "behavioral property" is expected to enrich, we will have differences in behavior/coverage between AT systems. As an example, :chemical-formula may lead AT system 1 to understand <mi>C</mi> as Carbon, while AT system 2 will (wrongly) understand it as Celsius, while AT system 3 will treat it as self-voicing/unrecognized (as it wouldn't be "common enough" for it).

To bring this example back to defaults, if common is left unspecified then AT system 1 may treat <mo>→</mo> as maps-to (mathematics), AT system 2 may treat it as yields (chemistry), while AT system 3 may treat it as unknown and self-voicing, as it wouldn't consider it "common enough".

Neil's "proposed defaults" are a start, but they are incomplete from a Western K12/K14 education standpoint. Should we try to make them complete?

davidcarlisle commented 1 year ago

I'd be tempted to drop legacy given that there has been no cross-browser MathML prior to this year. I don't see a large corpus of documents with a usably definable legacy behaviour. Existing content is almost always in closed systems that can work as before, or guarded by JavaScript such as MathJax.

Committing forever to an "undefined default behaviour" and forcing opt-in to get a defined behaviour seems a high price to pay to get unchanged behaviour for a possibly non-existent set of documents. If you are using Chrome there are no old documents with an existing MathML behaviour.

dginev commented 1 year ago

Adding a couple of my comments from the meeting on May 25th: