w3c / mathml

MathML4 editors draft
https://w3c.github.io/mathml/

Spec language for what goes in core #470

Closed NSoiffer closed 1 year ago

NSoiffer commented 1 year ago

At the last WG meeting, we agreed to create an issue so that we can resolve differences as to what the spec says about what belongs in core and what doesn't. Here's what is currently written:

> Core: This is a list of core concept names curated by the Math Working Group. This list includes common concepts such as "divide", "power", and "greater-than". AT reading MathML attributed with a name in this list SHOULD consider this name to be a hint as to how the content could be read. However, because common notations have many specialized ways of being spoken (e.g., for division, one might say "three quarters", "x over three", or "3 meters per second" depending on the contents of `<mfrac intent="divide($num,$denom)">`), AT is not constrained to use the name given. Depending upon the reader, AT may add words or sounds to make the speech clearer to the listener. For example, for someone who cannot see a fraction, AT might say "fraction x over three end fraction" so the listener knows exactly what is part of the fraction. For someone who can see the content, these extra words can be a distraction. AT should always produce speech that is appropriate to the community they serve.
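
The "hint, not constraint" behaviour described in this paragraph can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `speak_divide`, `FRACTION_WORDS`, and the `bracketed` flag are hypothetical names, not part of the spec or of any real AT, and the wording of the readings is taken from the examples in the quoted text.

```python
# Hypothetical sketch: these names are illustrative and come from neither
# the MathML spec nor any real AT.

# Specialised readings for a few numeric fractions, keyed by the spoken
# numerator and denominator.
FRACTION_WORDS = {
    ("one", "two"): "one half",
    ("three", "four"): "three quarters",
}


def speak_divide(num: str, denom: str, bracketed: bool = False) -> str:
    """Return one possible reading for intent="divide($num,$denom)".

    num and denom are the arguments as they would be spoken.
    bracketed=True adds start/end words for a listener who cannot see
    the fraction.
    """
    # A specialised reading, when one is known, replaces the generic one.
    special = FRACTION_WORDS.get((num, denom))
    if special is not None:
        return special
    reading = f"{num} over {denom}"
    if bracketed:
        return f"fraction {reading} end fraction"
    return reading
```

Under these assumptions, `speak_divide("three", "four")` gives "three quarters", while `speak_divide("x", "three", bracketed=True)` gives "fraction x over three end fraction". In neither case is the concept name "divide" itself spoken.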

dginev commented 1 year ago

Based on my current "soft preferences":

- a well-defined scope of the Core list, following an organizational principle.
- preference for a larger Core list, which offers a lot of clarity for its domain of annotation.
  - This enables each annotator/generator to quickly know the expected intent values by AT. (E.g. should I mark $x^n$ with "power" or "exponentiation", or even mark the scripted argument with "exponent" ?)
- a small extension recognizing we will also have Core "properties"
- clearly stated rules for curation
  - common concepts which are involved in ambiguous expressions should be recorded in Core
  - common concepts with common+custom AT handling should be recorded in Core.

I am completely OK with the group agreeing on different concrete rules and principles for the list, as long as we have a consistent set of rules and principles (I believe this is a point of agreement between me and Bruce).

Spec language suggestion:

(I have added line breaks for readability during the discussion period.)

> Core: This is a list of concept names, initially drawn from K12 STEM education.
> The entries include common concepts such as "divide", "power", and "greater-than".
> The list is curated by the Math Working Group.
>
> Certain readouts benefit from annotating the kind of mathematical object, rather than the concrete object.
> For such cases, one should use an Intent "property" instead of an Intent "concept".
> The Core list also contains common properties.
>
> AT reading MathML attributed with a concept (or property) in this list
> SHOULD consider this concept (or property) to be a hint how the content could be read.
>
> However, because common notations have many specialized ways of being spoken,
> AT is NOT constrained to use the name given.
> - For example, AT may vocalize "three quarters", "three over x" (for ``),
>   or "three divided by x plus y" (for `/`),
>   depending on the contents and carrier element associated with an `intent="divide($num,$denom)"`.
>
> Depending upon the listener, AT may add words or sounds to make the speech clearer for their needs.
> - For example, for someone who can not see a fraction, AT might say "fraction three over x end fraction",
>   so that the listener knows exactly which content is part of the fraction.
> - For someone who can see the content, these extra words can be a distraction.
>
> AT should always produce speech that is appropriate to the community they serve.

A possibly separate note, which may be attached to the list itself rather than the spec, should be guidelines for curating and naming entries. Expand for my first attempt at that:

> ### Guidelines for Core list curation
>
> 1. When standard notations can be used to denote multiple common concepts, those concepts should be added to the Core list.
>    - For example, two vertical bars can surround an argument to mean "absolute-value", "cardinality", or "determinant" in K12 materials.
>
> 2. When a common concept has known special requirements for accessible readouts, it should be added to the Core list.
>    - For example, "power" and "divide" have known special handling in AT, based on the values of their arguments.
>
> 3. The initial scope extends to materials in K12 STEM education.
>    - While mathematics is naturally the main focus, all other STEM fields are also in scope, namely biology, chemistry, computer science, earth sciences, economics, engineering and physics.
>    - Concepts beyond K12 lie outside of the initial Core list. For example, when two vertical bars surround an argument to mean a group-theoretic "order", that Open concept will not be included in Core, unless the overall Core scope is increased to K14.
>
> 4. Naming. Each Core list concept is recorded via its English encyclopedic name. In cases of multiple known names, we strive to make a practical choice.
>    - For example, we would prefer "power" to "exponentiation", although that creates a tension with the use of "power" in physics. That choice is motivated by "power" being the more common name in present-day communication, as well as by mathematical uses of "power" being more widespread than the physics concept.
>    - Similarly, we may prefer "gcd" to "greatest-common-divisor" due to brevity, while still adding an informal note clearly stating the connection between the two.
>
> 5. The conditions enumerated here also extend to adding property names for kinds of objects.
>    - For example, marking an `` holding multiple equations with `intent=":system-of-equations"`,
>      marking a unit expression, such as meters-per-second, with `intent=":unit"`,
>      or marking the water molecule `H2O` with `intent=":chemical-formula"`.
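
As an illustration of guidelines 1 and 3, the following Python sketch shows how one notation (two vertical bars) could be read differently depending on the concept name supplied. Everything in it is hypothetical: `BAR_READINGS`, `read_bars`, and the exact wording are invented for this example. The fallback for names outside the table (speaking the name literally, with hyphens read as spaces) is an assumption about how an AT might treat Open concepts, not something stated in this thread.

```python
from typing import Optional

# Hypothetical speech templates for three Core concepts that share the
# |x| notation. The wording is illustrative, not normative.
BAR_READINGS = {
    "absolute-value": "the absolute value of {arg}",
    "cardinality": "the cardinality of {arg}",
    "determinant": "the determinant of {arg}",
}


def read_bars(arg: str, concept: Optional[str] = None) -> str:
    """Read |arg|, using the concept name from intent when one is given."""
    if concept is None:
        # No intent at all: the AT can only describe the notation.
        return f"vertical bar {arg} vertical bar"
    template = BAR_READINGS.get(concept)
    if template is None:
        # Assumption: a name outside the table is spoken literally,
        # with hyphens read as spaces.
        return f"{concept.replace('-', ' ')} of {arg}"
    return template.format(arg=arg)
```

With these assumptions, `read_bars("x", "absolute-value")` gives "the absolute value of x", and the group-theoretic `read_bars("G", "order")` falls through to the literal reading "order of G".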

brucemiller commented 1 year ago

The section @NSoiffer quoted from the current spec says a number of things, but as far as answering the question of what concepts should be included in the core list, all it says is "common". There's no mention of whether a concept needs special speech treatment, whether it can be spoken as-is, or whether it appears in ambiguous notations.

While those extra criteria could likely limit the size of the core list, a simple, single criterion of "common" is certainly much easier to understand (and maybe follow?). Personally, I'm pretty agnostic on the question, so long as we're all on the same page.

So is "common concepts" all we need? (We might clarify what we mean by "common", however.)

davidcarlisle commented 1 year ago

@brucemiller

> So is "common concepts" all we need?

Basically, yes. Like the concepts given element names in Content MathML, the list of Unicode slots in the Operator dictionary, or the list of characters given Unicode alphanumeric codepoints, the list is essentially arbitrary. The important thing is to have a list, to give implementers something to implement.

Trying to justify any list as "all, or most of K12 concepts" will just lead to endless debate about what concepts meet that criterion.

In practice the criterion might be "intent values with rules implemented in two or more systems by the time we reach CR and need to show implementations" but we don't have to describe it that way in the spec.

brucemiller commented 1 year ago

And yet, not having explicit criteria also leads to endless debate, it seems. Indeed, it's difficult to contribute constructively to either the list or discussion without some idea of why a concept would be on the list, or even why a list is needed.

Giving "implementers something to implement" was, as I understood it, the motivation for adding only things that needed special treatment (since everything else basically implements itself). Does it really matter if some folks say "less than" and others "smaller than"?

OTOH, if the objective is just to have some convincing looking list, but the contents are arbitrary, isn't a better strategy just not to ask? :>

NSoiffer commented 1 year ago

I really like that we don't state that the core list is comprehensive and that we say it is "curated" which implies we've used our experience to choose the values. I think it also implies it is something that isn't fixed. I think it is best to add a little about the decision process. Following @davidcarlisle's lead, perhaps we can extend @dginev's sentence

> The list is curated by the Math Working Group.

to "The list is curated by the Math Working Group based on experience with different AT implementations and following the guidelines set out in [xxx note]."

Overall, I like what @dginev wrote, although I have a few suggested changes besides the one above:

> Certain readouts benefit from annotating the kind of mathematical object, rather than the concrete object.

"Some readings benefit from annotating the kind of mathematical object, rather than giving an explicit concept name to be spoken."

> AT reading MathML attributed with a concept (or property) in this list SHOULD consider this concept (or property) to be a hint how the content could be read.

"... AT MAY use this concept as a hint to improve braille generation."

NSoiffer commented 1 year ago

Are others ok with these changes? If so, can we get them into the spec before the meeting so we can discuss them?

Also, can "we" (@dginev or @davidcarlisle) get a draft note started based on @dginev's "Guidelines for Core list curation"? As a draft note, it will be in some place that is easily referred to, and potentially we can move it to be an "official" note at some point.

davidcarlisle commented 1 year ago

Putting them in the spec before discussion seems backwards. I'll make a PR in my fork so it can be viewed at the meeting.

davidcarlisle commented 1 year ago

I added @dginev's text as amended by @NSoiffer, apart from the line saying properties are in the same list (which would take more work, as currently they are separate and introduced later in the spec).

diff

https://github.com/w3c/mathml/compare/main...davidcarlisle:mathml:main

rendered view

https://davidcarlisle.github.io/mathml/#mixing_intent_dictionaries

@dginev's text, used exactly as written to initialise a new note

https://w3c.github.io/mathml-docs/concept-lists/

NSoiffer commented 1 year ago

@davidcarlisle: thanks. That's a much better way to do this.

dginev commented 1 year ago

This should be close-able in light of MathWG discussions which led to #471 and https://github.com/w3c/mathml-docs/commit/34827623d5c6a76c968f0b164527472070a1c2e1 . Letting someone else hit the "close" button just in case I misread the resolution.