w3c / mathml

MathML4 editors draft
https://w3c.github.io/mathml/

Path for adding semantics #84

Closed davidfarmer closed 1 year ago

davidfarmer commented 5 years ago

This is related to issue #64 but is more about the pipeline from source to MathML in the browser.

I believe it is possible to have many authors use a standard set of semantic LaTeX macros. Well, maybe not most authors, but a significant subset of those who write in PreTeXt (https://pretextbook.org/). This includes some popular textbooks.

For example, they would use \abs{x} for the absolute value of x, instead of writing |x|.

There would be a repository which contains information about that macro. For example: "in LaTeX the macro has the definition \newcommand{\abs}[1]{|#1|}".

But more significantly (this is the point of this issue) the repository also has information such as:

Pronounce this as: "absolute value of #1" or "begin absolute value #1 end absolute value" or whatever is the right thing to say, possibly with variations.

Write this in braille as: [whatever it should be]

What I would like to see happen is: author writes their source using the standard macros. Once converted to HTML (say, using MathJax to convert the math), a screen reader makes the correct pronunciation of $|x|$ without the need for any guessing about what "vertical line x vertical line" means. The key to making this happen is the repository associated with the macro.

Is it reasonable to hope that things will be able to work that way? I am pretty sure that authors can be persuaded to write their books with standard macros, since it is not really any extra work, and the benefits would be significant.

bkardell commented 5 years ago

I'm still a little new to this, so I apologize if this seems remedial, but let me ask anyway, since it is in line with some things I want to ask in the next meeting...

Once converted to HTML (say, using MathJax to convert the math)

This seems to imply a number of kind of key things -- MathJax, as you say, ultimately converts to HTML. Lots of things could potentially create 'mathy' HTML that involves no actual <math> element. Part of what we are doing is trying to describe the underlying 'plumbing' of MathML and the platform so that we solve both ends of that problem.

So... it seems, then, that to some extent you are desiring "new magic" here that isn't about MathML so much as about the platform capabilities/architectures themselves? That's not a statement so much as a question to help me understand: where specifically does this come into play? If there were no MathML, where would this 'fit'?

davidfarmer commented 5 years ago

I also am new to these discussions, and I think I have the same question: how does this capability fit with MathML, MathJax and other technology?

I'll try to answer the specific question: what if there were no MathML?

Whether or not there was MathML, I could write JavaScript that took (well-written, semantic) math content on the web page, looked up pronunciation and other information about those macros (in the repository I postulated), assembled that information, inserted it on the web page in some hidden div, and then gave people a way to access that pronunciation information (say, a "pronounce the math" button).

This could be done in a way that does not disturb however that math was going to be rendered visually (whether MathML or otherwise).

And that would be a terrible solution to the problem I have described, because it would be some home-grown hacky solution that could not be widely adopted because it does not mesh with the established technology. The "pronounce the math" button would not be good for someone using a screen reader, for example.

On the other hand, if the 'hidden div' mentioned above were replaced by some official way to incorporate that information into the MathML which is used to render the math, then it could be useful to a screen reader.

I don't think it is relevant to my question whether MathJax converts the LaTeX to MathML in the browser or the conversion is done offline before the HTML is sent.

I want to make this happen, and my guess is that doing it in the MathML is the best option for having it work properly and be widely useful. So I want to know if that is likely to be the case, or if I should start thinking about other options.


davidcarlisle commented 5 years ago

I think that this is essentially a duplicate of #64.

Of course, if we define a system of roles, there has to be a pipeline from some authoring environment (including conversions from LaTeX) that adds the roles in the right places. But in general the MathML spec can't say a lot about how the MathML is created. So perhaps a TeX fragment of \abs{x} ends up as <mrow role="absolute value"><mo>|</mo><mi>x</mi><mo>|</mo></mrow>, or whatever the final specification of role= syntax ends up being; MathML can only specify the role (and its effect on screen readers), not really the conversion from TeX.
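To make that last step concrete, here is a minimal sketch of what a converter might emit for an already-parsed \abs{x}. The function name `abs_to_mathml` is hypothetical, and so is the `role="absolute value"` value, which is just the illustrative markup from the comment above; the real attribute syntax was still undecided, and a real converter would parse TeX rather than take the argument as a plain string.

```python
from xml.sax.saxutils import escape, quoteattr


def abs_to_mathml(identifier: str, role: str = "absolute value") -> str:
    r"""Hypothetical converter for \abs{<identifier>} -> presentation MathML.

    The bars stay ordinary <mo> operators, so the visual rendering does not
    change; the semantic hint rides along in a (hypothetical) role attribute.
    """
    return (
        f"<mrow role={quoteattr(role)}>"
        "<mo>|</mo>"
        f"<mi>{escape(identifier)}</mi>"
        "<mo>|</mo>"
        "</mrow>"
    )


print(abs_to_mathml("x"))
```

The point of the design is that the conversion only adds information: the presentation markup is the same as for a bare |x|, so renderers that ignore the attribute lose nothing.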

bkardell commented 5 years ago

Sorry if I am only adding confusion and not help here - I'm not sure exactly... but are there efforts in ARIA to define the roles we would need? If so, is it worth linking them up here?

fred-wang commented 5 years ago

@davidfarmer I think Neil or others can better comment on the status and plan to add semantic information. As @davidcarlisle noted, there are discussions about adding more semantics via roles. However, I think your comment is interesting and actually goes further. In any case, I think this discussion has a place in this CG, and probably the ARIA WG should be involved too.

Quoting Abraham Nemeth from the book "Braille into the Next Millennium":

The Principle of Meaning Versus Notation: In my view, it is the transcriber's function to supply only notation, not meaning in an accessible form (speech or braille). It is the reader's function to extract the meaning from the notation the transcriber supplies. Consider the common notation (x,y). That notation can mean many things: the ordered pair whose first component is x and second component is y; the point in the cartesian coordinate with abscissa x and ordinate y; the open interval on the real line with left endpoint x and right endpoint y; or the greatest common divisor of x and y. The transcriber's function, however, is only to convey this five-symbol expression to the reader. It is the reader's function to extract whatever meaning his experience and the context of the text permit.

Many people disagree with Abraham Nemeth here and want to add information to describe the different meanings (pair, point, interval, gcd...) in order to make formulas less ambiguous and "readable". However, math notations are really open-ended, and it's quite common for authors to introduce their own very field-specialized notations at the beginning of papers. As I see it, your proposal has the advantage of letting authors explicitly provide the way they want the notations to be read, instead of relying on a fixed set of known definitions (if I understand #64 correctly).

One possible limitation, though: are people really going to define macros for pair, point, interval, gcd, etc.? I suspect authors would just write $(x, y)$. To take something even more basic, consider exponents: in calculus, $\sin^n$ and $\sin^{(n)}$ have different meanings. In set theory, $\mathbb N = \omega = \aleph_0$, but $2^{\mathbb N}$ is a set of functions, $2^\omega = \omega$ (ordinal exponentiation), and $2^{\aleph_0} > \aleph_0$ (cardinal exponentiation). And obviously, there are tons of other notations with superscripts and subscripts. If introducing macros does not make things more concise, I believe people would just write ^ and _.

Anyway, this is just my two cents on this... I think I would be happy if we had at least a standard and cross-platform way to read presentation MathML that addresses Nemeth's minimal need. Currently we don't even have this minimal support AFAIK.

davidfarmer commented 5 years ago

I am hopeful of implementing these ideas for the math that is commonly taught up through the first or second year of college.

The ingredients are:

1) A modified form of LaTeX, which is meant to be human-readable, human-writable, and semantic. (Example below.)

2) A script that converts 1) to another form of LaTeX which is not intended to be written or read by humans, but which preserves the meaning of the original source. This form uses what below are called "semantic macros".

3) Explicit rules for how to pronounce the semantic macros. (There are multiple options for each macro, ranging from verbose and extremely precise, to brief. People familiar with the subject want the brief version. Experts provide these rules. The examples below (made up by me) are not good, but I hope they illustrate the point.)

4) LaTeX definitions of the semantic macros, which describe the visual appearance of the output. (Individual users are free to change these definitions. You want \transpose{A} to put the "T" on the left instead of the right? No problem: just change the macro.)

5) What is missing, and I think it is the point of this issue, is whether all this information can be accommodated in MathML. The alternative is that MathML just shows the appearance, and pronunciation is handled in another way.

Note that this is about pronunciation, not Braille. We could also make a version of the semantic macros that outputs Braille.

The context in which I am sure I can accomplish 1)-4) is open source textbooks, particularly those written in PreTeXt. If the author has been reasonably consistent, then it is possible to write a throw-away script that converts their source to a structured form. There are not many good open source math textbooks, and I could imagine myself converting all of them to semantic form. Those books would reach many students.

1) Here is an example of structured LaTeX markup:

If $f:\R \to \R$ is (strictly) decreasing then $$ \sum_{n=1}^A f(n) \ge \int_1^{A+1} f(x) dx $$ and if $x \in \interval[-10, -3]$ then $f(\abs{x}) < f(x)$.

Differences you might notice from typical LaTeX: the \interval tag to indicate that [-10,-3] represents an interval, the (unambiguous!) macro \abs{x} instead of |x|, the macro \R for the real numbers, and good use of white space for clarity.

2) Here is the same text, now using semantic macros:

If $\functionDomainCodomain{f}{\reals}{\reals}$ is (strictly) decreasing then $$ \sumLimits{n=1}{A}{\functionApply{f}{n}} \ge \definiteIntegralLimits{1}{A+1}{\functionApply{f}{x}}{x} $$ and if $x \in \intervalCC{\minus{10}}{\minus{3}}$ then $\functionApply{f}{\absoluteValue{x}} < \functionApply{f}{x}$.
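Part of the conversion from 1) to 2) can be done by simple pattern rewriting. The sketch below handles only \abs, \R and \interval[a, b], and assumes every \interval is closed (hence \intervalCC). The rule table and the function names `to_semantic` and `semantic_number` are hypothetical; introducing \functionApply, \sumLimits and so on needs a real parser, not regular expressions.

```python
import re

# Hypothetical rewrite rules from structured macros to semantic macros.
# \R must be matched as a whole control word, not as a prefix of \Rightarrow.
RULES = [
    (re.compile(r"\\abs\{"), r"\\absoluteValue{"),
    (re.compile(r"\\R(?![A-Za-z])"), r"\\reals"),
]

# \interval[a, b] -> \intervalCC{a}{b}  (assumption: closed intervals only)
INTERVAL = re.compile(r"\\interval\[\s*([^,\]]+?)\s*,\s*([^\]]+?)\s*\]")
NEGATIVE = re.compile(r"^-(\S+)$")


def semantic_number(token: str) -> str:
    """Turn a leading minus sign into the semantic \\minus macro."""
    m = NEGATIVE.match(token)
    return rf"\minus{{{m.group(1)}}}" if m else token


def to_semantic(src: str) -> str:
    """Rewrite the structured LaTeX constructs this sketch knows about."""
    for pattern, repl in RULES:
        src = pattern.sub(repl, src)
    return INTERVAL.sub(
        lambda m: rf"\intervalCC{{{semantic_number(m.group(1))}}}"
                  rf"{{{semantic_number(m.group(2))}}}",
        src,
    )


print(to_semantic(r"$x \in \interval[-10, -3]$"))
```

Because the structured source is unambiguous (\abs rather than |...|), each rule is a local rewrite; no guessing about what a bare vertical bar means is required.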

3) Here is the pronunciation of those semantic macros. (Note that I am new to this area, and surely an expert could improve these; I am just trying to convey the idea. Also, there need to be multiple pronunciations, from verbose to concise. The examples below are verbose.)

\functionDomainCodomain[3] function #1 from #2 to #3
\reals the real numbers
\sumLimits[3] the sum from #1 to #2 of #3
\functionApply[2] #1 of arg #2 end arg
\definiteIntegralLimits[4] the integral from #1 to #2 of #3 dee #4
\intervalCC[2] interval including #1 to including #2
\minus[1] negative #1
\absoluteValue[1] absolute value of arg #1 end arg

Unfolding all of those pronunciations gives the pronunciation of the expression. My main question: can such a pronunciation be obtained from attributes on the MathML?
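The unfolding is a simple recursive substitution: speak each argument, then drop the results into the macro's template. A minimal sketch, assuming the expression has already been parsed into nested tuples `(macro_name, arg1, arg2, ...)`; the tuple representation and the name `speak` are hypothetical, and the templates are the verbose ones listed above. Plain tokens such as "n=1" are passed through unchanged here, whereas a real system would also have to speak them.

```python
import re
from typing import Union

# An expression is a plain token ("x", "A+1") or a tuple
# (macro_name, arg1, arg2, ...) mirroring a semantic-macro call.
Expr = Union[str, tuple]

# Verbose templates from the list above; #1..#9 refer to the arguments.
TEMPLATES = {
    "functionDomainCodomain": "function #1 from #2 to #3",
    "reals": "the real numbers",
    "sumLimits": "the sum from #1 to #2 of #3",
    "functionApply": "#1 of arg #2 end arg",
    "definiteIntegralLimits": "the integral from #1 to #2 of #3 dee #4",
    "intervalCC": "interval including #1 to including #2",
    "minus": "negative #1",
    "absoluteValue": "absolute value of arg #1 end arg",
}

PLACEHOLDER = re.compile(r"#([1-9])")


def speak(expr: Expr) -> str:
    """Recursively unfold a semantic-macro tree into a spoken string."""
    if isinstance(expr, str):
        return expr
    name, *args = expr
    spoken_args = [speak(a) for a in args]
    return PLACEHOLDER.sub(
        lambda m: spoken_args[int(m.group(1)) - 1], TEMPLATES[name]
    )


print(speak(("functionApply", "f", ("absoluteValue", "x"))))
```

Supporting several verbosity levels would amount to keeping one template table per level and letting the user's preference pick the table.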

An enhancement I plan is a "simple argument" version of \functionApply: If the argument is "x" then the arg ... end arg is not necessary. The script which converts structured LaTeX to semantic LaTeX can determine which one is appropriate.
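One way the conversion script could make that choice is to test whether the argument is atomic. In this sketch both the "simple" test (a single letter or a run of digits) and the macro name \functionApplySimple are hypothetical placeholders for whatever the real script and macro set end up using.

```python
import re

# Hypothetical test: a single letter or a run of digits is a "simple" argument.
SIMPLE_ARG = re.compile(r"[A-Za-z]|[0-9]+")

# Pronunciation templates for the two variants (verbose style from above).
SPEECH = {
    "functionApply": "#1 of arg #2 end arg",
    "functionApplySimple": "#1 of #2",  # hypothetical macro name
}


def function_apply(f: str, arg: str) -> str:
    """Emit semantic LaTeX, choosing the simple variant for atomic arguments."""
    if SIMPLE_ARG.fullmatch(arg.strip()):
        return rf"\functionApplySimple{{{f}}}{{{arg.strip()}}}"
    return rf"\functionApply{{{f}}}{{{arg}}}"


def say(macro: str, f: str, arg: str) -> str:
    """Fill in the two-argument template for the chosen variant."""
    return SPEECH[macro].replace("#1", f).replace("#2", arg)


print(function_apply("f", "x"), "->", say("functionApplySimple", "f", "x"))
```

Putting the decision in the conversion script, rather than in the screen reader, keeps the pronunciation rules themselves purely mechanical.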

4) The LaTeX definitions of the semantic macros are what are used to produce the visual output, either by MathJax or by some other program that converts to MathML (or whatever other method is used to make the visual display).

One example of the semantic macros:

\definiteIntegralLimits[4] \int_{#1}^{#2} #3 \measureD #4

\measureD \,d

Note that those macros address the LaTeX shortcoming that authors need to micromanage the layout. In the source, the "\int" and the "dx" are recognized as parts of one object, which is then interpreted appropriately.

The publisher is free to redefine the semantic macros. For example, if you want the "d" of "dx" to be upright, then change the definition of \measureD. The semantic input, and the pronunciation, will not change.
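The separation between semantic input and visual definition can be sketched as template expansion: the same call produces different LaTeX depending on how \measureD is defined, while the pronunciation table is untouched. The function name `expand_integral` and the dictionary representation of the macro definitions are hypothetical; the two default definitions are the ones given above, and \mathrm{d} is one possible upright choice.

```python
import re
from typing import Mapping

# Visual definitions written like \newcommand bodies; #n is the n-th argument.
DEFAULT_MACROS = {
    "definiteIntegralLimits": r"\int_{#1}^{#2} #3 \measureD #4",
    "measureD": r"\,d",
}

PLACEHOLDER = re.compile(r"#([1-9])")
MEASURE_D = re.compile(r"\\measureD(?![A-Za-z])")


def expand_integral(lo: str, hi: str, integrand: str, var: str,
                    macros: Mapping[str, str] = DEFAULT_MACROS) -> str:
    """Expand \\definiteIntegralLimits{lo}{hi}{integrand}{var} to plain LaTeX."""
    args = [lo, hi, integrand, var]
    body = PLACEHOLDER.sub(lambda m: args[int(m.group(1)) - 1],
                           macros["definiteIntegralLimits"])
    # Then expand the zero-argument helper \measureD inside the result.
    return MEASURE_D.sub(lambda m: macros["measureD"], body)


upright = {**DEFAULT_MACROS, "measureD": r"\,\mathrm{d}"}
print(expand_integral("1", "A+1", "f(x)", "x"))
print(expand_integral("1", "A+1", "f(x)", "x", upright))
```

Only the `measureD` entry differs between the two calls, which is exactly the kind of publisher-level change described above.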

Another example is \functionApply. I will not give the definition (which I learned from Alex Jordan), but it addresses the issue that $f\left(\frac12\right)$ does not look good, because there is too much space after the "f".

The above is supposed to illustrate the possibility of writing semantic source, and not losing that information along the pathway to displaying that material in the browser, enabling screen readers to pronounce that material without any heuristics or guesswork.

fred-wang commented 5 years ago

Thanks for the detailed explanation. I think restricting to a subset of math taught at college + asking authors to always use explicit macros addressed my concerns. For the missing bit, I personally don't know if there is any standard way to tell screen readers how to pronounce the text so I'll let others reply.

NSoiffer commented 5 years ago

Sorry for the slow response; I'm catching up after a month-long (great) vacation... Also, this is a very long comment that has taken a while to write.

For me, a goal of MathML 4 is to allow, but not require, some semantic enrichment of presentation MathML. The primary use case I see is accessibility, but I'm sure that there are others, such as computation. I like @davidfarmer's ideas for using more semantic LaTeX, and it fits in well with my goal of finding a way to put that into the MathML. We might want to allow for explicit text or braille, but doing so has a number of drawbacks:

IMHO, providing explicit text should be done only in exceptional cases (e.g., a test where you don't want to hint at the answer). Instead, it would be better to embed the meaning and let the renderer choose the proper speech based on user preferences. This is similar to saying the visual renderer shouldn't use an image but instead should render the math to match the font size (etc.) that the user has chosen. MathML provided a fallback altimg for math renderers that couldn't render MathML, but that was never implemented by browsers, so it was rarely added to the MathML. MathML also allows alttext on the math element. Potentially this could be expanded to be allowed on all MathML elements. However, that means there is only one allowed text: no alternative ways to speak the math, and no other languages.

Note that some of these problems exist for braille also. Nemeth was the standard for braille in the US and some other countries for many years. Recently, Unified English Braille (UEB) has come along, which defines its own math codes. Most English-speaking countries have adopted UEB; the US allows both. It's a very contentious topic because UEB uses many more characters than Nemeth to encode math. If the braille is author-generated, both would need to be included in the MathML somehow. Because braille is syntax-based, I believe it can be generated from presentation MathML for both codes. I wouldn't be surprised if there were one or two problem areas that should be fixed for good braille generation from MathML, but I haven't heard of them yet. MathPlayer uses liblouis for its MathML-to-Nemeth conversion, and although there are bugs in the conversion, AFAIK none are due to a problem with MathML. So far, MathML-to-UEB support in liblouis is very incomplete.

What I like more (or in addition to text) is providing a way to embed semantics. Some options are:

As mentioned in #64, we should talk to the ARIA WG after we have some proposal or set of questions to ask them. I hope I'm correctly representing ARIA in the following simplified description of what it is...

When a web page is read, the DOM is created, and from the DOM a simplified view of it, called the accessibility tree, is created. HTML elements map to various things in the accessibility tree; ARIA provides a way to override those mappings. Those overrides are particularly useful for divs and spans, since they don't have mappings to the accessibility tree. Perhaps the most important mapping is to the accessible name of an element in the accessibility tree. Screen readers typically use that name (which might be the concatenation of various other names of DOM elements) as the text that is spoken. The names are plain strings.

If we are thinking of having screen readers directly read MathML, it would be by providing a means for MathML elements to set the accessible names in the accessibility tree. In some sense, that's already possible, because you can add aria-label to an element and that will override/set any text for the name. Just as I'm not a fan of authors setting the text, I'm also not a fan of this, although potentially the text could be generated by client software based on user preferences. Braille is another problem: math has its own braille code, and a plain-text name loses it. There is a suggestion to add a new ARIA feature, aria-label-braille, that would provide braille, so that would remedy that problem. Synchronized highlighting remains a problem, though. Perhaps we should be thinking about what to add to ARIA to solve that?
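A toy model may help show why a plain-string name is limiting. This is a heavy simplification, not the actual W3C accessible-name computation: the `Node` class and `accessible_name` function are hypothetical, a non-empty aria-label simply wins, and otherwise the node's own text is concatenated with its children's names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """A toy accessibility-tree node (a heavy simplification)."""
    text: str = ""
    aria_label: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


def accessible_name(node: Node) -> str:
    # A non-empty author-supplied aria-label overrides everything beneath it.
    if node.aria_label and node.aria_label.strip():
        return node.aria_label.strip()
    # Otherwise concatenate the node's own text with its children's names.
    parts = [node.text] + [accessible_name(c) for c in node.children]
    return " ".join(p for p in parts if p)


bars = [Node("|"), Node("x"), Node("|")]
print(accessible_name(Node(children=bars)))
print(accessible_name(Node(aria_label="absolute value of x", children=bars)))
```

In the labeled case the whole subtree collapses into one string, so a screen reader has nothing left to navigate, highlight in sync, or translate into a braille math code.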

Alternatively, maybe we should be looking at other ways to allow screen readers, or more likely third-party libraries that screen readers can call, such as Volker Sorge's SRE, to produce better speech. Enhancing ARIA's role is one possibility, but to me, adding a bunch of math-related things "pollutes" that attribute. As I mentioned earlier, mrole or mathrole on MathML elements makes more sense to me.

Whatever we come up with (potentially more than one idea), once we have discussed it and come to some conclusions, we should bring those to the ARIA WG for their feedback.

Finally, I want to digress a little and explain why Dr. Nemeth wanted syntax, not semantics when he heard speech. I had the privilege of meeting him when he was 92. Although blind (and 92), he traveled by himself via plane to a workshop I was at. He was as sharp as anyone in their 20s and had a keen sense of humor also. However, the technology he used was not modern. As he explained, he developed MathSpeak as a one-to-one way of speaking his Nemeth braille code (which encodes syntax, just like sighted math notation). There were two reasons for his design of MathSpeak:

  1. he couldn't trust his readers as they were not necessarily math-literate, and
  2. having it in braille made it much easier for him to think about and remember what was said. He used a braille writer for that, so if the person spoke MathSpeak to him, almost every word corresponded to a braille symbol. This made it easy to type what he heard. Back then, and until a few years ago, there was no option of math automatically showing up on a refreshable braille display, so he needed to type what he heard. At the workshop, he said that he would still type it out even if he had a refreshable display, because it helped him learn it. I'm dubious that's a good path for most blind students trying to learn math these days. It would definitely slow them down.

Most work on speech has been trying to have math spoken the way someone would normally speak it, i.e., semantically. I tried to find a study that compared syntactic speech with semantic speech but couldn't find any. It would be good to validate whether all the work being done on semantic speech actually does aid understanding. Anecdotally, several students have said they want math to be read the same way their teacher reads it, so the reason for generating semantic speech isn't completely made up.

Apologies for the extremely long comment. There was a lot to respond to and to explain.

fred-wang commented 5 years ago

Thanks for the detailed reply, @NSoiffer.

Just to add some comments regarding how browsers pass info to assistive technologies:

Of course whatever the method used by browsers to pass the information, work is still needed on the author side and on the assistive technology side to get good rendering of the math...

Should we close this issue and mark it as a duplicate of #64 ?

dginev commented 2 years ago

Should we close this issue and mark it as a duplicate of https://github.com/w3c/mathml/issues/64 ?

I think I want to second Fred's proposal, given the 3 year silence here.

NSoiffer commented 1 year ago

Solved by using intent.