w3c / wpub

W3C Web Publications
https://w3c.github.io/wpub/

MathML and WPUB #72

Closed rkwright closed 6 years ago

rkwright commented 7 years ago

An open question for the WPUB WG is the relationship of MathML to the developing specification.

MathML is a complex specification, comprising both a semantic representation and a presentational form. The two are often conflated, leading to a lot of confused discussion. Further, MathML is often used in creating digital publications, but is not infrequently replaced by some other representation (e.g. SVG, LaTeX, images) for the final publication output. Only in a few cases is even the presentational markup part of the document that is rendered by the User Agent. In those cases, some browsers render the presentation markup natively (more or less, though the fidelity is generally poor). More often, a polyfill such as MathJax is used by the UA.

Given the above, the following questions were posed on the WPUB mailing list:

The following comment comprises all of the responses that were posted to the mailing list. All effort has been made to attribute the comments correctly. Any errors are my own.

BillKasdorf commented 7 years ago

Your summary paragraph is excellent, Ric. I'm confused by the three questions, though: weren't those the questions you asked in the first place, for which your summary is the response?

For the record, on the third bullet, yes I think you got ample response that MathML is used extensively in STM and educational publishing: both sectors consider it an essential technology and depend on it. (That is, Presentational MathML.)

I would like to hear directly from accessibility experts about its use in assistive technology. I have always been told that it is what AT wants to get (imperfect as it may be) and that it not only is used, but that it saves an enormous amount of work, cost, and time when MathML is available vs. an image of an equation.

rkwright commented 7 years ago

Bill Kasdorf, 6 Sept 2017, 1438h

I’ll reply from the point of view of a major conversion and prepress vendor, Apex, which is the company I work for. And this is true of all of the other main prepress and conversion vendors that I know of.

WRT: "What is current publishing views on these questions? Are the major publishers and/or journals actually using MathML? It is not easy to find examples of the use of MathML (and most are very old) which suggests that it is not widely used."

Yipes!

What I think you’re saying is “it is not easy to find examples of the use of MathML in EPUBs.” That is true.

We create millions of equations as MathML, and so do our peers. It is fundamental to how we do math. To cite just one journal (admittedly the world’s largest, PLOS), the first thing we do for every equation is to convert it to MathML, no matter what format it comes to us in. All the uses of that math throughout the workflow, including images, are created from the MathML, and it is embedded in the XML that is the basis for the entire workflow. In the case of PLOS, as with many STM journals, that’s JATS XML (almost all STM journals deliver JATS XML, though not all use it as the workflow format), but we also deliver all the content as HTML, and that includes equations. (I can’t swear that the MathML makes its way into the HTML, but even if only images do, they were created from MathML.) That amounts to thousands of articles per month published on a daily basis, a firehose of content. That’s just one journal. The same thing is true of all book and journal content we create. All. All MathML. University of Toronto Press, for pete’s sake? All MathML.

Virtually all math in scholarly publications now exists, at some point in the workflow, as MathML. The problem is that it almost never gets into EPUBs because the reading systems mess it up. (Plus almost no journals deliver articles or issues as EPUBs, though that is about to change. In fact Atypon, one of the world’s leading hosting platforms for STM content, now owned by Wiley, will do that in their upcoming release. And it’s all based on Readium.) We do what our customers tell us to do. So when they tell us not to put the MathML into the EPUBs, we don’t. But we always have it.

I interviewed other prepress/conversion vendors and publishers for an article I recently wrote. Same story. They all have MathML for all their equations. Always. The MathML just doesn’t get into the EPUBs.

So please help make the world safe for MathML in EPUBs!!!

Correction: We supply XML, which includes the MathML, and the images of the equations created from the MathML, to PLOS and they convert it to HTML for online. They are working on implementing MathJax at PLOS but don’t currently use it live.


Liam Quin, 6 Sept 2017 1555h

On Wed, 2017-09-06 at 13:55 -0500, Ric Wright wrote: [...]

My own opinion is that the most effective way forward is to drop demands of every-little-bit-of-mathml-everywhere and instead for publishers to demand support of specific mathematics rendering features via CSS. For example, a CSS way to make built-up brackets (fences). I don't see e.g. building full TeX support into Web browsers any time soon, although with asm.js maybe it'll happen. But TeX was designed for print, and making TeX mathematics accessible turns out to have its own challenges too.

Yes.

or just throw in the towel

No.

Consider someone using readium to supply an epub3 textbook in the educational sector where accessibility is required by legislation as well as institutional rules.

[...] "It is not easy to find examples of the use of MathML (and most are very old) which suggests that it is not widely used."

It's not widely shipped on the Web because browser support is so pathetic. That doesn't mean it isn't used. But browsers don't see it as a high priority, I think primarily because not enough people ask them for it.

It's a bit like vertical Japanese text - publishers simply weren't putting it on the Web because the browser support was inadequate. When the browser vendors became aware of this there was an increased push for supporting vertical Japanese text.

Similarly Opera didn't support XSLT until Google Maps was released requiring it.

The best way to improve browser support for MathML would be to give them a business case for it. Bill Kasdorf's response is an example. If scientific journals started publishing with native MathML and just said, if your browser doesn't support enough of MathML here's a list of ones that do, would that make a difference? I expect it might.

It might also help to be clear that MathML isn't only for research-level mathematics and engineering but applies from elementary school upwards. Or there's the sneaky approach of making a "profile" of MathML called K12ML or even just MiddleSchoolMathMarkup (MSMM) and pushing that... of course, it wouldn't leave much out!

But all this is about a sort of activism & doesn't answer the question. The answer is yes: I'd want to detect whether a book uses mathematics (e.g. via a simple XPath query) and include a copy of MathML conditionally, activating it only when needed. And include a message, "Loading JavaScript support shim for missing browser support for equations" :-)
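The detection step described here can be sketched as follows. This is a minimal illustration, assuming the content documents are well-formed XHTML and that "uses mathematics" means "contains at least one element in the MathML namespace". The function name `uses_mathml` and the two sample documents are hypothetical, and Python's ElementTree supports only a subset of XPath.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def uses_mathml(xhtml_source):
    """Return True if the XHTML source contains at least one MathML <math> element."""
    root = ET.fromstring(xhtml_source)
    # The document element itself could be <math> in a standalone fragment.
    if root.tag == f"{{{MATHML_NS}}}math":
        return True
    # ".//{namespace}math" is the ElementTree spelling of the XPath "//m:math".
    return root.find(f".//{{{MATHML_NS}}}math") is not None


# Hypothetical sample documents, for illustration only.
WITH_MATH = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><p>Value: <math xmlns="http://www.w3.org/1998/Math/MathML"><mi>x</mi></math></p></body>
</html>"""

WITHOUT_MATH = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body><p>No equations here.</p></body>
</html>"""

if __name__ == "__main__":
    for label, source in (("with math", WITH_MATH), ("without math", WITHOUT_MATH)):
        print(label, "->", uses_mathml(source))
```

A reading system could run a check like this per resource and load its math shim only for the documents where it returns true.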


Liam Quin, 6 Sept 2017 1655h

PS: see https://bugs.chromium.org/p/chromium/issues/detail?id=6606 [[ We (layout team and igalia) have a tentative plan to build a MathML implementation on top of LayoutNG and Custom Layout. I've marked this as blocked on CSS Custom Layout and LayoutNG. ]]

-- Liam Quin, W3C, http://www.w3.org/People/Quin/ Staff contact for Verifiable Claims WG, XQuery WG

Web slave for http://www.fromoldbooks.org/


Ric Wright, 11 Sept 2017, 1855h

@liamquin Thanks for the feedback. I hope they can improve it, but from Readium's own experience with Chrome bugs, I am not holding my breath. :-)


Ric Wright, 7 Sept 2017, 0822h

Thanks for the feedback. This is very useful and educational. I, for one, am in favor of keeping MathML as a first-class citizen in Readium, including R2. I look forward to more widespread use of MathML in EPUBs. When it works, it looks very good. Still worried about accessibility, but there doesn’t appear to be a good solution there.

I simply can’t imagine how voice-over or JAWS or any other assistive tech would voice

[equation image: heateq]

But perhaps I need more education.


Ric Wright, 7 Sept 2017, 0832h

@liamquin I agree with most of this. But IMO there is a large divide between vertical Japanese text and MathML. One is pretty much a requirement in Japan, while MathML is somewhat of a niche language. Still, there is hope. As one of the original Adobe engineers on SVG, I remember that we despaired at times of it ever succeeding, especially when Adobe killed off its own support (except import/export in Illustrator, which I wrote most of). But mirabile dictu, the browsers DID eventually implement most of the 1.1 spec!

So there is still hope for MathML.


Liam Quin, 7 Sept 2017, 0832h

You seriously went all the way through school without having an equation in a text book? :)

(rkwright: Yes, but in my head I don't say "Integral symbol then Backwards 6 divided by ... then upside down triangle, etc." My brain knows what the math means, but how does it get voiced?)

But yes, I agree there's similarity with SVG. And CSS took a long time to catch on too, even though the use of generic styles for markup was an integral part of SGML itself.


Rachel Comerford, 7 Sept 2017, 1124h

MathML is definitely not a niche language in educational publishing. We have tens of millions of dollars of revenue tied to students being able to read our math, stats, science, econ, and psych texts (to name a few). Without MathML, our fallback is an SVG, but SVG doesn't have enough support, so then our fallback becomes png or jpg. Imagine trying to navigate your first math text in a digital environment to find that many inline equations aren't resizing with your text. Or worse, imagine reading that text with a screenreader and having to hear someone's complex alt text interpretation of the equation rather than a standard language interpretation.

For better or worse, educational publishing is deeply dependent on MathML even as we provide image fallbacks to MathML in our epubs for those readers that do not support it...

If there is a better solution for accessible pedagogically appropriate display of math then I'm all ears, but until then, the treatment of MathML as niche could be devastating for students.


Daniel Bennett, 7 Sept 2017, 1256h

I have a couple of questions that bother me about this issue. First was the issue of accessibility. It has seemed as though there is no way to have a semantic version of Math and presentation together. Is there ever going to be a single solution for Math that has just one version for both, as HTML text has had (especially in HTML5)? The second question, and as a strong supporter of XML, is it possible to semantically represent math with XML? For example, there is no real way to have page and line numbers in XML as well as paragraphs that span them, as this breaks nestedness. The fixes for this are problematic within XML. Is math intrinsically impossible to represent in XML? Or is it just so difficult that there is no solution that can be both semantic and presentation? And is there an XSLT possible that can transform them in either direction?

Sorry if the answers are obvious, but I had not heard of them.


Liam Quin, 7 Sept 2017 1303h

[...] "The second question, and as a strong supporter of XML, is it possible to semantically represent math with XML? For example, there is no real way to have page and line numbers in XML as well as paragraphs that span them, as this breaks nestedness."

The usual approach involves thinking of page breaks as separations rather than containers and then using empty XML elements to represent them; the same for line divisions (except for poetry, where the lines are part of the content).
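As a rough illustration of the "separations rather than containers" idea, the sketch below treats a page break as an empty milestone element inside ordinary paragraph markup and then recovers the text of each page. The element name `pb` with an `n` attribute follows a convention used in TEI; the sample document and the function name `text_by_page` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first paragraph spans a page break, yet the XML
# stays properly nested because <pb/> is an empty marker, not a container.
SAMPLE = """<doc>
  <p>First paragraph starts on page one <pb n="2"/>and ends on page two.</p>
  <p>Second paragraph stays on page two.</p>
</doc>"""


def text_by_page(xml_source, first_page="1"):
    """Return a dict mapping page number to the text that falls on that page."""
    root = ET.fromstring(xml_source)
    pages = {first_page: []}
    current = first_page

    def walk(elem):
        nonlocal current
        if elem.tag == "pb":
            # A milestone switches the current page; it has no content itself.
            current = elem.get("n", current)
            pages.setdefault(current, [])
        elif elem.text:
            pages[current].append(elem.text)
        for child in elem:
            walk(child)
            # Text following a child (its tail) belongs to whatever page is
            # current after that child has been processed.
            if child.tail:
                pages[current].append(child.tail)

    walk(root)
    # Normalise whitespace so the result is easy to read and compare.
    return {page: " ".join("".join(chunks).split()) for page, chunks in pages.items()}


if __name__ == "__main__":
    for page, text in text_by_page(SAMPLE).items():
        print(page, "->", text)
```

The paragraph hierarchy remains the primary structure; pagination is layered on top of it without breaking nesting.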

For rhetorical overlap with structure, such as a quotation that goes from the middle of one paragraph to the middle of the next, a representation of one structure or the other as primary and using attributes to link together e.g. a continued quotation, is a common approach.

The people at the Text Encoding Initiative and more generally Digital Humanities have been doing these things for decades, so it's a question of knowing where to look ;-) There've been papers on representing overlap in XML presented at Extreme Markup and, later, Balisage, conferences.


Daniel Bennett, 7 Sept 2017 1119h

Hoping that someone will answer the questions I posed. This was non-responsive to my questions.


Bill McCoy, 7 Sept 2017 1119h

Daniel, it seems to me that the precursor question is: is it possible to semantically represent mathematics in a single general declarative data format? Whether that data format could reasonably be XML, JSON, or something else seems a second-order question, as it would be moot if the answer to the first question is "no".

I would think of systems like Mathematica as embodying possible existence proofs but yet it seems that it is both leaning on programmatic scripting and (per threads like https://mathematica.stackexchange.com/questions/28162/alternatives-to-mathematica) not able to handle all the various disciplines that comprise "mathematics" (not to mention the various other fields that use applied mathematics).

In any case this seems like a knowledge representation question more so than a publishing question. E.g., if it were possible to represent mathematics semantically, it would be fodder for AI first and foremost, and only secondarily fodder for presentational publishing.

So I think that this group should probably punt on this problem as both being beyond its scope and likely insoluble.


Daniel Bennett, 7 Sept 1425h @whmccoy One could always show equations with graphics, especially SVG. Why bother with MathML at all then? I thought the idea was to have mathematics that could be represented as math. I have seen HTML go through similar bumps with texts. GIFs were the best way to show stylish text and alt attribute provided the actual text. Yet we have advanced to allow for embedded fonts and SVG to allow text to be better represented with great presentation. So again I ask whether MathML is up to the job of representing the math well? Perhaps this is behind the reticence to fully incorporate MathML into browsers.

I remember the first time I saw a PowerMac while visiting Apple. They showed me the included calculator that allowed for playing with polynomials, dragging and dropping, along with immediate graphing of the equations. Makes me wonder if MathML will make this type of work easier in web pages or not.


Liam Quin, 7 Sept 2017 2017h

Speaking as someone who used to maintain a version of eqn, the troff preprocessor whose language helped inspire TeX's mathematics, and as someone who thought about SGML representations of mathematics back in ISO 12083 days, ... :)

Mathematical language is a mix of nested and non-nested constructs. XML can represent everything you can do in TeX, at least in theory, but (as you implied) the transfer syntax can become unwieldy. However, I've yet to see a syntax for mathematics that isn't unwieldy sometimes. There are tools to generate MathML online, and to convert to it from LaTeX and other notations and back (although not arbitrary TeX, since that's a Turing-complete macro-programming language!).

So yes, you can represent mathematics with markup such as XML, at least up to grade 13 level (end of high school/first year of university). After that you start getting into areas of mathematics where people invent their own notations and there's no system on the planet that can represent that out of the box (obviously). So then you use the presentation markup, or subvert some other semantic markup, just as people do with TeX - e.g. "this isn't really a matrix, it's a partition diagram in algebraic topology" but that's OK - short of a theorem proving tool, there's not much software can do with such things: they communicate ideas between people.


Ivan Herman, 7 Sept 2017 1219h

Daniel, just adding my limited knowledge in the area… And forgive me if you already know that.

MathML is, actually, a dual specification. It has an XML representation referred to as Presentation Markup[1], and a parallel one called Content Markup[2]. As the name suggests, the first representation is down to the presentation, there are elements to create fractions, integrals, etc. The Content Markup tries to operate on a higher level, representing the underlying concept. I just copy here an example from [2] for the representation of a factorial n!:

Content MathML

<apply><factorial/><ci>n</ci></apply>

Presentation MathML

<mrow><mi>n</mi><mo>!</mo></mrow>

The Content Markup is clearly superior for accessibility. However, it would rely on a rendering engine that converts that into presentation mathml, taking also into account the fact that various cultures use slightly different typesetting rules for mathematics, for example.
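To make the "rendering engine that converts that into presentation mathml" step concrete, here is a toy converter for the factorial example only. It is a sketch under stated assumptions: namespaces are omitted for brevity, only `ci`, `cn`, and `apply` with `factorial` are handled, and the function name `content_to_presentation` is hypothetical. A real converter would also need to deal with operator precedence, fencing, and locale-specific typesetting rules.

```python
import xml.etree.ElementTree as ET


def content_to_presentation(node):
    """Convert a tiny subset of Content MathML into Presentation MathML."""
    if node.tag == "ci":
        # Content identifier -> presentation identifier.
        out = ET.Element("mi")
        out.text = (node.text or "").strip()
        return out
    if node.tag == "cn":
        # Content number -> presentation number.
        out = ET.Element("mn")
        out.text = (node.text or "").strip()
        return out
    if node.tag == "apply" and len(node) >= 1:
        operator, *operands = list(node)
        if operator.tag == "factorial" and len(operands) == 1:
            # n! is laid out as the operand followed by a postfix "!" operator.
            row = ET.Element("mrow")
            row.append(content_to_presentation(operands[0]))
            bang = ET.SubElement(row, "mo")
            bang.text = "!"
            return row
    raise ValueError(f"unsupported Content MathML construct: {node.tag}")


if __name__ == "__main__":
    content = ET.fromstring("<apply><factorial/><ci>n</ci></apply>")
    print(ET.tostring(content_to_presentation(content), encoding="unicode"))
```

The content form records that a factorial is being applied to `n`, while the presentation form only records that an `n` is followed by an exclamation mark. That difference is why the content form is the more useful input for assistive technology.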

This is the theory; I do not know whether content markup has ever been used and accepted as a good solution, and I also have not heard any mathml rendering engine starting with content markup.

AFAIK, there are also activities (but I do not know the details) starting with presentation markup and add annotations to the markup to reproduce the semantics. I know there are groups at, e.g., Benetech doing something like that, and produce some accessible version of mathematics, but I do not know the details (some others on this group probably know more about it).

The bottom line is that yes, this is also a huge issue, adding to the complexity of Math on the Web…


Daniel Glazman, 8 Sept 2017 0145h First thing, I'd like to say my editor BlueGriffon has MathML editing support for all flavors of html and all flavors of EPUB.... I have customers editing tons of EPUB documents with MathML formulas inside.

About your first question: I have never really understood the need for both a semantic markup and a presentational markup for Maths. Since the early days of the EuroMATH DTD for SGML/CALS, the main focus has been favouring semantics over presentation.

The second question has a clearer answer: everything that can be semantically read by a human being can be semantically represented in XML. As someone else said, the only issue is about user-defined math elements but we have now an answer with Houdini that opens up the black box of CSS offering direct hooks onto the layout engine of browsers that implement it. All in all, I am under the very clear impression that XML (plus Houdini for user-defined constructs) is perfect for representation of Maths.

Last thing, I read a question about line numberings. That question goes faaaar beyond MathML.

  1. the CSS WG has been discussing for ages a pseudo-element able to select the nth-line of a given element. While we have been having ::first-line for decades now, that potential extension remains complicated and far below the browser vendors' radars.

  2. beyond the line numbering of elements that span over multiple lines, it's often the case that books number "empty lines" that we create through vertical margins between elements. We have no way of doing that now (that would be complex and the increment would depend on both the line-height of the block before the break and the one after the break...) and it's never been discussed.


Laurent Le Meur, 8 Sept 2017 0307h The fact is that MathML is an integral part of XML production workflows in STM. Bill K. makes it 100% clear. But is it mainly Content MathML or Presentation MathML?

In the first case (Content Markup), and with the limitations explained by Liam, the presence of MathML in EPUB content seems logical, because tools like MathJax will map the semantics (required in a production workflow) to a good visual presentation with good accessibility (both required in a publication workflow).

In the second case (Presentation Markup), I still don't see the advantage of providing MathML over SVG (or HTML + CSS) at the EPUB (= publication) level.


Ivan Herman, 8 Sept 2017 0358h @llemeurfr Hm. I thought that MathJax maps the presentation markup and not the content markup… I am looking at the examples in, e.g.:

http://docs.mathjax.org/en/latest/start.html

"In the second case (Presentation Markup), I still don't see the advantage of providing MathML over SVG (or HTML + CSS) at the EPUB (= publication) level."

Just to clarify your comment: you say that in the final EPUB file an SVG should/would be good enough for Math, right? I would agree with you, except that if you want to add interaction to equations then (though feasible) it probably becomes a pretty complex SVG. I.e., if MathML could be displayed really quickly (i.e., at interactive speed) then attaching callbacks, colour changes, etc., would be way easier if it was scripted directly on the MathML level.


Bill Kasdorf, 8 Sept 2017 0719h It's Presentation MathML but MathJax now infers semantics from it that are useful for accessibility. Plus AT tools work with it. I've been told that getting MathML for a math-heavy textbook can save something like 85% of the cost of making it accessible.


Bill McCoy, 8 Sept 2017 1053h Daniel Bennett, you are absolutely right that this same tension between presentation and semantic content exists for text represented with HTML. But, a sequence of text is a relatively straightforward data structure. Mathematics (across all disciplines) - not so much.

Of course that PowerMac demo was of a calculator operating on its internal data model not software that was starting with a displayed graph of an equation and going backwards.

Personally I think the best we can hope for is that MathML will become more widely adopted because it provides for a much more adaptable/accessible/stylable representation of presentational mathematics (which will help to display content on various devices as well as for those with print disabilities). Serendipitously, this representation, being much more structured, will also be MUCH better grist for machine processing than SVG or a bitmap image would have been. So that live calculation UX for arbitrary MathML - a la the PowerMac demo of yore - could be feasible even if not 100% reliable for all possible MathML because there will still be some heuristics involved.

This is, to circle back, similar to the situation with text. People didn't move away from text blocks in GIF to enable better machine processing of web pages but to make web pages faster, more accessible, more responsive, better on mobile devices. But that evolution certainly is facilitating machine processing of web pages. Yet if we had tried to (re)design HTML to be the perfect data format for semantically representing texts that would likely have failed to gain adoption. And we can probably all agree this could have been feasible in principle, whereas in the case of arbitrary mathematics the feasibility is questionable.


Laurent Le Meur, 8 Sept 2017 1112h

@Ivan, from the MathJax doc, Content MathML support comes with an extension. http://docs.mathjax.org/en/latest/options/extensions/Content-MathML.html

But, as Bill K. states, most MathML content production is Presentation Markup. Which leaves us with potential accessibility issues. In the scope of WP/EPUB4, it may be good for the a11y taskforce to give advice on this issue.

In the interim, I hear a strong will to have MathML supported in Readium-2. As we intend to inject MathJax only when a resource is using it, the Publishing WG will have to provide a way to tag such resources in the WP manifest, as currently done in EPUB3.
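For context on "as currently done in EPUB3": an EPUB 3 package document marks resources that contain MathML with the `mathml` value in the `properties` attribute of the manifest `item`. The sketch below shows how a reading system might use that flag to decide where to inject MathJax. The sample package document is trimmed to the manifest (a real one carries metadata and a spine as well), and the function name `resources_needing_math_support` is hypothetical. How a WP manifest would carry the same information is not defined in this thread.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical, trimmed package document: only the manifest is shown.
SAMPLE_OPF = f"""<package xmlns="{OPF_NS}" version="3.0">
  <manifest>
    <item id="c1" href="chapter1.xhtml" media-type="application/xhtml+xml" properties="mathml"/>
    <item id="c2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="c3" href="chapter3.xhtml" media-type="application/xhtml+xml" properties="scripted mathml"/>
  </manifest>
</package>"""


def resources_needing_math_support(opf_source):
    """Return the hrefs of manifest items flagged with the 'mathml' property."""
    root = ET.fromstring(opf_source)
    hrefs = []
    for item in root.iter(f"{{{OPF_NS}}}item"):
        # 'properties' is a space-separated list, so split before testing.
        properties = item.get("properties", "").split()
        if "mathml" in properties:
            hrefs.append(item.get("href"))
    return hrefs


if __name__ == "__main__":
    print(resources_needing_math_support(SAMPLE_OPF))
```

Because the flag lives in the manifest, the reading system can make the decision without parsing every content document first.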


Daniel Bennett, 8 Sept 2017, 1647h @whmccoy , Bill,

Thank you and others for helping me understand this issue. Based on your answer and some of the others, and having been at a contentious discussion at Balisage conference between a representative of a browser company and several MathML proponents, I can see that perhaps MathML folks can work on seeing if the standard can change in a way to better use HTML/CSS to provide a more compelling standard. The presentation vs representation separation in HTML has taken over 20 years to get to where we have embedded fonts, CSS Grid, CSS Paged Media and Rich Snippets/Microdata/RDFa. My feeling is that perhaps MathML folks should address this to get more buy in from the browser folks, not just pointing out the importance of math/STEM, etc.

One thing that I have looked for is bringing usable embedded semantic metadata into HTML, and I have hope that some of the spreadsheet ideas could be used for HTML. For example, Inline XBRL was an attempt to bring accounting semantics from XBRL (an XML language) to HTML. Not sure whether there is a possible overlap, but I am hopeful that other folks will use some of the output of MathML.

rkwright commented 7 years ago

@BillKasdorf (Response to comment far above on 22 Sept) Yes, those are the original 3 questions. Despite my summary (which, as you imply, leverages some of the answers in the thread), I still think the questions are relevant, and all these responses cast a lot of valuable light on the answers. But by no means do I feel we are there yet.

clapierre commented 7 years ago

Hello everyone,

Benetech and the DIAGRAM Center have a 'Math in EPUB' Task Force led by Neil Soiffer, Sina Bahram, George Kerscher and Jason White. We started this Task Force after a code sprint where a number of us worked on providing examples of MathML inside an EPUB, discovered some inconsistencies, and saw the need for a recommendation for publishers who have been asking for years how to put math in their EPUBs. We are close to providing three different examples of embedding MathML with various fallbacks when JavaScript is unavailable. We are testing both inline and standalone equations. Once we are happy with these EPUBs we will be sharing them with people to test on various reading systems, to see how they behave when JavaScript is allowed or not, and when MathJax is present or not.

The basic premise here is that an image with a description of the math will be the primary resource and the MathML will be hidden off-screen. When JavaScript is allowed to run, our own JavaScript (which will be included in the EPUB) will remove the alt-text descriptions from the math images, set aria-hidden=true on them, and then unhide the associated MathML equation. We feel this will be the best compromise: for those distribution channels where JavaScript is forbidden, the publisher would just remove the JavaScript, and the image with its description will be the primary way to access the math. At least this way there is some accessible math, and the MathML is still in the EPUB, just not accessible. We feel that JavaScript will become more widely adopted in EPUB reading systems, especially as Publishing on the Web moves forward.
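The switch described above would run as JavaScript inside the reading system. Purely to illustrate the logic, the sketch below applies the same three steps to an XHTML tree in Python. Everything about the markup is an assumption: the wrapper class `equation`, the hiding class `offscreen`, the file name `eq1.png`, and the function name `enable_mathml` are hypothetical and are not taken from the Task Force's samples.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
MATHML = "http://www.w3.org/1998/Math/MathML"

# Hypothetical markup: the image with alt text is the default, and the
# MathML sits next to it inside a span that CSS positions off-screen.
SAMPLE = f"""<p xmlns="{XHTML}">
  <span class="equation">
    <img src="eq1.png" alt="n factorial"/>
    <span class="offscreen"><math xmlns="{MATHML}"><mrow><mi>n</mi><mo>!</mo></mrow></math></span>
  </span>
</p>"""


def enable_mathml(root):
    """Switch each equation from its image fallback to its MathML; return the count."""
    switched = 0
    for wrapper in root.iter(f"{{{XHTML}}}span"):
        if wrapper.get("class") != "equation":
            continue
        image = wrapper.find(f"{{{XHTML}}}img")
        hidden = wrapper.find(f"{{{XHTML}}}span[@class='offscreen']")
        if image is None or hidden is None:
            continue  # leave the image fallback untouched if either part is missing
        image.set("alt", "")                # step 1: drop the alt-text description
        image.set("aria-hidden", "true")    # step 2: hide the image from assistive tech
        del hidden.attrib["class"]          # step 3: unhide the MathML
        switched += 1
    return switched


if __name__ == "__main__":
    tree = ET.fromstring(SAMPLE)
    print("equations switched:", enable_mathml(tree))
```

If the script never runs, nothing in the markup changes and the described image remains the accessible representation, which is the fallback behaviour the comment aims for.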

Charles LaPierre

avneeshsingh commented 7 years ago

Our strategy for the MathML samples is to go with the flow. We know that publishers use MathML quite a lot, so we intend to find a way to make it usable at this point in time. Therefore the objective is to include these MathML techniques in the techniques of the EPUB Accessibility Conformance and Discovery specification. Screen reader support for MathML has become better, so simpler equations can also be read without MathJax. All this encouraged us to trust MathML for the time being.

The use of MathML in WP/PWP is not in the scope of this work; it is too early to answer this question. For now the focus is EPUB 3.x and MathML.

iherman commented 6 years ago

Propose closing: It is not up to this WG to solve the MathML (or, more generally, math on the Web) issue. Per charter, we take what the OWP gives us (even if it is, currently, not satisfying...)

tcole3 commented 6 years ago

Agree with closing, but to clarify: it is not the WG's role to constrain the contents of WPs by favoring or not favoring the use of MathML (or other domain-specific tools chosen by publishers). Rather, we take what the OWP gives us (as Ivan said). That said, we anticipate that the EPUB3 Community Group, the Publishing Business Group, the Getting Math onto Web Pages Community Group, and others will likely define and promulgate best practices that will help publishers needing to use MathML and alternatives to include mathematics in publications. Members of this WG should stay aware of these efforts and bring them to the WG's attention as appropriate.

ghost commented 6 years ago

I agree with closing this issue for the wpub spec.

However, I still have some questions about whether our WG should take a bit more care with MathML.

Can we say pagination is out of scope because pagination is generally a problem of the whole web? (well, maybe :) ) I would like to see our WG lead some work or a TF to make MathML easier for UAs to implement. I totally agree that MathML is very complex for a UA to adopt. But on the other hand, I don't think this can be resolved just by fundraising for an implementation such as https://mathml.igalia.com/.

iherman commented 6 years ago

Closing per https://www.w3.org/publishing/groups/publ-wg/Meetings/Minutes/2018/2018-03-12-minutes.html#resolution10