stencila / encoda

↔️ A format converter for Stencila documents
https://stencila.github.io/encoda/
Apache License 2.0

Textual/TeX representations of Math Nodes #876

Open rgieseke opened 3 years ago

rgieseke commented 3 years ago

Following up from #872

I like the idea of capturing the alttext and agree that meta.altText is the best place for this for now. However, in the longer term it should probably live in a specific property (meta is really just a temporary dumping ground for properties that we don't have specific properties for). Some options could be:

https://schema.org/speakable: although that seems to be intended for a URL or a 'content-locator' rather than the text itself
https://schema.org/alternativeHeadline: although it is not really an alternative title
or maybe creating a new alternativeText property, which could also be used on CodeExpression, ImageObject, etc.

Happy to consider alternatives. If you would like to progress this further a PR to stencila/schema would be appreciated.

I think there are different representations to consider. In many workflows the MathML comes from TeX. This might be worth keeping around as the final rendering might happen with KaTeX/MathJax from the TeX (even though they might also use/create MathML).

LaTeXML includes alttext as an attribute with the original TeX like this:

<disp-formula id="S0.Ex1">
  <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="a=b^{2}" display="block">
    <m:mrow>
      <m:mi>a</m:mi>
      <m:mo>=</m:mo>
      <m:msup>
        <m:mi>b</m:mi>
        <m:mn>2</m:mn>
      </m:msup>
    </m:mrow>
  </m:math>
</disp-formula>

Pandoc creates JATS XML and includes a separate tex-math element:

<disp-formula>
<alternatives>
<tex-math><![CDATA[a = b^2]]></tex-math>
<mml:math display="block" xmlns:mml="http://www.w3.org/1998/Math/MathML"><mml:mrow><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</alternatives></disp-formula>
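
To make the difference between the two outputs concrete, here is a minimal Python sketch (not encoda code) that recovers the TeX source from either form using only the standard library. The function name `tex_source` and the shortened sample strings are my own, made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def tex_source(disp_formula_xml: str) -> Optional[str]:
    """Return the TeX source of a <disp-formula>, or None if none is found.

    Hypothetical helper for illustration only.
    """
    root = ET.fromstring(disp_formula_xml)

    # Pandoc style: a dedicated <tex-math> element (un-namespaced).
    tex_math = root.find(".//tex-math")
    if tex_math is not None and tex_math.text:
        return tex_math.text.strip()

    # LaTeXML style: TeX kept in the alttext attribute of the MathML root.
    math = root.find(f".//{{{MATHML_NS}}}math")
    if math is not None:
        return math.get("alttext")

    return None


# Shortened versions of the two snippets above.
latexml = (
    '<disp-formula id="S0.Ex1">'
    '<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" '
    'alttext="a=b^{2}" display="block"><m:mi>a</m:mi></m:math>'
    "</disp-formula>"
)
pandoc = (
    "<disp-formula><alternatives>"
    "<tex-math><![CDATA[a = b^2]]></tex-math>"
    "</alternatives></disp-formula>"
)

print(tex_source(latexml))
print(tex_source(pandoc))
```

The sketch checks for tex-math first on the assumption that a dedicated element is a more deliberate signal than an attribute that may also be used for accessibility text.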

In the JATS tag browser alt-text is described like this:

Accessibility: The short <alt-text> can be used for special accessibility display or presentation on graphic-limited websites or devices, as an alternative to providing the full graphic. (For example, the <alt-text> element is typically read by screen readers, and may also be used to display a few words “behind” a figure or graphic for devices with limited graphics capacity.) Please reserve this tag for accessibility uses such as pronouncing screen readers. This element should not be used as a replacement for <caption>, which is a visual element typically displayed alongside a figure, table, etc. The <alt-text> is not a visual element, unless the figure, caption, or other major element that holds the <alt-text> is not available or cannot be processed by the person or device-type being addressed. Since it is not visual, <alt-text> does not allow face markup inside it; a simplified textual alternative for a graphic object (including face markup) can be created using the <textual-form> element.

https://jats.nlm.nih.gov/publishing/tag-library/1.2/element/alt-text.html

Maybe it's a possible approach for Stencila to include both representations as in the Pandoc output? While TeX is probably quite readable for many people, I guess alt-text is not really the right place for it.

In general, I think there is also a need for markup-free representations of elements containing math, e.g. in titles or abstracts that are used on the web where math is not rendered.

nokome commented 3 years ago

Maybe it's a possible approach for Stencila to include both representations as in the Pandoc output?

Yes, this seems the most appropriate, and it would not require any schema changes. In the jats codec we are already encoding a <tex-math> element if the mathLanguage is tex.

https://github.com/stencila/encoda/blob/8dc58e335a345219a389df30114eaab397df19ce/src/codecs/jats/index.ts#L2146-L2171

So this issue would just require us to always encode both a <tex-math> and a <mml:math> (by converting from whichever one is the source to the other).
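
As a rough illustration of the target output shape only, here is a Python sketch (not the codec's TypeScript) that wraps a TeX string and its MathML conversion in a single <alternatives> element. It covers the TeX-to-MathML direction only. The names `encode_disp_formula` and `placeholder_converter` are hypothetical, and the converter is a stand-in that ignores its input; a real implementation would call an actual TeX-to-MathML library.

```python
import xml.etree.ElementTree as ET
from typing import Callable

MATHML_NS = "http://www.w3.org/1998/Math/MathML"
# Serialize MathML elements with the mml: prefix, as in the Pandoc output.
ET.register_namespace("mml", MATHML_NS)


def encode_disp_formula(
    tex: str, tex_to_mathml: Callable[[str], ET.Element]
) -> ET.Element:
    """Build <disp-formula><alternatives> holding both representations.

    Hypothetical helper; `tex_to_mathml` is supplied by the caller.
    """
    formula = ET.Element("disp-formula")
    alternatives = ET.SubElement(formula, "alternatives")

    # The TeX source. ElementTree escapes special characters on output
    # instead of emitting a CDATA section.
    tex_math = ET.SubElement(alternatives, "tex-math")
    tex_math.text = tex

    # The MathML rendering, produced by the injected converter.
    alternatives.append(tex_to_mathml(tex))
    return formula


def placeholder_converter(tex: str) -> ET.Element:
    """Stand-in converter: returns fixed MathML and ignores `tex`."""
    math = ET.Element(f"{{{MATHML_NS}}}math", {"display": "block"})
    mi = ET.SubElement(math, f"{{{MATHML_NS}}}mi")
    mi.text = "a"
    return math


formula = encode_disp_formula("a = b^2", placeholder_converter)
print(ET.tostring(formula, encoding="unicode"))
```

Passing the converter in as an argument keeps the sketch independent of any particular conversion library. The opposite direction (MathML source to TeX) would need its own converter and is not shown.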