stencila / encoda

↔️ A format converter for Stencila documents
https://stencila.github.io/encoda/
Apache License 2.0

Equation ID #872

Open rgieseke opened 3 years ago

rgieseke commented 3 years ago

When converting, e.g., from JATS with MathML to JSON, I noticed that an id like

<inline-formula id="eq1"><mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML" alttext="1\times 1" display="inline"><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#xD7;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:math></inline-formula> 

doesn't get an id in the resulting output.

I've changed the following:

diff --git a/src/codecs/jats/index.ts b/src/codecs/jats/index.ts
index ef6a6e8c..40503729 100644
--- a/src/codecs/jats/index.ts
+++ b/src/codecs/jats/index.ts
@@ -2125,11 +2125,13 @@ function decodeMath(
     const image = decodeGraphic(graphic, inline)
     return inline ? image : [stencila.paragraph({ content: image })]
   }
-
+  const equationId = attr(formula, 'id') ?? ''
+  
   // Wrapper is needed to dump the entire math element
   const text = xml.dump(elem('wrapper', mathml))
   return [
     (inline ? stencila.mathFragment : stencila.mathBlock)({
+      id: equationId,
       mathLanguage: 'mathml',
       text,
     }),

Should I make a PR for this?

Another question: could the alttext attribute, which carries the original TeX or screen-reader information, be stored as a separate attribute? If so, would it have to be added to the Math schema (https://schema.stenci.la/math) or to the meta field?

rgieseke commented 3 years ago

I figured out the last question. One way to get the alttext:

diff --git a/src/codecs/jats/index.ts b/src/codecs/jats/index.ts
index ef6a6e8c..d7f1572b 100644
--- a/src/codecs/jats/index.ts
+++ b/src/codecs/jats/index.ts
@@ -2125,11 +2125,14 @@ function decodeMath(
     const image = decodeGraphic(graphic, inline)
     return inline ? image : [stencila.paragraph({ content: image })]
   }
-
+  const equationId = attr(formula, 'id') ?? ''
+  const altText = attr(mathml, "alttext") ?? ''
   // Wrapper is needed to dump the entire math element
   const text = xml.dump(elem('wrapper', mathml))
   return [
     (inline ? stencila.mathFragment : stencila.mathBlock)({
+      id: equationId,
+      meta: {altText: altText},
       mathLanguage: 'mathml',
       text,
     }),
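Both diffs fall back to an empty string, so a formula without an id or alttext would be decoded with `id: ''` and `meta: {altText: ''}`. The sketch below shows one possible alternative that omits the properties when the attributes are missing. It is only an illustration: `XmlElement`, `MathNode`, `decodeMathAttrs`, and this version of `attr` are hypothetical stand-ins written for the example, not Encoda's actual types or helpers.

```typescript
// Hypothetical, minimal stand-in for a parsed XML element (not Encoda's real type).
interface XmlElement {
  name: string
  attributes?: Record<string, string>
  elements?: XmlElement[]
}

// Hypothetical helper: read an attribute, or undefined if it is missing.
function attr(elem: XmlElement, name: string): string | undefined {
  return elem.attributes?.[name]
}

// Shape of the decoded node in this sketch; only the fields discussed in the thread.
interface MathNode {
  type: 'MathFragment' | 'MathBlock'
  id?: string
  meta?: { altText: string }
  mathLanguage: 'mathml'
}

// Carry the formula id and the MathML alttext over to the decoded node.
function decodeMathAttrs(
  formula: XmlElement,
  mathml: XmlElement,
  inline: boolean
): MathNode {
  const id = attr(formula, 'id')
  const altText = attr(mathml, 'alttext')
  return {
    type: inline ? 'MathFragment' : 'MathBlock',
    // Spread conditionally so missing attributes are omitted rather than set to ''.
    ...(id !== undefined ? { id } : {}),
    ...(altText !== undefined ? { meta: { altText } } : {}),
    mathLanguage: 'mathml',
  }
}

// The formula from the issue, reduced to the attributes that matter here.
const mathml: XmlElement = {
  name: 'mml:math',
  attributes: { alttext: '1\\times 1', display: 'inline' },
}
const formula: XmlElement = {
  name: 'inline-formula',
  attributes: { id: 'eq1' },
  elements: [mathml],
}

const withId = decodeMathAttrs(formula, mathml, true)
const withoutId = decodeMathAttrs(
  { name: 'inline-formula' },
  { name: 'mml:math' },
  true
)
```

Whether an absent id should be omitted or kept as an empty string is a design choice for the codec. Omitting it keeps the JSON output free of empty placeholder values.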

nokome commented 3 years ago

I like the idea of capturing the alttext and agree that meta.altText is the best place for it for now. In the longer term, however, it should probably live in a dedicated property (meta is really just a temporary dumping ground for values that we don't yet have specific properties for). Some options could be

Happy to consider alternatives. If you would like to progress this further a PR to stencila/schema would be appreciated.

stencila-ci commented 3 years ago

:tada: This issue has been resolved in version 0.111.0 :tada:

The release is available on:

Your semantic-release bot :package::rocket: