stencila / encoda

↔️ A format converter for Stencila documents
https://stencila.github.io/encoda/
Apache License 2.0

JATS: Underline decoding #417

Open fred-atherden opened 4 years ago

fred-atherden commented 4 years ago

Underline formatting appears to be lost when decoding JATS.

Steps to reproduce

Visit here and note the underline formatting on certain letters in 'In mouse embryonic fibroblasts (MEFs),'

In JATS this is captured using <underline>, as:

<p> ... In <underline>m</underline>ouse <underline>e</underline>mbryonic <underline>f</underline>ibroblasts (MEFs), ... </p>

The command

encoda convert https://elifesciences.org/articles/50051 50051.jsonld

produces a JSON-LD file in which the corresponding section is:

"). This effect was also observed in two additional cell types. In ",
"m",
"ouse ",
"e",
"mbryonic ",
"f",
"ibroblasts (MEFs), ... ",

nokome commented 4 years ago

Thanks @FAtherden-eLife. The main reason that this has not been implemented is that we do not yet have a Stencila schema node type to represent it. This, in turn, is because, like HTML5, we avoid having non-semantic, or purely stylistic, nodes. For example, I don't think we will ever have a node type that represents text that is highlighted yellow.

However, the line between the two can be blurred, as is the case for Emphasis (f.k.a. italic), Strong emphasis (f.k.a. bold), and Delete (f.k.a. ~strikethrough~) (see our docs here). I think we should take a similar approach for underlined text and come up with a "semantic version" of it. I notice that HTML now calls the <u> element an Unarticulated Annotation. That's a bit of a mouthful, but to be consistent in following HTML for those other inline elements, we should probably use that.

Would appreciate your and @alex-ketch's thoughts.
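
To make the proposal concrete, here is a minimal sketch of how a decoder could map JATS inline elements to semantic node types. Everything in it is an assumption made for illustration: the `XmlNode` and `InlineNode` types are simplified stand-ins, `decodeInline` and `nodeTypeFor` are hypothetical helpers rather than Encoda functions, and `UnarticulatedAnnotation` is only a candidate name, not an existing Stencila schema type.

```typescript
// Simplified stand-ins for a parsed JATS tree and for decoded inline content.
// These are NOT the actual Encoda or Stencila schema types.
type XmlNode = string | { name: string; children: XmlNode[] }
type InlineNode = string | { type: string; content: InlineNode[] }

// Assumed mapping from JATS element names to node type names.
// 'UnarticulatedAnnotation' is hypothetical and follows the HTML name for <u>.
function nodeTypeFor(name: string): string | undefined {
  switch (name) {
    case 'italic':
      return 'Emphasis'
    case 'bold':
      return 'Strong'
    case 'underline':
      return 'UnarticulatedAnnotation'
    default:
      return undefined
  }
}

// Recursively decode a node into a flat list of inline nodes.
function decodeInline(node: XmlNode): InlineNode[] {
  if (typeof node === 'string') return [node]

  const content: InlineNode[] = []
  for (const child of node.children) {
    for (const decodedChild of decodeInline(child)) content.push(decodedChild)
  }

  const type = nodeTypeFor(node.name)
  // Elements without a mapping are unwrapped to their content. Without the
  // 'underline' case above, this is how the formatting would be dropped.
  return type === undefined ? content : [{ type, content }]
}

// Usage: a fragment modelled on the paragraph quoted in this issue.
const paragraph: XmlNode = {
  name: 'p',
  children: ['In ', { name: 'underline', children: ['m'] }, 'ouse '],
}
const decoded = decodeInline(paragraph)
console.log(JSON.stringify(decoded))
```

With a mapping like this, the underlined letter would be wrapped in its own node instead of appearing as a bare string such as "m".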