iandol / scrivomatic

A writing workflow using Scrivener's style system + Pandoc for output…
https://iandol.github.io/scrivomatic/
GNU General Public License v3.0

Compile to JATS #50

Open bokamm opened 1 month ago

bokamm commented 1 month ago

This is a question about compiling directly to JATS XML (see Journal Publishing Tag Library). JATS XML includes various metadata for article publishing and can potentially be used for displaying an article in a dynamic format (giving access to references and supplementary material via tabs). PKP OJS can use this, for example: https://demo.publicknowledgeproject.org/ojs3/demo/index.php/immersion/article/view/918/463

It seems pandoc can convert to JATS, but I have not figured out how to do this. There was a Lua converter, but it has been archived (and uses an old JATS version): https://github.com/mfenner/pandoc-jats

Could the lua filter etc. be integrated into scrivomatic? I super appreciate scrivomatic but do not have the knowledge to update and integrate pandoc-jats for this. If it is too complicated or not possible, that is fine. If it is just a matter of copying and pasting the metadata block and lua files somewhere, I can try myself but would still appreciate guidance.

(If I can donate to scrivomatic somehow, I surely will!)

iandol commented 1 month ago

Hi @bokamm, it would be easy to add the lua converter (it is a pandoc writer plugin), but pandoc does offer several native outputs already:

jats_archiving ([JATS](https://jats.nlm.nih.gov/) XML, Archiving and Interchange Tag Set)
jats_articleauthoring ([JATS](https://jats.nlm.nih.gov/) XML, Article Authoring Tag Set)
jats_publishing ([JATS](https://jats.nlm.nih.gov/) XML, Journal Publishing Tag Set)
jats (alias for jats_archiving)

https://pandoc.org/MANUAL.html#option--to
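
As a minimal sketch, the writer names listed above can be turned into pandoc invocations from a script, which makes it easy to try each variant on the same file. The helper name `pandoc_jats_args` and the choice of output file name are assumptions made for illustration and are not part of scrivomatic; only the pandoc options `-s`, `-t` and `-o` are used.

```python
from pathlib import Path

# Writer names as listed in the pandoc manual ("jats" is an alias for jats_archiving).
JATS_WRITERS = ("jats_archiving", "jats_articleauthoring", "jats_publishing", "jats")


def pandoc_jats_args(source, writer="jats"):
    """Build the argument list for a standalone JATS conversion of `source`."""
    if writer not in JATS_WRITERS:
        raise ValueError(f"unknown JATS writer: {writer}")
    # Hypothetical naming choice: same file name with an .xml extension.
    target = Path(source).with_suffix(".xml")
    return ["pandoc", "-s", "-t", writer, "-o", str(target), source]


if __name__ == "__main__":
    for name in JATS_WRITERS:
        # Print the command instead of running it, so the sketch works without pandoc.
        print(" ".join(pandoc_jats_args("jats.md", name)))
```

The list could be passed to `subprocess.run` once pandoc is installed; printing it first is a cheap way to check what would be executed.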

The real problem is how to read the metadata. Scrivomatic uses my own filter to read academic metadata (author affiliations etc.) and I don't see how Pandoc will understand these, so we need to convert them to whatever JATS uses:

https://pandoc.org/jats.html

I don't yet understand if this is for read and/or write.

Here is how we can test:

1) make a simple document (save as jats.md):

---
title: A JATS Test
date: 2024-01-02
author:
  - surname: Doe
    given-names: Jane
    orcid: XXXXXXXXXXXXX
    affiliation: ioa
affiliation:
  - id: ioa
    organization: Inst. of Allopecia
    country: Indonesia
article:
  - doi: 10.234/23222.54
    pmid: 3244343333
    heading: testing
    funding: Dept. of Thin Air
journal:
  - publisher-id: JOHTEST
    title: Journal of Hair
abstract: |
    This is the abstract of the article. Blah blah blah.
tags: jats, pandoc, test
---

# Intro

Blah blah blah.

# Conclusion

Blah blah blah.

2) Convert with pandoc -s -t jats -o jats.xml jats.md and look at the xml:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.2 20190208//EN"
                  "JATS-archivearticle1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML"
  xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.2" article-type="other">
  <front>
    <journal-meta>
      <journal-id></journal-id>
      <journal-title-group>
      </journal-title-group>
      <issn></issn>
      <publisher>
        <publisher-name></publisher-name>
      </publisher>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>A JATS Test</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <contrib-id contrib-id-type="orcid">XXXXXXXXXXXXX</contrib-id>
          <name>
            <surname>Doe</surname>
            <given-names>Jane</given-names>
          </name>
          <xref ref-type="aff" rid="aff-ioa"/>
        </contrib>
        <aff id="aff-ioa">
          <institution-wrap>
            <institution>Inst. of Allopecia</institution>
          </institution-wrap>,
          <country>Indonesia</country>
        </aff>
      </contrib-group>
      <pub-date date-type="pub" publication-format="electronic" iso-8601-date="2024-01-02">
        <day>2</day>
        <month>1</month>
        <year>2024</year>
      </pub-date>
      <permissions>
      </permissions>
      <abstract>
        <p>This is the abstract of the article. Blah blah blah.</p>
      </abstract>
      <kwd-group kwd-group-type="author">
        <kwd>jats, pandoc, test</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="intro">
      <title>Intro</title>
      <p>Blah blah blah.</p>
    </sec>
    <sec id="conclusion">
      <title>Conclusion</title>
      <p>Blah blah blah.</p>
    </sec>
  </body>
  <back>
  </back>
</article>

It seems Pandoc can read the YAML metadata we can store in Scrivener, and output to XML -- looks promising!

The journal output is wrong, not sure why...
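
One way to see exactly which journal fields did not make it through is to list the empty elements under `<journal-meta>`. The following is a rough sketch using only the Python standard library; the function name `empty_leaves` is hypothetical, and the sample string is a trimmed copy of the output above (without the DOCTYPE and namespaces).

```python
import xml.etree.ElementTree as ET


def empty_leaves(xml_text, parent_tag):
    """Return the tags of elements under `parent_tag` that have no children and no text."""
    root = ET.fromstring(xml_text)
    parent = root if root.tag == parent_tag else root.find(f".//{parent_tag}")
    if parent is None:
        return []
    return [
        el.tag
        for el in parent.iter()
        if el is not parent and len(el) == 0 and not (el.text or "").strip()
    ]


# Trimmed copy of the <journal-meta> block from the pandoc output above.
SAMPLE = """<article><front><journal-meta>
<journal-id></journal-id>
<journal-title-group>
</journal-title-group>
<issn></issn>
<publisher><publisher-name></publisher-name></publisher>
</journal-meta></front></article>"""

if __name__ == "__main__":
    print(empty_leaves(SAMPLE, "journal-meta"))
```

For the sample above this reports `journal-id`, `journal-title-group`, `issn` and `publisher-name`, i.e. none of the `journal:` keys from the YAML block were picked up.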

I am totally ignorant of what JATS is supposed to look like. If you can get this to work manually (running the pandoc command directly) it will be easy for me to adapt to scrivomatic. The problem is that my metadata is not compatible with JATS so we may need some preprocessing to convert the metadata I use into the one JATS needs...

Do you need this for existing projects, or are you starting a new one?

bokamm commented 1 month ago

Wow, you are fast! Thank you!

I thought about using this for a future project but have a couple of previous ones, which I would love to convert to JATS XML if this works.

I will try this on an existing md-compile to see what happens. It's late over here, so please excuse me if I post the results tomorrow!

bokamm commented 1 month ago

OK, I used an existing project and compiled it for HTML, then took the md file and converted it to JATS XML. This produces an XML file that OJS reads and displays, but it does not get the authors right, and the dependent files (images) are not picked up. Footnotes do not work, and the references are missing entirely. I am attaching the HTML (which has references), the md and XML files, and the scrivomatic.log (JATS-Test.zip). I also added the results to the publication so you can have a look (click the XML button to the right, under the image): https://jarps.net/journal/article/view/26/ (the HTML file there differs from the one just compiled because the published one had some more manual adjustments).

iandol commented 1 month ago

OK, so we know that at present the scrivomatic pipeline will not produce working authors, and I can't get the journal name to work, so something is wrong there.

For footnotes, citations and images, are they supposed to be supported? There are several jats variants, some for archiving; let's see if those support footnotes, citations and images:

---
title: A JATS Test
date: 2024-01-02
author:
  - surname: Doe
    given-names: Jane
    orcid: XXXXXXXXXXXXX
    affiliation: ioa
affiliation:
  - id: ioa
    organization: Inst. of Allopecia
    country: Indonesia
article:
  - doi: 10.234/23222.54
    pmid: 3244343333
    heading: testing
    funding: Dept. of Thin Air
journal:
  - publisher-id: JOHTEST
    title: Journal of Hair
abstract: |
    This is the abstract of the article. Blah blah blah.
tags: jats, pandoc, test
---

# Intro

Blah blah blah[^fn1].

![**Figure 1** — This is a fascinating caption.][image]

# Conclusion

Blah blah blah [@shipp2013].

# Bibliography

::: {#refs}

:::

[image]: placeholder.png

[^fn1]: Test footnote

Outputs:

pandoc -t html --citeproc --bibliography=/Users/ian/.local/share/pandoc/Core.json jats.md

<h1 id="intro">Intro</h1>
<p>Blah blah blah<a href="#fn1" class="footnote-ref" id="fnref1"
role="doc-noteref"><sup>1</sup></a>.</p>
<figure>
<img src="placeholder.png"
alt="Figure 1 — This is a fascinating caption." />
<figcaption aria-hidden="true"><strong>Figure 1</strong> — This is a
fascinating caption.</figcaption>
</figure>
<h1 id="conclusion">Conclusion</h1>
<p>Blah blah blah <span class="citation" data-cites="shipp2013">(Shipp,
Adams, and Friston 2013)</span>.</p>
<h1 id="bibliography">Bibliography</h1>
<div id="refs" class="references csl-bib-body hanging-indent"
data-entry-spacing="0" role="list">
<div id="ref-shipp2013" class="csl-entry" role="listitem">
Shipp, S, Rick A. Adams, and KJ Friston. 2013. <span>“Reflections on
Agranular Architecture: Predictive Coding in the Motor Cortex.”</span>
<em>Trends in Neurosciences</em> 36 (12): 706–16. <a
href="https://doi.org/10.1016/j.tins.2013.09.004">https://doi.org/10.1016/j.tins.2013.09.004</a>.
</div>
</div>
<section id="footnotes" class="footnotes footnotes-end-of-document"
role="doc-endnotes">
<hr />
<ol>
<li id="fn1"><p>Test footnote<a href="#fnref1" class="footnote-back"
role="doc-backlink">↩︎</a></p></li>
</ol>
</section>

Standard jats: pandoc -t jats --citeproc --bibliography=/Users/ian/.local/share/pandoc/Core.json jats.md

<sec id="intro">
  <title>Intro</title>
  <p>Blah blah blah<xref ref-type="fn" rid="fn1">1</xref>.</p>
  <fig>
    <caption><p><bold>Figure 1</bold> — This is a fascinating
    caption.</p></caption>
    <graphic mimetype="image" mime-subtype="png" xlink:href="placeholder.png" />
  </fig>
</sec>
<sec id="conclusion">
  <title>Conclusion</title>
  <p>Blah blah blah (Shipp, Adams, and Friston 2013).</p>
</sec>

Interestingly, the citation is rendered and the footnote is formatted, but they don't link to any content. The Bibliography section is missing!? I tried the other two jats options with the same result...

In fact this is because they exist outside the body. If I add --standalone to make a whole doc:

pandoc -t jats --citeproc --bibliography=/Users/ian/.local/share/pandoc/Core.json jats.md --standalone

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.2 20190208//EN"
                  "JATS-archivearticle1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.2" article-type="other">
<front>
<journal-meta>
<journal-id></journal-id>
<journal-title-group>
</journal-title-group>
<issn></issn>
<publisher>
<publisher-name></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<title-group>
<article-title>A JATS Test</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<contrib-id contrib-id-type="orcid">XXXXXXXXXXXXX</contrib-id>
<name>
<surname>Doe</surname>
<given-names>Jane</given-names>
</name>
<xref ref-type="aff" rid="aff-ioa"/>
</contrib>
<aff id="aff-ioa">
<institution-wrap>
<institution>Inst. of Allopecia</institution>
</institution-wrap>,
<country>Indonesia</country>
</aff>
</contrib-group>
<pub-date date-type="pub" publication-format="electronic" iso-8601-date="2024-01-02">
<day>2</day>
<month>1</month>
<year>2024</year>
</pub-date>
<permissions>
</permissions>
<abstract>
<p>This is the abstract of the article. Blah blah blah.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>jats, pandoc, test</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="intro">
  <title>Intro</title>
  <p>Blah blah blah<xref ref-type="fn" rid="fn1">1</xref>.</p>
  <fig>
    <caption><p><bold>Figure 1</bold> — This is a fascinating
    caption.</p></caption>
    <graphic mimetype="image" mime-subtype="png" xlink:href="placeholder.png" />
  </fig>
</sec>
<sec id="conclusion">
  <title>Conclusion</title>
  <p>Blah blah blah (Shipp, Adams, and Friston 2013).</p>
</sec>
</body>
<back>
<ref-list>
  <title>Bibliography</title>
  <ref id="ref-shipp2013">
    <mixed-citation>Shipp, S, Rick A. Adams, and KJ Friston. 2013.
    “Reflections on Agranular Architecture: Predictive Coding in the
    Motor Cortex.” <italic>Trends in Neurosciences</italic> 36 (12):
    706–16.
    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.tins.2013.09.004">https://doi.org/10.1016/j.tins.2013.09.004</ext-link>.</mixed-citation>
  </ref>
</ref-list>
<fn-group>
  <fn id="fn1">
    <label>1</label><p>Test footnote</p>
  </fn>
</fn-group>
</back>
</article>

The bibliography and footnotes are back matter. In fact, in your JARP xml the footnotes are all there and the images are linked; only the references failed, but that may be another problem? The point is that footnotes, citations and images are working as far as getting into the XML...

I do not know how JATS is supposed to render the back matter, or if the back matter is supposed to be separate from the body. I don't use JATS and don't have time to explore, so you'll need to work out what pandoc is supposed to output. If it is doing something wrong, we can then post a bug report, which they tend to fix quickly.
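
The difference between the fragment and the standalone output can be checked mechanically: every `<xref rid="...">` should point at an element carrying that `id`. Below is a small sketch under stated assumptions: `dangling_xrefs` is a hypothetical helper, and the two sample strings are cut-down versions of the outputs above (the fragment is wrapped in `<body>` so that it has a single root element).

```python
import xml.etree.ElementTree as ET


def dangling_xrefs(xml_text):
    """Return the sorted rid values of <xref> elements that match no id in the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get("id") for el in root.iter() if el.get("id")}
    missing = {
        x.get("rid")
        for x in root.iter("xref")
        if x.get("rid") and x.get("rid") not in ids
    }
    return sorted(missing)


BODY = '<body><sec id="intro"><p>Blah<xref ref-type="fn" rid="fn1">1</xref>.</p></sec></body>'

# Without --standalone only the body sections are written, so fn1 has no target.
FRAGMENT = BODY

# With --standalone the footnote appears in <back>, so the reference resolves.
STANDALONE = (
    "<article>" + BODY +
    '<back><fn-group><fn id="fn1"><label>1</label><p>Test footnote</p></fn></fn-group></back>'
    "</article>"
)

if __name__ == "__main__":
    print(dangling_xrefs(FRAGMENT))    # ['fn1']
    print(dangling_xrefs(STANDALONE))  # []
```

Running the same check on a journal's full XML would show whether a missing footnote or reference is absent from the file or present but not displayed by the viewer.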

iandol commented 1 month ago

Refs at least are stored in back matter in the JATS examples:

https://jats.nlm.nih.gov/publishing/tag-library/1.3/chapter/samples.html

bokamm commented 1 month ago

Thank you for looking into this! I uploaded the jats.xml you created to see what happens, and it renders author, affiliation etc. all right but does not display the references or footnotes. The references should appear as a separate tab to the right (just like the toc, figures, and info). Looking at a working demo XML from PKP (the makers of the journal software OJS), I do not see a difference (their bibliography is just named "References"; changing that on your sample had no effect). Attached: pkpadmin,+Journal+manager,+sample.xml.zip

I will keep looking into this (after grad student grading is over...). But thank you so much for your time and checking this!

iandol commented 1 month ago

Hi, some good news is that scrivomatic metadata and JATS are very similar, so by duplicating just a few keys we can get most output working in both with the same document (here I output to jats, pdf, html, and odt):

---
title: A JATS Test
date: 2024-01-02
author:
  - surname: Doe
    given-names: Jane
    name: Jane Doe
    email: jdoe@or.org
    correspondence: jdoe@or.org
    orcid: XXXXXXXXXXXXX
    affiliation: 1
    equal-contrib: true
    equal_contributor: true
  - surname: Doe
    given-names: Joanna
    name: Joanna Doe
    email: joanna@or.org
    orcid: YYYYYYYYYYYYYY
    affiliation: 2
    equal-contrib: true
    equal_contributor: true
affiliation:
  - id: 1
    organization: Institute of Or
    country: Indonesia
  - id: 2
    organization: Institute of Ar
    country: Indonesia
institute:
  - 1: Institute of Or
  - 2: Institute of Ar
article:
  doi: 10.234/23222.54
  pmid: 3244343333
  heading: Speculation
  categories: [testing, formulating]
  author-notes:
    corresp:
      - id: joanna
        email: joanna@or.org
    conflict: There is a big conflict
  funding-statement: Dept. of Thin Air
journal:
  publisher-id: pnas
  title: Proc Natl Acad Sci U S A
  issn: 0027-8424
  eissn: 0027-8424
  publisher-name: The National Academy of Sciences
  publisher-loc: USA
licence: Unlicence V1.0
abstract: |
    This is the **abstract** of the article. \
    Blah blah blah.
tags: [jats, pandoc, test]
pandocomatic_:
  use-template: [jats, pdf-refs, html-refs, odt-refs]
---

# Intro

Blah blah blah[^fn1].

![**Figure 1** — This is a fascinating caption.][image]

# Conclusion

Blah blah blah [@shipp2013].

# Bibliography

::: {#refs}

:::

[image]: placeholder.png

[^fn1]: Test footnote

Note that we use surname and given-names for JATS and name for scrivomatic (no collision). There are both equal-contrib and equal_contributor, and both affiliation and institute. Running scrivomatic, we get this:

scrivomatic -v jats.md

HTML with all metadata converted properly:

[image: screenshot of the rendered HTML output]

And what looks like complete XML too (I also fixed the journal info etc.):

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.2 20190208//EN"
                  "JATS-archivearticle1.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" dtd-version="1.2" article-type="other">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">pnas</journal-id>
<journal-title-group>
<journal-title>Proc Natl Acad Sci U S A</journal-title>
</journal-title-group>
<issn publication-format="electronic">0027-8424</issn>
<publisher>
<publisher-name>The National Academy of Sciences</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.234/23222.54</article-id>
<article-id pub-id-type="pmid">3244343333</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Speculation</subject>
</subj-group>
<subj-group subj-group-type="categories">
<subject>testing</subject>
<subject>formulating</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A JATS Test</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" equal-contrib="yes">
<contrib-id contrib-id-type="orcid">XXXXXXXXXXXXX</contrib-id>
<name>
<surname>Doe</surname>
<given-names>Jane</given-names>
</name>
<email>jdoe@or.org</email>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<contrib contrib-type="author" equal-contrib="yes">
<contrib-id contrib-id-type="orcid">YYYYYYYYYYYYYY</contrib-id>
<name>
<surname>Doe</surname>
<given-names>Joanna</given-names>
</name>
<email>joanna@or.org</email>
<xref ref-type="aff" rid="aff-2"/>
</contrib>
<aff id="aff-1">
<institution-wrap>
<institution>Institute of Or</institution>
</institution-wrap>,
<country>Indonesia</country>
</aff>
<aff id="aff-2">
<institution-wrap>
<institution>Institute of Ar</institution>
</institution-wrap>,
<country>Indonesia</country>
</aff>
</contrib-group>
<author-notes>
<corresp id="cor-joanna">* E-mail: <email>joanna@or.org</email></corresp>
<fn fn-type="conflict"><p>There is a big conflict</p></fn>
</author-notes>
<pub-date date-type="pub" publication-format="electronic" iso-8601-date="2024-01-02">
<day>2</day>
<month>1</month>
<year>2024</year>
</pub-date>
<permissions>
</permissions>
<abstract>
<p>This is the <bold>abstract</bold> of the article.
Blah blah blah.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>jats</kwd>
<kwd>pandoc</kwd>
<kwd>test</kwd>
</kwd-group>
<funding-group>
<funding-statement>Dept. of Thin Air</funding-statement>
</funding-group>
</article-meta>
</front>
<body>
<sec id="intro">
  <title>Intro</title>
  <p>Blah blah blah<xref ref-type="fn" rid="fn1">1</xref>.</p>
  <fig>
    <caption><p><bold>Figure 1</bold> — This is a fascinating
    caption.</p></caption>
    <graphic mimetype="image" mime-subtype="png" xlink:href="placeholder.png" />
  </fig>
</sec>
<sec id="conclusion">
  <title>Conclusion</title>
  <p>Blah blah blah
  (<xref alt="Shipp et al., 2013" rid="ref-shipp2013" ref-type="bibr">Shipp
  <italic>et al.</italic>, 2013</xref>).</p>
</sec>
</body>
<back>
<ref-list>
  <title>Bibliography</title>
  <ref id="ref-shipp2013">
    <mixed-citation><italic>Shipp, S., Adams, R. A., &amp; Friston,
    K.</italic> (2013) Reflections on agranular architecture: Predictive
    coding in the motor cortex. <italic>Trends in
    Neurosciences</italic>, <italic>36</italic>(12), 706–716.
    <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.1016/j.tins.2013.09.004">https://doi.org/10.1016/j.tins.2013.09.004</ext-link></mixed-citation>
  </ref>
</ref-list>
<fn-group>
  <fn id="fn1">
    <label>1</label><p>Test footnote</p>
  </fn>
</fn-group>
</back>
</article>

The JATS output ends up being called jats.jats; this can be fixed easily. You still need to figure out what needs to be changed for parsing at your journal...
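
The key duplication used in the metadata block above (name alongside surname/given-names, equal_contributor alongside equal-contrib, institute alongside affiliation) could in principle be generated rather than typed by hand. The following is only a sketch of such a preprocessing step on an already-parsed metadata dictionary; `add_scrivomatic_keys` is a hypothetical helper, not something scrivomatic provides, and it assumes every author entry has both surname and given-names.

```python
def add_scrivomatic_keys(meta):
    """Return a copy of `meta` with the scrivomatic-style keys derived from the JATS-style ones."""
    out = dict(meta)
    if "author" in meta:
        authors = []
        for author in meta["author"]:
            entry = dict(author)
            if "name" not in entry:
                # Assumption: the display name is "given-names surname".
                entry["name"] = f"{entry['given-names']} {entry['surname']}"
            if "equal-contrib" in entry and "equal_contributor" not in entry:
                entry["equal_contributor"] = entry["equal-contrib"]
            authors.append(entry)
        out["author"] = authors
    if "affiliation" in meta and "institute" not in meta:
        # Same shape as the institute list in the example: one {id: organization} map per entry.
        out["institute"] = [{aff["id"]: aff["organization"]} for aff in meta["affiliation"]]
    return out


META = {
    "author": [
        {"surname": "Doe", "given-names": "Jane", "affiliation": 1, "equal-contrib": True}
    ],
    "affiliation": [{"id": 1, "organization": "Institute of Or", "country": "Indonesia"}],
}

if __name__ == "__main__":
    print(add_scrivomatic_keys(META))
```

Existing keys are never overwritten, so a document that already carries both sets of keys passes through unchanged.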

iandol commented 1 month ago

For scrivomatic to use jats you need to add a JATS recipe to pandocomatic.yaml. This is mine, for example:

https://github.com/iandol/dotpandoc/blob/master/pandocomatic.yaml#L67

bokamm commented 1 month ago

Wonderful!! Thank you so much. I tried it with an existing article and it worked fine. It seems the reason figures are not displayed is a code issue in the plugin used by OJS: https://github.com/asmecher/lensGalley/pull/69 I tried to adjust the code myself (as the pull request has not been implemented) but this results in an error. So the JATS files seem to work; it is the OJS platform/plugins that cause issues (we are on the latest release and it seems many plugins are still not updated...). Anyway, thank you so much! Can I offer a coffee or something?

iandol commented 1 month ago

Hi, glad you got it working. I may make a post on the forums outlining the details to get JATS working. I'll also make some tweaks to the scrivomatic docs to mention this later...

Coffee welcome, if these work for you: paypal.me/iandol ko-fi.com/iandol.

bokamm commented 1 month ago

Thank you again so much! Coffee went out!

bokamm commented 2 weeks ago

Hello again! I just wanted to report a follow-up because I got the references to be displayed (or at least found out what the issue was). I used a CMoS-style.csl for the references, and they are rendered like below in the JATS XML:

<mixed-citation>Koljonen, Johanna, Jaakko Stenros, Anne Serup Grove,
Aina D. Skjørnsfjell, and Elin Nilsen, eds. 2019. <italic>Larp
Design: Creating Role-Play Experiences</italic>. Copenhagen:
Landsforeningen Bifrost.</mixed-citation>

However, JATS XML needs references to be listed with a lot of sub-items, like below:

<element-citation publication-type="book">
  <person-group person-group-type="author">
    <name>
      <surname>Koljonen</surname>
      <given-names>Johanna</given-names>
    </name>
    <name>
      <surname>Stenros</surname>
      <given-names>Jaakko</given-names>
    </name>
    <name>
      <surname>Grove</surname>
      <given-names>Anne Serup</given-names>
    </name>
    <name>
      <surname>Skjørnsfjell</surname>
      <given-names>Aina D.</given-names>
    </name>
    <name>
      <surname>Nilsen</surname>
      <given-names>Elin</given-names>
    </name>
  </person-group>
  <year>2019</year>
  <source>Larp design: creating role-play experiences</source>
  <publisher-name>Landsforeningen Bifrost</publisher-name>
  <publisher-loc>Copenhagen</publisher-loc>
</element-citation>
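
To make the target structure concrete, here is a rough sketch that builds such an `<element-citation>` from a bibliography entry with the Python standard library. It assumes the entry is a CSL-JSON style dictionary (`author` with `family`/`given`, `issued` with `date-parts`, `title`, `publisher`, `publisher-place`); the function name `element_citation` is hypothetical, and it handles only the book fields shown in this thread, not a full CSL-to-JATS mapping.

```python
import xml.etree.ElementTree as ET


def element_citation(entry):
    """Build a JATS <element-citation> string for a book-like CSL-JSON style entry."""
    cite = ET.Element("element-citation", {"publication-type": entry.get("type", "book")})
    # The thread's example lists the people as authors, so the same group type is used here.
    group = ET.SubElement(cite, "person-group", {"person-group-type": "author"})
    for person in entry.get("author", []):
        name = ET.SubElement(group, "name")
        ET.SubElement(name, "surname").text = person["family"]
        ET.SubElement(name, "given-names").text = person["given"]
    ET.SubElement(cite, "year").text = str(entry["issued"]["date-parts"][0][0])
    ET.SubElement(cite, "source").text = entry["title"]
    ET.SubElement(cite, "publisher-name").text = entry["publisher"]
    ET.SubElement(cite, "publisher-loc").text = entry["publisher-place"]
    return ET.tostring(cite, encoding="unicode")


ENTRY = {
    "type": "book",
    "author": [
        {"family": "Koljonen", "given": "Johanna"},
        {"family": "Stenros", "given": "Jaakko"},
    ],
    "issued": {"date-parts": [[2019]]},
    "title": "Larp design: creating role-play experiences",
    "publisher": "Landsforeningen Bifrost",
    "publisher-place": "Copenhagen",
}

if __name__ == "__main__":
    print(element_citation(ENTRY))
```

Building the elements with a library rather than by string concatenation keeps the tags balanced and the text escaped.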

I found this very old project which provides a JATS-appropriate CSL: https://github.com/mfenner/pandoc-jats However, it adds another nested label for the citations and reference list: <xref alt="1" rid="ref-koljonen.etal2019" ref-type="bibr">1</xref>

<ref id="1"><label>1</label> <element-citation publication-type="book"><person-group person-group-type="author"><name><surname>Koljonen</surname><given-names>Johanna</given-names></name></name><name><name><surname>Stenros</surname><given-names>Jaakko</given-names></name></name><name><name><surname>Grove</surname><given-names>Anne Serup</given-names></name></name><name><name><surname>Skjørnsfjell</surname><given-names>Aina D.</given-names></name></name><name><name><surname>Nilsen</surname><given-names>Elin</given-names></name></person-group><article-title>Larp design: creating role-play experiences</article-title><publisher-name>Landsforeningen Bifrost</publisher-name><publisher-loc>Copenhagen</publisher-loc></element-citation> </ref>

I cleaned the code (searching for the escaped entities such as &lt; and replacing them with the literal characters < > and ", and removing a redundant pair of tags) and got it to work, but if an article has a long reference list, this is too much work, especially if I also need to adjust each in-line citation and remove all the nested ref tags.

Again, this is not your concern but just wanted to keep you informed. Footnotes and images still don't work but the latter is a known OJS problem.
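
The manual cleanup described above could probably be scripted. The sketch below assumes the damage looks like the sample in this thread: markup escaped as `&lt;`, `&gt;` and `&quot;`, plus doubled `<name>` and `</name>` tags between authors. `repair_citation` is a hypothetical helper, and it has only been reasoned through against that pattern, not against real pandoc-jats output.

```python
import re
import xml.etree.ElementTree as ET


def repair_citation(escaped):
    """Unescape markup entities and collapse doubled <name> / </name> tags."""
    text = escaped
    # Deliberately leave &amp; alone: a bare "&" would make the XML invalid.
    for entity, char in (("&lt;", "<"), ("&gt;", ">"), ("&quot;", '"')):
        text = text.replace(entity, char)
    text = re.sub(r"(?:<name>\s*){2,}", "<name>", text)
    text = re.sub(r"(?:</name>\s*){2,}", "</name>", text)
    return text


# Shortened two-author version of the broken reference shown above, in escaped form.
BROKEN = (
    "&lt;person-group person-group-type=&quot;author&quot;&gt;"
    "&lt;name&gt;&lt;surname&gt;Koljonen&lt;/surname&gt;&lt;/name&gt;&lt;/name&gt;"
    "&lt;name&gt;&lt;name&gt;&lt;surname&gt;Stenros&lt;/surname&gt;&lt;/name&gt;"
    "&lt;/person-group&gt;"
)

if __name__ == "__main__":
    fixed = repair_citation(BROKEN)
    group = ET.fromstring(fixed)  # raises ParseError if the tags are still unbalanced
    print([n.findtext("surname") for n in group.findall("name")])
```

Parsing the result is a quick sanity check that the tags are balanced after the repair; it says nothing about whether OJS will accept the citation.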