castedo closed this 2 years ago
There isn't currently a standardized structured metadata format that will work optimally with all formats pandoc supports. The JATS writer supports JATS-specific structured metadata, as you've illustrated. But should the JATS reader produce this too? That would be very useful if you're going to re-render as JATS. (Then again, converting JATS to JATS is not so useful.) But if you're going to be rendering some other format, then you'd prefer to have something every pandoc format can handle, which is what the JATS reader currently gives you.
I think @tarleb has done some thinking about standardizing structured metadata, e.g. in his scholarly markdown project, so he may want to comment.
> (Then again, converting JATS to JATS is not so useful.) But if you're going to be rendering some other format, then you'd prefer to have something every pandoc format can handle
Great point that I very much agree with.
For reference, I will use this closed issue as a high-level nexus for other, more specific issues related to pandoc metadata representing JATS metadata.
"JATS" is ambiguous since there are so many dialects of JATS. I can suggest some names for dialects. I list them in rough order from least specific to most specific:
NISO JATS: https://jats.nlm.nih.gov/
JATS4R: https://jats4r.org/
"PMC JATS": the dialect used by the millions of JATS files of real journal articles stored in the PMC Open Access Subset
"epijats JATS": the dialect that https://gitlab.com/perm.pub/epijats can convert into meaningful HTML and PDF files by combining pandoc with a full XML parser
"pandoc JATS": the dialect generated by pandoc using the packaged default template
@kamoe, here's a summary of issues with pandoc attempting to represent JATS metadata.
There are issues where the pandoc JATS reader incorrectly represents JATS metadata:
This affects not just PMC JATS but also the JATS that pandoc itself generates and documents on https://pandoc.org/jats.html
Then there is PMC and pandoc JATS metadata that the reader does not read at all, so it is absent from the resulting pandoc metadata:
Last but not least, in addition to the above, there are more JATS elements that are documented on https://pandoc.org/jats.html and show up in PMC XML but do not appear in the pandoc metadata produced by the JATS reader:
article in pandoc YAML (<article-meta> in JATS)
journal in pandoc YAML (<journal-meta> in JATS)
tags in pandoc YAML (<kwd-group> in JATS)

My solution to all these problems is to not use pandoc and instead use an XML parser. The fixes and enhancements I would actually use are improvements to the processing not of metadata, but of marked-up text (e.g. #8847).
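For comparison, here is a minimal sketch of the XML-parser route using Python's standard `xml.etree.ElementTree`. It pulls a few fields from the `<journal-meta>`, `<article-meta>` and `<kwd-group>` elements listed above into a dict loosely shaped like the pandoc `journal`, `article` and `tags` keys. The sub-key names (`title`, `issn`, the `article-id` types), the helper names and the sample document are assumptions for illustration only, not pandoc's or epijats's actual schema; real PMC files carry far more front matter.

```python
import xml.etree.ElementTree as ET

# Made-up sample; real PMC articles carry far more front matter.
SAMPLE = """<article>
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Example Journal</journal-title>
      </journal-title-group>
      <issn pub-type="epub">1234-5678</issn>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1234/example.1</article-id>
      <title-group>
        <article-title>An <italic>example</italic> article</article-title>
      </title-group>
      <kwd-group kwd-group-type="author">
        <kwd>JATS</kwd>
        <kwd>pandoc</kwd>
      </kwd-group>
    </article-meta>
  </front>
</article>"""


def text_of(elem):
    """Return the element's full text (nested markup flattened), or None if absent."""
    if elem is None:
        return None
    return "".join(elem.itertext()).strip()


def extract_front_metadata(jats_xml):
    """Collect a few <front> fields into a dict loosely shaped like pandoc's
    journal/article/tags metadata keys (sub-key names are hypothetical)."""
    root = ET.fromstring(jats_xml)
    journal, article, tags = {}, {}, []

    journal_meta = root.find("front/journal-meta")
    if journal_meta is not None:
        journal["title"] = text_of(journal_meta.find(".//journal-title"))
        # Older JATS (and much of PMC) uses pub-type; newer JATS uses publication-format.
        journal["issn"] = {
            (issn.get("publication-format") or issn.get("pub-type", "unknown")): text_of(issn)
            for issn in journal_meta.findall("issn")
        }

    article_meta = root.find("front/article-meta")
    if article_meta is not None:
        article["title"] = text_of(article_meta.find("title-group/article-title"))
        # One entry per identifier type, e.g. "doi", "pmid", "pmc".
        for art_id in article_meta.findall("article-id"):
            article[art_id.get("pub-id-type", "unknown")] = text_of(art_id)
        # Keywords from every <kwd-group>, flattened into one list.
        tags = [text_of(kwd) for kwd in article_meta.findall("kwd-group/kwd")]

    return {"journal": journal, "article": article, "tags": tags}


print(extract_front_metadata(SAMPLE))
```

Because the parser sees the whole XML tree, dialect differences such as `pub-type` versus `publication-format` can be handled explicitly instead of being lost in a generic metadata model.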
Thanks for this @castedo. I note all your comments and concerns, and will take a good look at this. I'm very interested from the perspective of the implications for a future BITS reader, so this is all very relevant. The more JATS bugs get addressed, the fewer issues BITS inherits!
In using pandoc I've encountered issues that I'm not sure whether to consider inside or outside the scope of what pandoc should handle.
This issue/feature of pandoc metadata representing JATS metadata can probably be closed, but I wanted to share my usage scenario and double-check what is out of scope. To frame the scope, I suspect the following question is useful:
What is the pandoc metadata for JATS supposed to be? Is it:
Currently it seems the answer is primarily 1) and optionally 2), and not 3). I'd say pandoc currently does a poor job at 3), which I hope is because that is out of scope.
Here's a concrete use case that affects me and illustrates some of the issues. In my YAML header I have the following metadata for pandoc:
which outputs the following JATS XML:
Converting that JATS XML back into YAML+markdown via pandoc yields:
If pandoc metadata is supposed to be primarily 1) and secondarily 2), then this seems fine, and this issue can be closed. If not, then I can file some more issues. I have started using separate Python libraries to extract metadata from JATS XML.
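One way to make round-trip losses like this concrete is to compare the metadata before and after the JATS round trip. The sketch below assumes both YAML headers have already been loaded into plain Python dicts (e.g. with a YAML library, not shown) and uses made-up metadata, since the actual header is not reproduced here. `lost_metadata` is a hypothetical helper, not part of pandoc.

```python
from typing import Any, Iterator


def lost_metadata(before: Any, after: Any, path: str = "") -> Iterator[str]:
    """Yield paths of values in `before` that are missing or changed in `after`.
    Extra keys or list items that appear only in `after` are not reported."""
    if isinstance(before, dict) and isinstance(after, dict):
        for key, value in before.items():
            sub = f"{path}.{key}" if path else str(key)
            if key not in after:
                yield f"{sub} (missing)"
            else:
                yield from lost_metadata(value, after[key], sub)
    elif isinstance(before, list) and isinstance(after, list):
        for i, value in enumerate(before):
            sub = f"{path}[{i}]"
            if i >= len(after):
                yield f"{sub} (missing)"
            else:
                yield from lost_metadata(value, after[i], sub)
    elif before != after:
        yield f"{path or '<root>'} (changed)"


# Hypothetical metadata, standing in for the YAML header before and after
# a markdown -> JATS -> markdown round trip.
BEFORE = {
    "title": "Example",
    "author": [{"name": "A. Author", "orcid": "0000-0000-0000-0000"}],
    "journal": {"title": "Example Journal"},
}
AFTER = {
    "title": "Example",
    "author": [{"name": "A. Author"}],
}

for item in lost_metadata(BEFORE, AFTER):
    print(item)
```

A comparison like this only shows what was dropped or altered; whether a given loss counts as a bug depends on which of the purposes above pandoc metadata is meant to serve.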
Thank y'all for such a wonderful tool!