fred-atherden closed this issue 5 years ago.
I have edited versions of the IJM and eLife XML currently on the demo. They are edited because the originals use a different convention, which will change going forward.
Hindawi content can continue to be used as it currently is (3914828 and 7292974).
I also have an edited bioRxiv sample and can provide more if needed.
@GiancarloFusiello to provide details on whether that's enough and where the content should be stored.
@FAtherden-eLife As we did for retrieving the article ID, it would be good to have a set of strategies and a set of test cases that cover the known/supported XML formatting. This is how we did it for article IDs: https://github.com/libero/jats-ingester/blob/master/tests/test_xml_jats.py#L47
So, to summarise: I need a list of XPaths to retrieve the data and a list of minimal XML examples I can use to test these cases. Thanks.
@GiancarloFusiello, there's just one XPath:
//*:article-categories/*:subj-group[not(@subj-group-type="heading")]/*:subject[1]
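Since the linked tests are Python, here is a minimal sketch of what this XPath selects, written against the standard-library `xml.etree.ElementTree` (an assumption; jats-ingester may well use lxml instead). The `*:` name wildcard is XPath 2.0 syntax, so an XPath 1.0 engine such as lxml's would need `*[local-name()='subject']`-style steps. ElementTree's `{*}` wildcard (Python 3.8+) plays the same role here, and because its predicate support is limited, the `not(@subj-group-type="heading")` test is done in plain Python. `extract_subjects` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_subjects(xml: str) -> List[str]:
    """Approximate //*:article-categories/*:subj-group[not(@subj-group-type="heading")]/*:subject[1].

    '{*}' matches a local name in any namespace or none, like XPath 2.0's '*:'.
    """
    root = ET.fromstring(xml)
    subjects: List[str] = []
    # './/' searches all descendants of the root, like the leading '//'.
    for categories in root.iterfind(".//{*}article-categories"):
        # Direct-child subj-groups only, matching the '/' step in the XPath.
        for grp in categories.findall("{*}subj-group"):
            # not(@subj-group-type="heading"): groups without the attribute are kept.
            if grp.get("subj-group-type") == "heading":
                continue
            first = grp.find("{*}subject")  # first <subject> child, like subject[1]
            if first is not None:
                # itertext() concatenates all descendant text, flattening inline markup;
                # strip() trims surrounding whitespace (an extra step the XPath does not do).
                subjects.append("".join(first.itertext()).strip())
    return subjects


sample = """<article><front><article-meta><article-categories>
<subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group>
<subj-group subj-group-type="subjects"><subject>Cancer Biology</subject></subj-group>
</article-categories></article-meta></front></article>"""

print(extract_subjects(sample))  # ['Cancer Biology']
```

Because of the `{*}` wildcard, the same call also works if the article sits in a default namespace.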
Here is some sample content (let me know if you need more):
<article>
<front>
<article-meta>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="subjects">
<subject>Cancer Biology</subject>
</subj-group>
</article-categories>
</article-meta>
</front>
</article>
<article>
<front>
<article-meta>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="subjects">
<subject>Cancer Biology</subject>
<subject>General Economics</subject>
</subj-group>
</article-categories>
</article-meta>
</front>
</article>
<article>
<front>
<article-meta>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group>
<subject>General Economics</subject>
<subject>Cancer Biology</subject>
</subj-group>
</article-categories>
</article-meta>
</front>
</article>
<article>
<front>
<article-meta>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="subjects">
<subject>Cancer Biology</subject>
</subj-group>
<subj-group subj-group-type="subjects">
<subject>General Economics</subject>
</subj-group>
</article-categories>
</article-meta>
</front>
</article>
<article>
<front>
<article-meta>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="subjects">
<subject>Cancer Biology</subject>
</subj-group>
<subj-group>
<subject>General Economics</subject>
</subj-group>
</article-categories>
</article-meta>
</front>
</article>
<article>
<front>
<article-meta>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
</article-categories>
</article-meta>
</front>
</article>
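The six samples above can be expressed as a table of test cases in the style suggested earlier in the thread. This is a minimal sketch, assuming Python 3.8+ and the standard-library ElementTree in place of a full XPath engine; `extract_subjects`, `article` and `subj_group` are hypothetical names. The expected lists are what the XPath as written selects, not the IDs that #242 will define.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def extract_subjects(xml: str) -> List[str]:
    """ElementTree approximation of the agreed XPath ('{*}' needs Python 3.8+)."""
    root = ET.fromstring(xml)
    out: List[str] = []
    for categories in root.iterfind(".//{*}article-categories"):
        for grp in categories.findall("{*}subj-group"):  # direct children only
            if grp.get("subj-group-type") == "heading":  # skip the heading group
                continue
            first = grp.find("{*}subject")  # like subject[1]
            if first is not None:
                out.append("".join(first.itertext()).strip())
    return out


HEADING = '<subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group>'


def article(*groups: str) -> str:
    """Hypothetical helper: wrap subj-group snippets in the skeleton every sample shares."""
    return ("<article><front><article-meta><article-categories>" + HEADING
            + "".join(groups) + "</article-categories></article-meta></front></article>")


def subj_group(*subjects: str, typed: bool = True) -> str:
    """Hypothetical helper: one subj-group, with or without subj-group-type="subjects"."""
    attr = ' subj-group-type="subjects"' if typed else ""
    body = "".join(f"<subject>{s}</subject>" for s in subjects)
    return f"<subj-group{attr}>{body}</subj-group>"


# (sample XML, what the XPath as written selects), in the order the samples appear.
CASES: List[Tuple[str, List[str]]] = [
    (article(subj_group("Cancer Biology")), ["Cancer Biology"]),
    (article(subj_group("Cancer Biology", "General Economics")), ["Cancer Biology"]),
    (article(subj_group("General Economics", "Cancer Biology", typed=False)),
     ["General Economics"]),
    (article(subj_group("Cancer Biology"), subj_group("General Economics")),
     ["Cancer Biology", "General Economics"]),
    (article(subj_group("Cancer Biology"), subj_group("General Economics", typed=False)),
     ["Cancer Biology", "General Economics"]),
    (article(), []),
]

for xml, expected in CASES:
    assert extract_subjects(xml) == expected, (xml, expected)
print("all", len(CASES), "cases pass")  # all 6 cases pass
```

Cases 2 and 3 show that `subject[1]` keeps only the first `<subject>` in each group, so the second subject in those groups is never returned; removing the `[1]` would return every subject in document order instead.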
The output of #242 will determine which IDs should be generated for each of these. I can update them here if needed.
Adding one more sample that includes more than two subjects (we expect 0 to N subjects):
<article>
<front>
<article-meta>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="subjects">
<subject>Cancer Biology</subject>
</subj-group>
<subj-group>
<subject>Data</subject>
</subj-group>
<subj-group subj-group-type="subjects">
<subject>Housing</subject>
</subj-group>
<subj-group>
<subject>General Economics</subject>
</subj-group>
</article-categories>
</article-meta>
</front>
</article>
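A minimal check of this sample, assuming Python 3.8+ and the standard-library ElementTree standing in for a full XPath engine (`extract_subjects` is a hypothetical name). It shows the XPath yielding one subject per non-heading group, in document order, whether or not the group carries `subj-group-type`.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_subjects(xml: str) -> List[str]:
    """ElementTree approximation of the subject XPath ('{*}' needs Python 3.8+)."""
    root = ET.fromstring(xml)
    out: List[str] = []
    for categories in root.iterfind(".//{*}article-categories"):
        for grp in categories.findall("{*}subj-group"):  # direct children only
            if grp.get("subj-group-type") == "heading":  # skip the heading group
                continue
            first = grp.find("{*}subject")  # like subject[1]
            if first is not None:
                out.append("".join(first.itertext()).strip())
    return out


SAMPLE = """<article><front><article-meta><article-categories>
<subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group>
<subj-group subj-group-type="subjects"><subject>Cancer Biology</subject></subj-group>
<subj-group><subject>Data</subject></subj-group>
<subj-group subj-group-type="subjects"><subject>Housing</subject></subj-group>
<subj-group><subject>General Economics</subject></subj-group>
</article-categories></article-meta></front></article>"""

# Typed and untyped groups both count; order follows the document.
print(extract_subjects(SAMPLE))  # ['Cancer Biology', 'Data', 'Housing', 'General Economics']
```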
More complex test case:
<article xmlns:mml="http://www.w3.org/1998/Math/MathML">
<front>
<article-meta>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Research Article</subject>
</subj-group>
<subj-group subj-group-type="subjects">
<subject><italic>Cancer Biology</italic></subject>
</subj-group>
<subj-group>
<subject>Data</subject>
<subj-group subj-group-type="subjects">
<subject>Housing</subject>
<subj-group>
<subject>General Economics</subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="level-1">
<subject>Ecology</subject>
<subject>Genetics and Genomics</subject>
<subj-group subj-group-type="level-2">
<subject>Evolutionary Biology</subject>
<subj-group subj-group-type="level-3">
<subject>General Economics</subject>
<subject><bold>Plant Biology</bold></subject>
</subj-group>
</subj-group>
</subj-group>
<subj-group subj-group-type="any-value">
<subject><mml:math id="i1" display="inline"><mml:mover accent="true"><mml:mi>α</mml:mi><mml:mo>^</mml:mo></mml:mover></mml:math></subject>
<subj-group subj-group-type="nested-sub">
<subject><mml:math id="i2" display="inline"><mml:mover accent="true"><mml:mi>β</mml:mi><mml:mo>^</mml:mo></mml:mover></mml:math></subject>
</subj-group>
</subj-group>
</article-categories>
</article-meta>
</front>
</article>
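Run through the same XPath, this case exercises three behaviours at once: inline markup (`<italic>`, `<bold>`) collapses to its text, nested `subj-group`s are never visited because the path uses the child axis from `article-categories`, and `subject[1]` skips "Genetics and Genomics". A MathML subject contributes only its text nodes, here "α^". The sketch below assumes Python 3.8+ and the standard-library ElementTree instead of a full XPath engine; `extract_subjects` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import List


def extract_subjects(xml: str) -> List[str]:
    """ElementTree approximation of the subject XPath ('{*}' needs Python 3.8+)."""
    root = ET.fromstring(xml)
    out: List[str] = []
    for categories in root.iterfind(".//{*}article-categories"):
        for grp in categories.findall("{*}subj-group"):  # direct children only
            if grp.get("subj-group-type") == "heading":
                continue
            first = grp.find("{*}subject")  # like subject[1]
            if first is not None:
                # itertext() joins text from <italic>, <bold> and MathML descendants.
                out.append("".join(first.itertext()).strip())
    return out


COMPLEX = """<article xmlns:mml="http://www.w3.org/1998/Math/MathML">
<front><article-meta><article-categories>
<subj-group subj-group-type="heading"><subject>Research Article</subject></subj-group>
<subj-group subj-group-type="subjects"><subject><italic>Cancer Biology</italic></subject></subj-group>
<subj-group><subject>Data</subject>
  <subj-group subj-group-type="subjects"><subject>Housing</subject>
    <subj-group><subject>General Economics</subject></subj-group>
  </subj-group>
</subj-group>
<subj-group subj-group-type="level-1"><subject>Ecology</subject><subject>Genetics and Genomics</subject>
  <subj-group subj-group-type="level-2"><subject>Evolutionary Biology</subject>
    <subj-group subj-group-type="level-3"><subject>General Economics</subject><subject><bold>Plant Biology</bold></subject></subj-group>
  </subj-group>
</subj-group>
<subj-group subj-group-type="any-value">
  <subject><mml:math id="i1" display="inline"><mml:mover accent="true"><mml:mi>α</mml:mi><mml:mo>^</mml:mo></mml:mover></mml:math></subject>
  <subj-group subj-group-type="nested-sub"><subject><mml:math id="i2" display="inline"><mml:mover accent="true"><mml:mi>β</mml:mi><mml:mo>^</mml:mo></mml:mover></mml:math></subject></subj-group>
</subj-group>
</article-categories></article-meta></front></article>"""

# Only top-level, non-heading groups contribute, one subject each.
assert extract_subjects(COMPLEX) == ["Cancer Biology", "Data", "Ecology", "α^"]
```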
Provide JATS XML which can be used as test data for the jats-ingester with respect to scholarly-content-detail.