plazi / ggxml2taxpub

Conversion of GoldenGATE XML to JATS/TaxPub at treatment level

missing doi in <uri content-type="publication-doi"/> #35

Open myrmoteras opened 2 years ago

myrmoteras commented 2 years ago

@gsautter in the taxPubL1 output for this treatment, the value of the DOI is missing: https://tb.plazi.org/GgServer/taxPubL1/025FB45BD105A6A2AF03738DC48E7663

This is included in https://tb.plazi.org/GgServer/taxPubL1/025FB45BD105A6A2AF03738DC48E7663

gsautter commented 2 years ago

Actually, this is one for @tcatapano to solve, as it concerns the XSLT that does the transformation ... all I ever do is load that XSLT in TreatmentBank and pass our internal XML through it.

Also, there might well be cases of articles that simply don't have a DOI ... specifically XML uploads that come in without a DOI and are (naturally) lacking a PDF that could go to Zenodo to create a DOI ... the DOI not being there is something consumers need to be able to handle, really.

tcatapano commented 2 years ago

In this case there is no publication DOI in the source. Currently, the xslt gets the value from //document/@docSource, but that attribute is missing in the source XML: see https://tb.plazi.org/GgServer/xml/025FB45BD105A6A2AF03738DC48E7663

This is from Order Out of Chaos, which does not have a publication DOI.

Agree with @gsautter that consumers should not expect that a publication DOI will always be present.
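For consumers, a minimal sketch of tolerating the empty element could look like the following. It assumes the DOI, when present, is the text content of `<uri content-type="publication-doi">` and that the element is not namespaced; the function name `publication_doi` and the sample DOI are made up for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def publication_doi(taxpub_xml: str) -> Optional[str]:
    """Return the publication DOI from a TaxPub document, or None if absent.

    Assumption: the DOI is the text of <uri content-type="publication-doi">.
    An empty or missing element is treated as "no DOI", not as an error.
    """
    root = ET.fromstring(taxpub_xml)
    for uri in root.iter("uri"):
        if uri.get("content-type") == "publication-doi":
            text = (uri.text or "").strip()
            return text or None
    return None


# Hypothetical inputs: one with an empty element, one with a placeholder DOI.
empty = '<article><uri content-type="publication-doi"/></article>'
filled = '<article><uri content-type="publication-doi">10.1234/example</uri></article>'

print(publication_doi(empty))
print(publication_doi(filled))
```

Returning `None` instead of raising keeps the "no publication DOI" case an ordinary outcome that callers have to handle explicitly.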

gsautter commented 2 years ago

In this case there is no publication DOI in the source. Currently, the xslt gets the value from //document/@docSource, but that attribute is missing in the source XML: see https://tb.plazi.org/GgServer/xml/025FB45BD105A6A2AF03738DC48E7663

Could you go for //document/@ID-DOI instead? That's far more reliable; docSource is about as old as it gets and was originally introduced to hold the URL that some source HTML was downloaded from ...

That's for the source article DOI, by the way ... the treatment DOI, if available, is in /treatment/@ID-DOI.
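To illustrate the two lookups, here is a rough Python sketch rather than the actual XSLT. It assumes a `document` element carrying `ID-DOI` for the article and a `treatment` element carrying `ID-DOI` for the treatment; the sample XML, the DOI values, and the name `find_dois` are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily reduced source XML; real documents carry far more.
SAMPLE = """<document ID-DOI="10.1234/example.article">
  <treatment ID-DOI="10.5281/zenodo.0000000"/>
</document>"""


def find_dois(gg_xml: str) -> dict:
    """Read the article DOI and the treatment DOI from their ID-DOI attributes.

    Either value is None when the element or the attribute is missing.
    docSource is deliberately not consulted.
    """
    root = ET.fromstring(gg_xml)
    # iter() also yields the root itself, so this works for either root element.
    document = next(root.iter("document"), None)
    treatment = next(root.iter("treatment"), None)
    return {
        "publication": (document.get("ID-DOI") if document is not None else None) or None,
        "treatment": (treatment.get("ID-DOI") if treatment is not None else None) or None,
    }


print(find_dois(SAMPLE))
print(find_dois("<document><treatment/></document>"))
```

The second call shows the case discussed above: with no `ID-DOI` attributes present, both values come back as `None` rather than as empty strings.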

tcatapano commented 2 years ago

See #34 for publication DOI, from mods:identifier, still pending. The treatment/zenodo DOI is in fact currently being pulled from @ID-DOI:

https://github.com/plazi/ggxml2taxpub-treatments/blob/6e43d073b69a3baaaa2f83d37854e284888a52a1/xslt/gg2tp_l1.xsl#L43