plazi / TaxPub

TaxPub Extension of the Journal Publishing Tag Set NISO JATS Version 1.1 (ANSI/NISO Z39.96-2015)
MIT License

validation fails due to missing file #53

Closed tcatapano closed 1 year ago

tcatapano commented 2 years ago

Both @teodorgeorgiev and @gsautter report that validation is failing due to a missing file.

from @teodorgeorgiev: failed to load external entity "../nlm/JATS-mathmlsetup1.ent" on line 226

from @gsautter in https://github.com/plazi/ggxml2taxpub-treatments/issues/43#issuecomment-1103394684

... I encountered one error: JATS-mathmlsetup1.ent doesn't seem to exist in the repo folder you point me to, and none of its subfolders, either ... is this an oversight during prior-version cleanup, or a missing repo file?
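A quick way to confirm which referenced files are actually absent is to scan a DTD for external parameter-entity declarations and check each system identifier on disk. The following is only a rough sketch: the function name missing_external_entities is made up for illustration, the regular expression is an approximation rather than a real DTD parser (it will also match declarations inside comments), and it inspects a single file without following the entities it finds.

```python
import re
from pathlib import Path
from typing import List

# Approximate pattern for external parameter-entity declarations such as
#   <!ENTITY % name PUBLIC "public id" "system-id.ent">
#   <!ENTITY % name SYSTEM "system-id.ent">
# Internal entities (<!ENTITY % name "value">) are deliberately not matched.
ENTITY_DECL = re.compile(
    r'<!ENTITY\s+%\s+\S+\s+'
    r'(?:PUBLIC\s+(?:"[^"]*"|\'[^\']*\')\s+|SYSTEM\s+)'
    r'(?:"([^"]*)"|\'([^\']*)\')'
)


def missing_external_entities(dtd_path: Path) -> List[Path]:
    """Return the local files referenced by dtd_path that do not exist.

    Hypothetical helper: relative system identifiers are resolved against
    the folder containing the DTD, which is how XML parsers resolve them.
    """
    text = dtd_path.read_text(encoding="utf-8", errors="replace")
    missing = []
    for match in ENTITY_DECL.finditer(text):
        system_id = match.group(1) if match.group(1) is not None else match.group(2)
        if "://" in system_id:
            continue  # remote identifier, not checked here
        target = dtd_path.parent / system_id
        if not target.is_file():
            missing.append(target)
    return missing
```

Calling missing_external_entities(Path("tax-treatment-NS0.dtd")) on a local checkout would list any entity files that the DTD points to but that are not present.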

tcatapano commented 2 years ago

Using oXygen, I am not getting this error when validating against a local copy of https://github.com/plazi/TaxPub/tree/v1.0-gamma using either the default engine (Xerces?) or xmllint (which does send warnings regarding duplicate models and the non-deterministic nomenclature with its clumsy use of x

tcatapano commented 2 years ago

In case some of the base JATS files in the repo might have been lost, one could simply download the official JATS 1.1 files at: https://ftp.ncbi.nih.gov/pub/jats/publishing/1.1/JATS-Publishing-1-1-MathML3-DTD.zip

and then simply add to them the files:

tax-treatment-NS0-v1.dtd
taxpubcustom-classes-NS0-v1.ent
taxpubcustom-elements-NS0-v1.ent
taxpubcustom-mixes-NS0-v1.ent
taxpubcustom-models-NS0-v1.ent
taxpubcustom-modules-NS0-v1.ent

from https://github.com/plazi/TaxPub/tree/v1.0-gamma

and then validate against tax-treatment-NS0-v1.dtd in that context.

This is probably the preferred method anyway, as it ensures that one is using the correct set of base JATS files. The TaxPub extension itself is implemented entirely by the files listed above.
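The steps above could be scripted roughly as follows. This is a sketch under stated assumptions, not a tested recipe: the function name assemble_dtd_dir is invented here, the JATS archive is assumed to have been downloaded already, and the layout inside the archive is not verified (if it unpacks into a subfolder, the TaxPub files would need to go into that subfolder instead).

```python
import shutil
import zipfile
from pathlib import Path
from typing import List

# The six TaxPub files named in this thread.
TAXPUB_FILES = [
    "tax-treatment-NS0-v1.dtd",
    "taxpubcustom-classes-NS0-v1.ent",
    "taxpubcustom-elements-NS0-v1.ent",
    "taxpubcustom-mixes-NS0-v1.ent",
    "taxpubcustom-models-NS0-v1.ent",
    "taxpubcustom-modules-NS0-v1.ent",
]


def assemble_dtd_dir(jats_zip: Path, taxpub_dir: Path, out_dir: Path) -> List[str]:
    """Unpack the base JATS files and copy the TaxPub files next to them.

    Returns the names of TaxPub files that were not found in taxpub_dir.
    Assumption: the archive's files belong directly in out_dir.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(jats_zip) as archive:
        archive.extractall(out_dir)

    not_found = []
    for name in TAXPUB_FILES:
        source = taxpub_dir / name
        if source.is_file():
            shutil.copy2(source, out_dir / name)
        else:
            not_found.append(name)
    return not_found
```

An empty return value means all six TaxPub files were copied; validation would then target out_dir / "tax-treatment-NS0-v1.dtd".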

tcatapano commented 2 years ago

Doing this, again, I am not able to replicate the missing file error. Perhaps in other validation scenarios and environments it does not work. @teodorgeorgiev and @gsautter, how are you performing validation?
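For comparing environments, one reproducible way to validate against an explicitly chosen DTD is to call xmllint from a script. The sketch below wraps the command in Python; the function names are illustrative, and it assumes xmllint is installed and on the PATH. The --dtdvalid option tells xmllint to validate against the given DTD rather than the one named in the document's DOCTYPE, and --noout suppresses the serialized document.

```python
import subprocess
from typing import List


def xmllint_command(xml_path: str, dtd_path: str) -> List[str]:
    """Build an xmllint call that validates xml_path against dtd_path."""
    return ["xmllint", "--noout", "--dtdvalid", dtd_path, xml_path]


def validate(xml_path: str, dtd_path: str) -> subprocess.CompletedProcess:
    """Run xmllint; a non-zero returncode indicates an error.

    Messages such as 'failed to load external entity' are written to stderr.
    """
    return subprocess.run(
        xmllint_command(xml_path, dtd_path),
        capture_output=True,
        text=True,
    )
```

For example, validate("article.xml", "tax-treatment-NS0-v1.dtd").stderr would contain any messages xmllint reported (both file names here are placeholders).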

teodorgeorgiev commented 2 years ago

@tcatapano we are using the standard PHP DOMDocument::validate. It takes the DTD from the XML, which in our case we store locally: <!DOCTYPE article PUBLIC "-//TaxonX//DTD Taxonomic Treatment Publishing DTD v0 20100105//EN" "../../nlm/tax-treatment-NS0.dtd">
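Because both the DOCTYPE and the failing entity reference use relative system identifiers, it may help to spell out where a parser will look. Relative identifiers are resolved against the location of the file that contains the declaration. The sketch below illustrates this with plain path arithmetic; the XML file location is hypothetical, and it assumes the "../nlm/JATS-mathmlsetup1.ent" reference is declared in the DTD itself.

```python
import posixpath


def resolve_system_id(declaring_file: str, system_id: str) -> str:
    """Resolve a relative system identifier against the declaring file."""
    base_dir = posixpath.dirname(declaring_file)
    return posixpath.normpath(posixpath.join(base_dir, system_id))


# Hypothetical location of an article on the server.
xml_file = "/srv/app/xml/2022/article.xml"

# The DOCTYPE is declared in the XML file, so it resolves relative to it.
dtd = resolve_system_id(xml_file, "../../nlm/tax-treatment-NS0.dtd")

# The entity is declared in the DTD, so it resolves relative to the DTD.
ent = resolve_system_id(dtd, "../nlm/JATS-mathmlsetup1.ent")

print(dtd)  # /srv/app/nlm/tax-treatment-NS0.dtd
print(ent)  # /srv/app/nlm/JATS-mathmlsetup1.ent
```

Under these assumptions the entity file has to sit in the same nlm folder as the DTD, so a folder that contains only the TaxPub files will trigger the error.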

gsautter commented 2 years ago

@tcatapano also having a local copy of the DTD files from Pensoft available to the validator does fix the problem, and the server currently uses it this way ... Mainly wanted to make sure I don't validate against any older and stricter versions of now-relaxed definitions (as with tp:material-citation) and thus tried to validate against https://github.com/plazi/TaxPub/tree/v1.0-gamma alone, which led to the JATS-mathmlsetup1.ent error ... Could we make this repo self-contained, just to avoid similar scenarios with third-party TaxPub users?

teodorgeorgiev commented 2 years ago

@tcatapano @gsautter

OK, now I see ... so far I was trying to validate it against tax-treatment-NS0.dtd and the result was: failed to load external entity "../nlm/JATS-mathmlsetup1.ent" on line 226

I did as you suggested above (downloaded the official JATS 1.1 and added all "-NS0-v1" files). I validate the XML against tax-treatment-NS0-v1.dtd and voilà ... I did not get this one anymore :)

However, now although I think my XML is valid I get the following error:

validity error : Content model of nomenclature is not determinist: (sec-meta? , label? , tp:taxon-name , x? , tp:taxon-authority? , x? , tp:taxon-status? , x? , tp:taxon-identifier* , xref* , x? , tp:nomenclature-citation-list* , x? , (tp:type-genus | tp:type-species)? , x? , tp:taxon-type-location? , x?)

Here is my test file test_taxpub.zip

tcatapano commented 2 years ago

@teodorgeorgiev: Yes. It's a known issue. See: https://github.com/plazi/TaxPub/issues/52. In the meantime, if at all possible try using the Xerces parser (https://xerces.apache.org/index.html) which I do not think will report this error. I'll prioritize a patch for this. Hope to get it out this weekend.
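To illustrate why the nomenclature content model is reported as non-deterministic: after tp:taxon-name, an x element could match the x? before tp:taxon-authority? or, since everything in between is optional, any later x?. A validator that requires deterministic models cannot choose without looking ahead. The sketch below checks this for a flat sequence of particles only; it is a simplified illustration written for this thread, not the algorithm libxml2 uses, and the representation of particles is an assumption.

```python
from typing import List, Set, Tuple

# A particle is (names, occurrence). names holds the element names it can
# match (several for a choice group); occurrence is "", "?", "*" or "+".
Particle = Tuple[Set[str], str]


def find_ambiguities(model: List[Particle]) -> List[Tuple[str, int, int]]:
    """Return (name, pos_a, pos_b) where two particles compete for one element."""

    def nullable(i: int) -> bool:
        return model[i][1] in ("?", "*")

    def reachable_from(start: int) -> List[int]:
        # Positions that can match the next element when matching resumes
        # at `start`: keep going while particles may be skipped.
        positions = []
        for j in range(start, len(model)):
            positions.append(j)
            if not nullable(j):
                break
        return positions

    # Candidates at the start of the model, then after each position.
    candidate_sets = [reachable_from(0)]
    for i, (_, occurrence) in enumerate(model):
        follow = reachable_from(i + 1)
        if occurrence in ("*", "+"):
            follow = [i] + follow  # a repeatable particle can follow itself
        candidate_sets.append(follow)

    conflicts = set()
    for candidates in candidate_sets:
        for k, a in enumerate(candidates):
            for b in candidates[k + 1:]:
                for name in model[a][0] & model[b][0]:
                    conflicts.add((name, a, b))
    return sorted(conflicts)


# The nomenclature model from the error message, as a flat sequence.
NOMENCLATURE: List[Particle] = [
    ({"sec-meta"}, "?"), ({"label"}, "?"), ({"tp:taxon-name"}, ""),
    ({"x"}, "?"), ({"tp:taxon-authority"}, "?"), ({"x"}, "?"),
    ({"tp:taxon-status"}, "?"), ({"x"}, "?"),
    ({"tp:taxon-identifier"}, "*"), ({"xref"}, "*"), ({"x"}, "?"),
    ({"tp:nomenclature-citation-list"}, "*"), ({"x"}, "?"),
    ({"tp:type-genus", "tp:type-species"}, "?"), ({"x"}, "?"),
    ({"tp:taxon-type-location"}, "?"), ({"x"}, "?"),
]

conflicts = find_ambiguities(NOMENCLATURE)
```

Every conflict reported for this model involves x, which matches the warnings mentioned earlier in the thread; a model such as (a?, b, c*) yields no conflicts under the same check.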

tcatapano commented 1 year ago

Closing