ropensci / emld

JSON-LD representation of EML
https://docs.ropensci.org/emld

Don't depend on `schemaLocation` to find the schema file to validate with #52

Closed: amoeba closed this issue 4 years ago

amoeba commented 4 years ago

As discussed in #45, a better way to find the schema file to validate a document against is to read the QName of the root element, resolve its prefix against the namespaces declared on the document, and use the resulting namespace URI of the root to pick the schema. The current approach assumes `xsi:schemaLocation` is set on the root element and uses it to locate the schema. However, `xsi:schemaLocation` is an optional attribute, so we can't count on it being there.
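To illustrate the difference, here is a minimal sketch in Python's standard library (not R or xml2, which the package actually uses). It assumes a hand-crafted EML 2.2.0 document with no `xsi:schemaLocation`: the root namespace can still be resolved from the QName and `xmlns` declarations, while the `schemaLocation` lookup comes back empty. The function names `root_namespace` and `schema_location` are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

XSI = "http://www.w3.org/2001/XMLSchema-instance"


def root_namespace(xml_text: str) -> Optional[str]:
    """Namespace URI of the root element, resolved from its QName and xmlns declarations."""
    root = ET.fromstring(xml_text)
    # ElementTree expands a QName like "eml:eml" into Clark notation "{namespace-uri}eml".
    if root.tag.startswith("{"):
        return root.tag[1:].split("}", 1)[0]
    return None  # the root element is not in any namespace


def schema_location(xml_text: str) -> Optional[str]:
    """The optional xsi:schemaLocation attribute on the root, or None if absent."""
    return ET.fromstring(xml_text).get(f"{{{XSI}}}schemaLocation")


# Hypothetical document: valid namespace declaration, no xsi:schemaLocation.
doc = (
    '<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" '
    'packageId="p1" system="s"><dataset/></eml:eml>'
)
print(root_namespace(doc))   # https://eml.ecoinformatics.org/eml-2.2.0
print(schema_location(doc))  # None
```

Because the namespace comes from the QName rather than a hint attribute, this lookup works whether or not the author bothered to set `xsi:schemaLocation`.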

I'll rewrite the logic in eml_validate to find the schema as described above.

amoeba commented 4 years ago

I couldn't find a way in xml2 to get the namespace of the root element, and it doesn't give us the root element's QName either, only its local name. The package does retain the root namespace when parsing and serializing, but I can't get at it from user code.

Instead, I wrote a regex-based approach that finds the QName of the root element and matches its prefix against the namespaces declared on the root to pick a schema file. This supports EML documents as well as sub-module documents such as literature, text, etc., and even stmml.
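A rough sketch of that regex-based idea, written in Python for illustration. This is not the PR's code: the helper names `find_root_qname`, `find_root_namespace`, and `find_schema` are hypothetical, and so is the `SCHEMAS` table mapping namespace URIs to schema file paths (the package's real file names and layout may differ). The regex is a heuristic and assumes the root start tag is the first `<` that isn't a declaration, processing instruction, comment, or DOCTYPE.

```python
import re
from typing import Optional

# First start tag that is not "<?...?>" (declaration/PI) or "<!...>" (comment/DOCTYPE).
# Heuristic: markup inside a comment, or a ">" inside an attribute value, will fool it.
ROOT_TAG = re.compile(r"<(?![?!])([^\s/>]+)([^>]*)>")

# Hypothetical namespace -> schema file table; real paths in the package may differ.
SCHEMAS = {
    "https://eml.ecoinformatics.org/eml-2.2.0": "eml-2.2.0/eml.xsd",
    "https://eml.ecoinformatics.org/literature-2.2.0": "eml-2.2.0/eml-literature.xsd",
    "https://eml.ecoinformatics.org/text-2.2.0": "eml-2.2.0/eml-text.xsd",
}


def find_root_qname(xml_text: str) -> Optional[str]:
    """Return the root element's QName as written, e.g. 'eml:eml' or 'text'."""
    m = ROOT_TAG.search(xml_text)
    return m.group(1) if m else None


def find_root_namespace(xml_text: str) -> Optional[str]:
    """Resolve the root QName's prefix against xmlns declarations on the root tag."""
    m = ROOT_TAG.search(xml_text)
    if not m:
        return None
    qname, attrs = m.group(1), m.group(2)
    # "eml:eml" -> look for xmlns:eml="..."; an unprefixed name uses the default xmlns="...".
    attr = "xmlns:" + qname.split(":", 1)[0] if ":" in qname else "xmlns"
    decl = re.search(r"(?:^|\s)" + re.escape(attr) + r"\s*=\s*([\"'])(.*?)\1", attrs)
    return decl.group(2) if decl else None


def find_schema(xml_text: str) -> Optional[str]:
    """Map the root namespace to a schema file, or None if it is unknown."""
    ns = find_root_namespace(xml_text)
    return SCHEMAS.get(ns) if ns else None


doc = (
    '<?xml version="1.0"?>\n<!-- no schemaLocation here -->\n'
    '<eml:eml xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" packageId="p1" system="s">\n'
    "<dataset/></eml:eml>"
)
print(find_root_qname(doc))  # eml:eml
print(find_schema(doc))      # eml-2.2.0/eml.xsd
```

Sub-module documents go through the same path: a root such as `<text xmlns="https://eml.ecoinformatics.org/text-2.2.0"/>` resolves through the default namespace to the hypothetical `eml-2.2.0/eml-text.xsd` entry.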

amoeba commented 4 years ago

Heya @cboettig, I PR'd this in https://github.com/ropensci/emld/pull/53. I wouldn't mind a review, but I think I've tested it reasonably well, at least on the example docs we ship with the package and a few hand-crafted ones.

See the tests for my two new helpers to see how they work.