ropensci / emld

:package: JSON-LD representation of EML
https://docs.ropensci.org/emld

Either `as_emld` or `as_xml` not handling docs with submodule at root #54

Closed: amoeba closed this issue 4 years ago

amoeba commented 4 years ago

Round-tripping EML documents (e.g., `<eml:eml ...>`) works correctly, but round-tripping documents with an element from a submodule at their root (e.g., `<dataset:dataset ...>`) does not. I think this doesn't break the round-tripping tests because they don't check at this level of detail (that the root element has the same QName before and after).
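A check at that level of detail could look roughly like the sketch below. It is a minimal, base-R illustration, assuming the serialized XML is available as a character vector of lines (e.g., from `readLines()`); `root_qname()` and `same_root_qname()` are hypothetical helpers, not part of emld, and the regex approach is a lexical shortcut rather than a real XML parser.

```r
# Hypothetical helper (not part of emld): return the QName of the root
# element from the lines of a serialized XML document. Lexical sketch only:
# it strips the XML declaration, processing instructions, and comments,
# then takes the name of the first start tag.
root_qname <- function(lines) {
  text <- paste(lines, collapse = "\n")
  # (?s) lets "." match newlines, so multi-line constructs are removed too
  text <- gsub("(?s)<\\?.*?\\?>", "", text, perl = TRUE)
  text <- gsub("(?s)<!--.*?-->", "", text, perl = TRUE)
  # First start tag: "<" followed by a name character (skips "<!DOCTYPE")
  hit <- regmatches(text, regexpr("<[A-Za-z_][^\\s/>]*", text, perl = TRUE))
  if (length(hit) == 0) return(NA_character_)
  sub("^<", "", hit)
}

# Hypothetical round-trip check: the root QName (prefix included)
# should be identical before and after as_emld() / as_xml().
same_root_qname <- function(before_lines, after_lines) {
  identical(root_qname(before_lines), root_qname(after_lines))
}
```

Used as `same_root_qname(readLines(inpath), readLines(outpath))`, this would flag the dataset example in this issue, since the written root is `eml:eml` while the input root is a `dataset` element.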

An example of the correct behavior, exhibited when round-tripping a document with eml:eml at the root:

# Round-trip a document whose root element is eml:eml
outpath <- tempfile(fileext = ".xml")
inpath <- "inst/tests/eml-2.2.0/eml-sample.xml"
doc <- xml2::read_xml(inpath)
doc
{xml_document}
<eml packageId="doi:10.xxxx/eml.1.1" system="knb" schemaLocation="https://eml.ecoinformatics.org/eml-2.2.0 xsd/eml.xsd" xmlns:eml="https://eml.ecoinformatics.org/eml-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:stmml="http://www.xml-cml.org/schema/stmml-1.1">

emld <- emld::as_emld(doc)
emld::as_xml(emld, outpath)
# Line 1 is the XML declaration; line 2 is the root start tag
readLines(outpath)[[2]]
[1] "<eml:eml xmlns:eml=\"https://eml.ecoinformatics.org/eml-2.2.0\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:stmml=\"http://www.xml-cml.org/schema/stmml-1.2\" packageId=\"doi:10.xxxx/eml.1.1\" xsi:schemaLocation=\"https://eml.ecoinformatics.org/eml-2.2.0 xsd/eml.xsd\" system=\"knb\">"

An example of the incorrect behavior with a dat:dataset element at the root:

outpath <- tempfile(fileext = ".xml")
inpath <- "inst/tests/eml-2.2.0/eml-dataset.xml"
doc <- xml2::read_xml(inpath)
doc
{xml_document}
<dataset system="KNB" schemaLocation="https://eml.ecoinformatics.org/dataset-2.2.0          eml-dataset.xsd" xmlns:ds="https://eml.ecoinformatics.org/dataset-2.2.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

emld <- emld::as_emld(doc)
emld::as_xml(emld, outpath)
readLines(outpath)[[2]]
[1] "<eml:eml xmlns:eml=\"https://eml.ecoinformatics.org/eml-2.2.0\" xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xmlns:stmml=\"http://www.xml-cml.org/schema/stmml-1.2\" xmlns:ds=\"https://eml.ecoinformatics.org/dataset-2.2.0\" xsi:schemaLocation=\"https://eml.ecoinformatics.org/dataset-2.2.0          eml-dataset.xsd\" system=\"KNB\">"

See how the root element's name changes from `dataset` to `eml:eml`? I think I know what's going on, and I think the patch for now is to bring some of my helpers from #53 over into `as_xml.emld`.

amoeba commented 4 years ago

Hey @cboettig, it occurs to me that you might have intended this behavior. I see that calling `as_xml` as `emld::as_xml(emld, outpath, "dat", "dataset")` does the correct thing. A change here would require the `emld` object to keep track of whether it represents a full EML document or a submodule, which it doesn't seem to do right now.
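To make that bookkeeping concrete, here is a minimal base-R sketch of what "keeping track" might involve: remembering the root QName read from the input and splitting it into the prefix and local name a serializer would need. Everything here is hypothetical and not part of emld: `split_qname()`, `remember_root()`, `original_root()`, and the `root_qname` attribute are illustrative names, and storing the value as an R attribute is just one possible design.

```r
# Hypothetical: split a QName such as "ds:dataset" into prefix and local name.
# An unprefixed name gets an empty prefix.
split_qname <- function(qname) {
  parts <- strsplit(qname, ":", fixed = TRUE)[[1]]
  if (length(parts) == 1) {
    list(prefix = "", local = parts[[1]])
  } else {
    list(prefix = parts[[1]], local = parts[[2]])
  }
}

# Hypothetical: record the original root QName on an object as an attribute.
remember_root <- function(x, qname) {
  attr(x, "root_qname") <- qname
  x
}

# Hypothetical: look up the recorded root, falling back to eml:eml
# (the root the serializer currently always writes).
original_root <- function(x, default = "eml:eml") {
  q <- attr(x, "root_qname", exact = TRUE)
  if (is.null(q)) default else q
}
```

One caveat with the attribute approach: R drops non-`names` attributes on some list operations (e.g., `[` subsetting), so a real implementation would need to decide where this information lives more durably.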

cboettig commented 4 years ago

@amoeba right, the serializing routines were never written to support generating documents in the submodule namespaces alone (dat), since I'm not sure there's any use case for that, and it seems confusing to users. Or at least that was my thinking at the time, so I'm happy to close this unless anyone feels we ought to handle generating submodules as the root element.

amoeba commented 4 years ago

Closing is fine. This sounds out of scope for emld.

amoeba commented 4 years ago

Closing as I haven't seen any comments and I agree with @cboettig now that we've talked it out.