Open jeanetteclark opened 1 year ago
Thanks @jeanetteclark , really appreciate the help here. I'm still a bit confused as the minimal reprex of the problem though. Is this error really coming from something emld
does or is the XML file in question considered invalid by xml2::xml_validate
?
ok never mind, reading more closely now. I see your point about
https://github.com/ropensci/emld/blob/master/data-raw/eml-2.2.0.json#L20
and your point about different flavors of the element which are obviously neglected under the data model there (which essentially ignores the namespace). XML is so much richer than JSON, it's hard to program against!
On a conceptual / philosophical level, is there a good reason for spatialSamplingUnits to use "coverage" to refer to a complexNode that appears to correspond to a node we would call "geographicCoverage" elsewhere? I know this is how it is in the schema and so we're stuck with it, but it feels a bit like a schema bug to me.
Anyway, we can probably hack around it in various ways, but probably none of them very elegant.
My intuition is that we should change:
https://github.com/ropensci/emld/blob/master/data-raw/eml-2.2.0.json#L127
to specify that the child element of spatialSamplingUnits is a geographicCoverage
element as far as emld
's json file is concerned. We would then need some additional code to monkey-patch around this issue when serializing / parsing from XML -- i.e. literally renaming any node spatialSamplingUnits/coverage
to spatialSamplingUnits/geographicCoverage
on parsing from XML and the reverse when serializing back to XML.
@mbjones / @jeanetteclark lemme know if that sounds reasonable or if I totally missed the boat here.
Sounds reasonable to me. And it sounds like the problem would be solved by your solution, despite it being a bit ugly. I don't expect the EML schema to change for something like this, as it is perfectly valid (and unambiguous) in XML, and there's a lot of history to these schemas in use. So, I support whatever patching is needed in emld to force the incompatible json-ld model to match the actual eml.xsd model for validation.
Seems like a reasonable solution to me as well
Thanks @mbjones , sounds like a plan!
@jeanetteclark any chance you want to take a stab at this? (I think it should just be an xpath query wrapped in after reading from xml or writing to xml. could be done after the xml -> list, but xpath is probably the easiest way to search for any occurrence of a spatialSamplingUnits/coverage
node.)
Yeah I'm definitely willing to take a stab - I might be a bit slow as I work it in with other duties. I agree that xpath is probably easiest (and most reliable). I'll do a little digging to see what other areas of the schema might be problematic in this way too (hopefully this part isn't too big of a can of worms). While looking around for this I noticed a lot of other tucked away corners of the schema the use coverage
element, and have no idea if this situation could arise elsewhere.
Over in EML we got a bug report that the EML package wasn't recognizing valid EML as being valid.
The problem stemmed from the element:
After digging into the
eml-2.x.x.json
file, which is the backbone of the serialization process withinas_eml_document
, it seems that there is only one flavor ofcoverage
included, of theeml-dataset:coverage
variety, which requires ageographicCoverage
prior to thegeographicDescription
element.In the above example, however, the coverage shown above is
eml-methods:coverage
, which does not require thegeographicCoverage
child, since it has ageographicCoverage
type.This is definitely a bug, since the document was indeed valid, though I'm unsure as to how to proceed with fixing it. Any input from @cboettig or @mbjones would be most welcome