ropensci / RNeXML

Implementing semantically rich NeXML I/O in R
https://docs.ropensci.org/RNeXML

Failure to handle nested meta elements #196

Closed hlapp closed 5 years ago

hlapp commented 5 years ago

Reading the following file (which successfully validates with the NeXML validator) results in an error:

nex = read.nexml("https://raw.githubusercontent.com/phenoscape/phenoscape-data/master/curation-files/teleost-incomplete-files/Dillman_Supermatrix_Files/Lucinda_Vari_2009.xml")
  Error in attrs[["href"]] : subscript out of bounds

I believe this is because it chokes on the following meta element:

<meta xsi:type="ResourceMeta" rel="dc:source">
  <meta xsi:type="LiteralMeta" property="dc:identifier">https://doi.org/10.1643/CI-08-076</meta>
  <meta xsi:type="LiteralMeta" property="dc:bibliographicCitation">Lucinda, P. H. F. and R. P. Vari. 2009. New Steindachnerina species (Teleostei: Characiformes: Curimatidae) from the Rio Tocantins basin. Copeia, 2009(1):142-147</meta>
  <meta xsi:type="LiteralMeta" property="dc:title">Lucinda and Vari (2009)</meta>
</meta> 

Specifically, the code in classes.R (lines 114-123) doesn't seem to be prepared for nested meta elements. It expects the href attribute to be present (line 117), which explains the error when it is absent: here the object is given not by an href but by the element's content, which itself consists of meta elements.
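To illustrate the failure mode and one way around it, here is a minimal sketch in Python using only the standard library. It is not RNeXML's code: `parse_meta` is a hypothetical helper, the fragment is the one quoted above with the xsi namespace declared so that it parses on its own, and only two of the three children are kept for brevity. The point is that href is looked up as optional and nested meta children are handled recursively.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Fragment from the report, trimmed to two children; the xmlns:xsi
# declaration is added here so the snippet parses standalone.
SNIPPET = """
<meta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="ResourceMeta" rel="dc:source">
  <meta xsi:type="LiteralMeta" property="dc:identifier">https://doi.org/10.1643/CI-08-076</meta>
  <meta xsi:type="LiteralMeta" property="dc:title">Lucinda and Vari (2009)</meta>
</meta>
"""


def parse_meta(elem):
    """Hypothetical helper: convert a meta element into a plain dict."""
    # Drop any namespace prefix, e.g. "nex:LiteralMeta" -> "LiteralMeta".
    kind = (elem.get("{%s}type" % XSI) or "").split(":")[-1]
    if kind == "LiteralMeta":
        # The value may be given as a content attribute or as element text.
        return {
            "type": kind,
            "property": elem.get("property"),
            "content": elem.get("content", elem.text),
        }
    # ResourceMeta: href is optional, so use .get(), which returns None
    # when the attribute is absent instead of raising an error.
    return {
        "type": kind,
        "rel": elem.get("rel"),
        "href": elem.get("href"),
        "children": [parse_meta(child) for child in elem.findall("meta")],
    }


parsed = parse_meta(ET.fromstring(SNIPPET))
print(parsed["href"], len(parsed["children"]))  # prints: None 2
```

An equivalent fix in R would presumably test for the presence of href before subscripting, rather than assuming it is always there.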

CC @tjv and @KuangyiXu, who reported the initial issue.

cboettig commented 5 years ago

@hlapp There's nothing wrong with nesting LiteralMeta elements; we have several examples of that already. This example is nesting a ResourceMeta element, which I didn't think was permitted? I thought that a resource type had to be URI valued.

I guess there's an implicit blank node value there. The distinction in RDFa between Resource and Literal meta seems somewhat unnecessary. Is there a reason this example is using a ResourceMeta, <meta xsi:type="ResourceMeta" rel="dc:source">, and not <meta xsi:type="LiteralMeta" property="dc:source">? Semantically I think they correspond to the exact same RDF...

hlapp commented 5 years ago

Hmm. I would have expected that the value of a LiteralMeta is to be treated as a literal (string or number), not as a structured object, and that the value of a ResourceMeta is to be treated as an object, given either by a reference to a resource, or as a nested structured meta object.

It seems that the schema backs me up on this: nexml:LiteralMeta says quite clearly that the content is a string, and nexml:ResourceMeta says clearly that href is optional, and that it can have a substructure of zero or more elements instantiating Meta.

Which examples do we have that nest meta elements within LiteralMeta? These should fail to validate if the validator enforces that part of the schema.

Maybe the implementation was mistakenly put into the handling of LiteralMeta instead of ResourceMeta? However, I have to say I also don't see in the LiteralMeta code how that would deal with nested meta elements.

cboettig commented 5 years ago

Sorry, yeah, I had that backwards.

We have nested examples, e.g. https://github.com/ropensci/RNeXML/blob/master/inst/examples/treebase-record.xml, but they are ResourceMeta, not LiteralMeta, just as you say. But the nesting always includes an href, which is interpreted as the about element, I believe.

cboettig commented 5 years ago

So, like you say, it looks like RNeXML is assuming that nested meta elements must have an href, i.e., cannot be blank nodes. So I guess unless the spec has anything against blank nodes, we should allow them?

hlapp commented 5 years ago

The spec actually specifically designates these as blank nodes:

If this element contains meta elements as children, then the object of this annotation is a "blank node".
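As a rough illustration of what that means for the example in this issue, the sketch below flattens a nested meta element into subject/predicate/object triples and mints a blank node label when a ResourceMeta has children but no href. It is written in Python with the standard library only and is not taken from RNeXML or the NeXML tooling: `to_triples`, the `_:b0` labelling scheme, and the `#study` subject are all assumptions made for illustration.

```python
import itertools
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Fragment from the report, trimmed to two children; the xmlns:xsi
# declaration is added here so the snippet parses standalone.
SNIPPET = """
<meta xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:type="ResourceMeta" rel="dc:source">
  <meta xsi:type="LiteralMeta" property="dc:identifier">https://doi.org/10.1643/CI-08-076</meta>
  <meta xsi:type="LiteralMeta" property="dc:title">Lucinda and Vari (2009)</meta>
</meta>
"""


def to_triples(elem, subject, counter, out):
    """Hypothetical helper: append (subject, predicate, object) tuples to out."""
    kind = (elem.get("{%s}type" % XSI) or "").split(":")[-1]
    if kind == "LiteralMeta":
        # Literal value: content attribute if present, else element text.
        out.append((subject, elem.get("property"), elem.get("content", elem.text)))
        return out
    children = elem.findall("meta")
    href = elem.get("href")
    if href is None and children:
        # No href but nested meta children: the object is a blank node.
        obj = "_:b%d" % next(counter)
    else:
        # Otherwise the object is the referenced resource
        # (None if there is neither an href nor any children).
        obj = href
    out.append((subject, elem.get("rel"), obj))
    # Nested annotations describe the object of the enclosing annotation.
    for child in children:
        to_triples(child, obj, counter, out)
    return out


# "#study" is a made-up subject standing in for the annotated element.
triples = to_triples(ET.fromstring(SNIPPET), "#study", itertools.count(), [])
for triple in triples:
    print(triple)
```

Under these assumptions the dc:source annotation points at `_:b0`, and the identifier and title become statements about `_:b0` rather than about the annotated element itself.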