ropensci / datapack

An R package to handle data packages
https://docs.ropensci.org/datapack

Serialized package relationships still contain improper blank node names #79

Closed: gothub closed this issue 7 years ago

gothub commented 7 years ago

Serializing package relationships to an RDF resource map still results in a document with invalid blank node identifiers:

  <rdf:Description rdf:nodeID="_:r1494617726r51613r1">
    <rdf:type rdf:resource="http://www.w3.org/ns/prov#Association"/>
  </rdf:Description>

This is from the DataONE resource map https://dev.nceas.ucsb.edu/knb/d1/mn/v1/object/urn:uuid:77aa0647-8c7b-4d08-8f85-4ea3dd84a813
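
The identifier above is rejected because an rdf:nodeID value has to be an XML NCName, and an NCName cannot contain a colon. A minimal sketch of how such identifiers could be detected in a resource map follows. It is in Python rather than R and is not part of datapack; the function name `invalid_node_ids` and the ASCII-only NCName pattern are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# ASCII-only approximation of the XML NCName production (assumption:
# the full production also allows many non-ASCII name characters).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def invalid_node_ids(rdf_xml: str) -> List[str]:
    """Return every rdf:nodeID value that is not a valid NCName."""
    root = ET.fromstring(rdf_xml)
    bad = []
    for elem in root.iter():
        node_id = elem.get(f"{{{RDF_NS}}}nodeID")
        if node_id is not None and NCNAME.fullmatch(node_id) is None:
            bad.append(node_id)
    return bad


# The snippet from this issue, wrapped in an rdf:RDF root element.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <rdf:Description rdf:nodeID="_:r1494617726r51613r1">
    <rdf:type rdf:resource="http://www.w3.org/ns/prov#Association"/>
  </rdf:Description>
</rdf:RDF>"""

print(invalid_node_ids(SAMPLE))
```

A generic XML parser accepts the document because the colon is only a problem at the RDF/XML level, which is why the check has to look at the attribute values explicitly.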

mbjones commented 7 years ago

@gothub When this gets fixed, please update the ORE for the hydrocarbon data set on production. This parsing error just caused an error in our WholeTale app that was trying to parse the hydrocarbon ORE.

gothub commented 7 years ago

The resource map for the Hydrocarbon Database https://goa.nceas.ucsb.edu/goa/d1/mn/v2/object/urn:uuid:1d23e155-3ef5-47c6-9612-027c80855e8d is the updated version that contains 'valid' blank node identifiers and validates against the W3C RDF Validator.

To fix the blank node id problem, DataPackage.R was updated to use blank node identifiers without colon characters, which are invalid according to the W3C RDF Validator at https://www.w3.org/RDF/Validator.

This was fixed in commit https://www.w3.org/RDF/Validator/

Also, when the R redland package parses a DataONE RDF/XML resource map, it reassigns blank node identifiers, and the new identifiers contain '_:' characters. Reassigning them makes sense, since the library has to ensure that they are unique, so this is not really a problem with the redland library. When datapack parses a resource map (via getTriples, parseRDF), it now fixes these blank node ids to be valid, so that if the package is updated with triples from this resource map, the invalid blank node ids won't be used. This fix was made in commit f30ce206624b214e94403b7bdc411b0fc676d992.
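
The kind of rewrite described above can be sketched as follows. This is a Python illustration only, not the code from the commit: the helper names `sanitize_blank_node_id` and `build_id_map` are hypothetical, and the strategy (drop a leading '_:', replace other disallowed characters with '_', then add a numeric suffix on collision) is an assumption about one reasonable way to get colon-free ids that stay unique.

```python
import re
from typing import Dict, Iterable

# ASCII-only approximation of the XML NCName production (assumption).
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def sanitize_blank_node_id(node_id: str) -> str:
    """Turn a parser-assigned blank node id into a colon-free NCName."""
    if node_id.startswith("_:"):
        node_id = node_id[2:]
    # Replace anything outside the ASCII NCName character set.
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", node_id)
    # An NCName must start with a letter or underscore.
    if not cleaned or re.match(r"[A-Za-z_]", cleaned) is None:
        cleaned = "b" + cleaned
    return cleaned


def build_id_map(ids: Iterable[str]) -> Dict[str, str]:
    """Map each original id to a sanitized id, keeping the results unique."""
    mapping: Dict[str, str] = {}
    used = set()
    for original in ids:
        if original in mapping:
            continue
        candidate = sanitize_blank_node_id(original)
        unique = candidate
        suffix = 1
        while unique in used:
            unique = f"{candidate}_{suffix}"
            suffix += 1
        mapping[original] = unique
        used.add(unique)
    return mapping


print(build_id_map(["_:r1494617726r51613r1", "_:r1494617726r51613r2"]))
```

Building the whole map before rewriting any triples means every occurrence of the same original id receives the same replacement, so the graph structure is preserved.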