ropensci / datapack

An R package to handle data packages
https://docs.ropensci.org/datapack

Some D1 pids not getting URL encoded in resourceMap #70

Closed gothub closed 7 years ago

gothub commented 7 years ago

When an RDF resource map is serialized from a DataPackage, any relationship that has a package member id as its subject or object has that id 'promoted' to a DataONE PID expressed as a resolvable URL. For example, urn:uuid:8839c67d-e292-46ef-adff-a646158fa023 is promoted to https://cn-dev-2.test.dataone.org/cn/v2/resolve/urn:uuid:8839c67d-e292-46ef-adff-a646158fa023

However, PIDs are not URL encoded for relationships that are added via insertDerivation. For example, insertDerivation(x, source=x, derivation=y) inserts a y prov:wasDerivedFrom x relationship into the DataPackage, where x and y are DataONE resolvable URLs, e.g. x=http://mn-dev-ucsb-1.test.dataone.org/metacat/d1/mn/v2/object/urn:uuid:da641293-ee21-4ffa-aac3-6a958a2add3e%22

It's not clear how to identify the ids in a DataPackage's relationships that are DataONE PIDs, refer to objects outside the package, and are not yet URL encoded, because DataONE PIDs can take many different formats. The PIDs could be URL encoded before calling insertRelationship(), but that puts a burden on the user.
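As a rough illustration of encoding a PID before it is placed in a resolve URL, here is a minimal base R sketch. The helper name promote_pid and the default resolve base are assumptions made for this example; they are not part of the datapack API, and the package's own implementation may differ.

```r
# Hypothetical helper (not a datapack function): build a resolve URL
# whose PID portion is URL encoded.
promote_pid <- function(pid,
                        base = "https://cn-dev-2.test.dataone.org/cn/v2/resolve") {
  # reserved = TRUE also encodes reserved characters such as ":" and "/".
  # Note: with the default repeated = FALSE, URLencode() returns its input
  # unchanged if it already contains a "%XX" escape.
  encoded <- utils::URLencode(pid, reserved = TRUE)
  paste(base, encoded, sep = "/")
}

print(promote_pid("urn:uuid:8839c67d-e292-46ef-adff-a646158fa023"))
```

The sketch only works if the input is a bare PID. If a full resolve or object URL is passed in, the whole URL gets encoded, which is the ambiguity described above.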

gothub commented 7 years ago

In addition, the rdf/xml subprocessor requires that each PID in a prov:wasDerivedFrom relationship also have a dcterms:identifier relationship, for example:

<rdf:Description rdf:about="https://cn-dev-2.test.dataone.org/cn/v2/resolve/urn%3Auuid%3A290cf274-aca6-4f8b-b980-0e0f8cbf9769">
  <dcterms:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">urn:uuid:290cf274-aca6-4f8b-b980-0e0f8cbf9769</dcterms:identifier>
</rdf:Description>

This type of relationship is needed for both the subject and the object of the 'wasDerivedFrom' triple.
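To make the required shape concrete, the following base R sketch produces such a description for a given PID. The function name identifier_description and the resolve base are assumptions for illustration only. It formats a string rather than using an RDF library, and it does not XML-escape the PID, so treat it as a sketch and not as how datapack serializes resource maps.

```r
# Hypothetical helper (not a datapack function): format the rdf:Description
# element that carries the dcterms:identifier for one PID.
identifier_description <- function(pid,
                                   base = "https://cn-dev-2.test.dataone.org/cn/v2/resolve") {
  # The rdf:about URL uses the URL-encoded PID; the literal keeps the raw PID.
  about <- paste(base, utils::URLencode(pid, reserved = TRUE), sep = "/")
  sprintf(
    paste0(
      '<rdf:Description rdf:about="%s">\n',
      '  <dcterms:identifier rdf:datatype="http://www.w3.org/2001/XMLSchema#string">%s</dcterms:identifier>\n',
      '</rdf:Description>'
    ),
    about, pid
  )
}

# One description is needed for the subject and one for the object.
cat(identifier_description("urn:uuid:290cf274-aca6-4f8b-b980-0e0f8cbf9769"), "\n")
```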

gothub commented 7 years ago

This issue is resolved by requiring that PIDs entered as sources or derivations to insertDerivation are bare DataONE PIDs, without any preceding resolve or object service URL. If this is the case, then it is trivial to determine which PIDs are for package members and which are external PIDs (from other packages), which makes it easy to properly URL encode the URLs in the resource map that contain these PIDs.
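A minimal base R sketch of that check, assuming the package's member ids are available as a character vector. The names is_bare_pid and classify_pids are hypothetical and not part of datapack, and the "does not start with http(s)://" test is a simplification of "no preceding service URL".

```r
# Hypothetical helpers (not datapack functions).

# TRUE when the id does not start with an http(s) service URL.
is_bare_pid <- function(pid) !grepl("^https?://", pid)

# Label each PID as a package member or as external to the package.
classify_pids <- function(pids, member_ids) {
  stopifnot(all(is_bare_pid(pids)))  # reject resolve/object URLs up front
  ifelse(pids %in% member_ids, "member", "external")
}

members <- c("urn:uuid:8839c67d-e292-46ef-adff-a646158fa023")
pids <- c("urn:uuid:8839c67d-e292-46ef-adff-a646158fa023",
          "urn:uuid:da641293-ee21-4ffa-aac3-6a958a2add3e")
print(classify_pids(pids, members))
```

Once every id is known to be a bare PID, both member and external PIDs can be URL encoded the same way before being appended to the resolve URL.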