ropensci / datapack

An R package to handle data packages
https://docs.ropensci.org/datapack

Add ability to define provenance relationships between packages #66

Closed gothub closed 7 years ago

gothub commented 7 years ago

A user has requested that datapack support asserting provenance relationships between DataPackages, e.g. DataPackage A contains source data and DataPackage B contains products derived from DataPackage A. Of course, these relationships are useful after the packages have been uploaded to a data repository, so a repository-based tool for establishing them may make more sense than an R-based solution.

gothub commented 7 years ago

Provenance can be asserted between two packages (say, package A and package B) by using the insertDerivation() method and specifying an object from package A as a source and an object from package B as a derivation. In this example, package B is the current package, represented by the object dp:

dp <- new("DataPackage")
dp <- insertDerivation(dp, sources="1234", derivations="5678")

where PID 1234 is from package A and PID 5678 is from package B. This call to insertDerivation() adds a prov:wasDerivedFrom relationship to the current package dp. If the program argument is also supplied, a prov:used relationship is added as well, linking the program (and its execution) from package B to the source from package A.
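
To make the two relationships concrete, here is a rough sketch in plain base R of the subject/predicate/object statements described above. It does not call datapack and is not how datapack stores provenance internally. The PIDs 1234 and 5678 come from the example, while the execution identifier is a hypothetical placeholder made up for illustration.

```r
# Illustration only: base R, no datapack calls.
# "1234" and "5678" are the PIDs from the example above;
# "execution-1" is a hypothetical identifier for the program run.
source_pid     <- "1234"         # object in package A (the source)
derivation_pid <- "5678"         # object in package B (the derived product)
execution_id   <- "execution-1"  # hypothetical execution in package B

# One row per asserted relationship, read as: subject predicate object.
relationships <- data.frame(
  subject   = c(derivation_pid, execution_id),
  predicate = c("prov:wasDerivedFrom", "prov:used"),
  object    = c(source_pid, source_pid),
  stringsAsFactors = FALSE
)

print(relationships)
```

Both statements point back at the object in package A, which is what ties package B to package A even though only package B is modified.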