Open gothub opened 7 years ago
The output should include relationships transformed to use RDF namespace prefixes and filenames instead of PIDs. For example, this:
sev.1.file1 cito:isDocumentedBy metadata.xml
instead of this:
urn:uuid:5f8f72b2-40b1-4da9-ba5a-b3dccf0b526f http://purl.org/spar/cito/isDocumentedBy urn:uuid:5f8f72b2-40b1-4da9-ba5a-b3dccf0b526f
Added a condense param to getRelationships() which returns a version of the package relationships that uses namespace prefixes for known namespaces, and uses the filename for a DataObject instead of the identifier when possible. For a sample DataPackage, the full relationships look like this:
> getRelationships(dp)
subject predicate object subjectType
4 execution1 http://www.w3.org/ns/prov#used scidataId <NA>
1 scimetaId http://purl.org/spar/cito/documents urn:uuid:4305b0e7-eb75-4e90-a6c3-fe103feccfb5 <NA>
2 urn:uuid:4305b0e7-eb75-4e90-a6c3-fe103feccfb5 http://purl.org/spar/cito/isDocumentedBy scimetaId <NA>
3 urn:uuid:4305b0e7-eb75-4e90-a6c3-fe103feccfb5 http://www.w3.org/ns/prov#wasDerivedFrom scidataId <NA>
5 urn:uuid:4305b0e7-eb75-4e90-a6c3-fe103feccfb5 http://www.w3.org/ns/prov#wasGeneratedBy execution1 <NA>
6 urn:uuid:abcd http://www.w3.org/ns/prov#startedAt Wed Mar 18 06:26:44 PDT 2015 uri
objectType dataTypeURI
4 <NA> <NA>
1 <NA> <NA>
2 <NA> <NA>
3 <NA> <NA>
5 <NA> <NA>
6 literal http://www.w3.org/2001/XMLSchema#string
and the condensed relationships would be:
subject predicate object
4 "execution1" "prov:used" "scidataId"
1 "scimetaId" "cito:documents" "file3509163e759c.csv"
2 "file3509163e759c.csv" "cito:isDocumentedBy" "scimetaId"
3 "file3509163e759c.csv" "prov:wasDerivedFrom" "scidataId"
5 "file3509163e759c.csv" "prov:wasGeneratedBy" "execution1"
6 "urn:uuid:abcd" "prov:startedAt" "Wed Mar 18 06:26:44 PDT 2015"
Added in commit 0cce119eb13a65b21241976149a94fef03ecee5a
While this is not a 'visualization' of the prov relationships, per se, it does make them easier to view and understand.
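The condensing step shown above can be sketched in base R roughly as follows. This is only an illustration, not the code from the commit: the helper name condenseRelationships, the prefixes vector, and the filenames lookup are all assumptions made for the example.

```r
# Hypothetical helper (not the datapack implementation): shorten known
# namespace URIs to prefixes and swap identifiers for filenames.
condenseRelationships <- function(rels, prefixes, filenames = character()) {
  shorten <- function(x) {
    for (p in names(prefixes)) {
      ns <- prefixes[[p]]
      hit <- startsWith(x, ns)
      if (any(hit)) {
        # Replace the namespace URI with "prefix:"
        x[hit] <- paste0(p, ":", substring(x[hit], nchar(ns) + 1))
      }
    }
    x
  }
  swapId <- function(x) {
    # Only identifiers with a known filename are replaced
    known <- x %in% names(filenames)
    x[known] <- filenames[x[known]]
    x
  }
  data.frame(
    subject = swapId(rels$subject),
    predicate = shorten(rels$predicate),
    object = swapId(rels$object),
    stringsAsFactors = FALSE
  )
}

# Two rows taken from the sample output above
rels <- data.frame(
  subject = c("scimetaId", "urn:uuid:4305b0e7-eb75-4e90-a6c3-fe103feccfb5"),
  predicate = c("http://purl.org/spar/cito/documents",
                "http://www.w3.org/ns/prov#wasDerivedFrom"),
  object = c("urn:uuid:4305b0e7-eb75-4e90-a6c3-fe103feccfb5", "scidataId"),
  stringsAsFactors = FALSE
)
# Assumed prefix map and identifier-to-filename lookup
prefixes <- c(cito = "http://purl.org/spar/cito/",
              prov = "http://www.w3.org/ns/prov#")
filenames <- c("urn:uuid:4305b0e7-eb75-4e90-a6c3-fe103feccfb5" = "file3509163e759c.csv")

condensed <- condenseRelationships(rels, prefixes, filenames)
print(condensed)
```

Identifiers without a known filename (scimetaId, scidataId) pass through unchanged, which matches the "when possible" behaviour described above.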
The simplest way to do this visualization would be to convert the relationships into a data set that can be read by the igraph, network, or dot packages. There's a nice tutorial online (http://kateto.net/network-visualization).
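As one possible form of that conversion, here is a minimal base R sketch that turns a condensed relationships data frame into Graphviz DOT text. The function name relationshipsToDot is hypothetical, and the sketch assumes the subject, predicate, and object values contain no double quotes.

```r
# Hypothetical helper: build DOT text from subject/predicate/object columns.
# Assumes the values contain no double quotes (no escaping is done here).
relationshipsToDot <- function(rels) {
  edges <- sprintf('  "%s" -> "%s" [label="%s"];',
                   rels$subject, rels$object, rels$predicate)
  c("digraph prov {", edges, "}")
}

# Two rows taken from the condensed output earlier in this thread
rels <- data.frame(
  subject = c("execution1", "file3509163e759c.csv"),
  predicate = c("prov:used", "prov:wasGeneratedBy"),
  object = c("scidataId", "execution1"),
  stringsAsFactors = FALSE
)

dot <- relationshipsToDot(rels)
cat(dot, sep = "\n")
```

The resulting lines could be written to a file and rendered with Graphviz, or the same subject/object pairs could be passed to igraph as an edge list.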
I've made a first pass at adding the functionality to the getRelationships function by adding a plot argument, which defaults to FALSE. Please feel free to suggest changes or tweak as needed. I've added the igraph package as an import, and changed the vignette to generate the prov graph (line 272 of datapack-overview.Rmd).
See forked package
The vertex and edge labels sometimes overlap, but I'm not sure how to programmatically solve that issue. Any help would be appreciated.
Hi @taddallas, thanks for the contribution! The graph you generated looks good. One suggestion - have you considered putting this plotting code into a separate function called something like 'plotRelationships'? It does make perfect sense to put this in 'getRelationships' because the plot you return is a representation of the package relationships, but if it were in a separate function, there could be arguments to control plotting parameters, and you could have the option to plot the graph immediately, or to return the graph or write it out in a standard graphics format.
I've separated the plotting function, but have still included an argument in getRelationships for plotting. Feel free to remove this. Also feel free to edit the functionality and the documentation. I'm just learning the S4 object referencing (setMethod, signature, etc.) system, so there may be some mistakes in how I've set things up. See files in pull request #90
Thanks so much! I agree with @gothub that the plotting should be its own function, and that getRelationships should only return its data without side effects. So let's please remove the plot argument from getRelationships in favor of using the separate function.
Sounds good. I've removed the plotting functionality and argument from getRelationships, and now plotRelationships takes a data.package object, runs getRelationships with condense=TRUE, and then visualizes the relationships using igraph.
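The flow described above might look roughly like the sketch below. It is not the code from the pull request: the name plotRelationshipsSketch and its draw argument are hypothetical, and to stay self-contained it takes an already-condensed relationships data frame instead of calling getRelationships on a package. It assumes the igraph package is installed.

```r
# Hypothetical sketch of a separate plotting function built on igraph.
# Takes a condensed relationships data frame (subject, predicate, object).
plotRelationshipsSketch <- function(rels, draw = TRUE) {
  if (!requireNamespace("igraph", quietly = TRUE)) {
    stop("the igraph package is required for this sketch")
  }
  # The first two columns form the edge list; 'label' becomes an edge attribute
  edges <- data.frame(from = rels$subject, to = rels$object,
                      label = rels$predicate, stringsAsFactors = FALSE)
  g <- igraph::graph_from_data_frame(edges, directed = TRUE)
  if (draw) {
    plot(g, edge.label = igraph::E(g)$label,
         vertex.label.cex = 0.8, edge.label.cex = 0.7, edge.arrow.size = 0.5)
  }
  # Return the graph so callers can restyle it or write it out themselves
  invisible(g)
}

# Two rows taken from the condensed output earlier in this thread
rels <- data.frame(
  subject = c("execution1", "file3509163e759c.csv"),
  predicate = c("prov:used", "prov:wasGeneratedBy"),
  object = c("scidataId", "execution1"),
  stringsAsFactors = FALSE
)

if (requireNamespace("igraph", quietly = TRUE)) {
  g <- plotRelationshipsSketch(rels, draw = FALSE)
}
```

Returning the graph invisibly keeps plotting optional, in line with the earlier suggestion that a caller should be able to plot immediately or keep the graph for other output.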
A user has requested that we add the ability to create a graphic (i.e., a DAG) of the provenance relationships in a DataPackage. This would be useful when a DataPackage is being constructed, to verify that the prov relationships are correct. Note that it is currently possible to inspect the relationships by viewing the data frame returned from getRelationships(); however, a graph would be easier to read.