ropensci / datapack

An R package to handle data packages
https://docs.ropensci.org/datapack

Add ability to remove/edit prov relationships from a package #99

Closed. gothub closed this issue 6 years ago

gothub commented 6 years ago

Add a method that can remove one or a set of provenance relationships. One use case would be downloading a package from a member node, editing the relationships, and then uploading the updated package back to the member node.

Possibly also allow easily deleting all prov relationships, or all relationships matching a wildcard.
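
As a rough illustration of the wildcard idea, here is a minimal base R sketch that filters a plain data frame of relationships by a glob-style pattern on the predicate. The helper name `dropMatchingPredicates` is hypothetical and not part of datapack, and the data frame is only a stand-in with assumed column names (`subject`, `predicate`, `object`) for what `getRelationships()` returns.

```r
# Hypothetical helper, NOT part of datapack: drop the rows of a relationships
# data frame whose predicate matches a glob-style wildcard pattern.
dropMatchingPredicates <- function(rels, pattern) {
  # glob2rx() turns a wildcard such as "http://...prov#*" into a regular expression
  matched <- grepl(utils::glob2rx(pattern), rels$predicate)
  rels[!matched, , drop = FALSE]
}

# Stand-in for a relationships data frame; the column names are assumptions.
rels <- data.frame(
  subject   = c("script.R", "doc1", "doc2"),
  predicate = c("http://www.w3.org/ns/prov#used",
                "http://www.w3.org/ns/prov#wasDerivedFrom",
                "http://myns.org/hadThing"),
  object    = c("data1", "data2", "thing4"),
  stringsAsFactors = FALSE
)

# Remove every relationship whose predicate is in the PROV namespace
remaining <- dropMatchingPredicates(rels, "http://www.w3.org/ns/prov#*")
print(remaining$predicate)
```

A real implementation would need to work on the package's internal relationship store rather than on a copy of the data frame, so this only shows the matching step.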

mbjones commented 6 years ago

@gothub I implemented this as DataPackage::removeRelationships(), which can remove all relationships, or remove selected ones that match a given subjectID or predicate. Here's an example of use. Please review.

    library(datapack)
    library(testthat)

    dp <- new("DataPackage")
    # Create a relationship
    dp <- insertRelationship(dp, "/Users/smith/scripts/genFields.R",
                             "https://knb.org/data_20030812.40.1",
                             "http://www.w3.org/ns/prov#used")
    # Create a relationship with the subject as a blank node with an automatically
    # assigned blank node id
    dp <- insertRelationship(dp, subjectID=NA_character_, objectIDs="thing6", 
                             predicate="http://myns.org/wasThing")
    # Create a relationship with the subject as a blank node with a user assigned blank node id
    dp <- insertRelationship(dp, subjectID="urn:uuid:bc9e160e-ca21-47d5-871b-4a4820fe4451", 
                             objectIDs="thing7", predicate="http://myns.org/hadThing")
    # Create multiple relationships with the same subject, predicate, but different objects
    dp <- insertRelationship(dp, subjectID="https://myns.org/subject1", 
                             objectIDs=c("thing4", "thing5"), predicate="http://myns.org/hadThing")
    # Create a relationship with subject and object types specified
    dp <- insertRelationship(dp, subjectID="orcid.org/0000-0002-2192-403X", 
                             objectIDs="http://www.example.com/home", predicate="http://myns.org/hadHome",
                             subjectType="uri", objectType="literal")

    # Now check if deleting various relationships results in the right number of rows
    expect_that(nrow(getRelationships(dp)), equals(6))
    dp <- removeRelationships(dp, predicate='http://myns.org/wasThing')
    expect_that(nrow(getRelationships(dp)), equals(5))
    dp <- removeRelationships(dp, subjectID='orcid.org/0000-0002-2192-403X')
    expect_that(nrow(getRelationships(dp)), equals(4))
    dp <- removeRelationships(dp, subjectID='https://myns.org/subject1', predicate='http://myns.org/hadThing')
    expect_that(nrow(getRelationships(dp)), equals(2))
    dp <- removeRelationships(dp)
    expect_that(nrow(getRelationships(dp)), equals(0))

gothub commented 6 years ago

@mbjones looks good. I verified that relationships are removed for the specified subjectID and predicate arguments.

mbjones commented 6 years ago

Thanks, then I think this can be closed for the next release. Can you assign it to the right milestone and include it in the next release?