ropensci / datapack

An R package to handle data packages
https://docs.ropensci.org/datapack

Allow Setting File Location #112

Closed ThomasThelen closed 4 years ago

ThomasThelen commented 4 years ago

This pull request is meant to satisfy issue #109.

~This is a draft pull request because the code has been written and unit tested; however, I'm still testing it against DataONE and rdataone. I'd like to get early feedback on the changes I've made here to see if they're in the right direction. I'll slash out this disclaimer when I remove the tag.~

Changes to DataObject

These changes can be found in the https://github.com/ropensci/datapack/commit/723b27ca2efa25fc62a9cf2232e83b95c8cd94d5 commit. I add a new slot, relativeFilePath, to the DataObject class and use it to hold the file path. Its default value is NA_character_, which looks to be pretty standard for an unset character slot. Like the other slots, it can be set via the slot accessor.

Note that this value only gets added to the resource map by the DataPackage class, because the resource map lives in that object (see the next section).
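For reviewers who want to see the slot behaviour in isolation, here is a minimal sketch. It uses a hypothetical stand-in class, `SketchDataObject`, rather than the real DataObject (which has many more slots and its own initializer), so only the NA_character_ default and the slot accessor are taken from this PR.

```r
library(methods)

# Hypothetical stand-in for DataObject; the class name and the reduced
# slot list are assumptions made for illustration only.
setClass("SketchDataObject",
         slots = c(identifier = "character",
                   filename = "character",
                   relativeFilePath = "character"),
         prototype = list(filename = NA_character_,
                          relativeFilePath = NA_character_))

obj <- new("SketchDataObject", identifier = "urn:uuid:example-1")

# The slot is NA until the user sets it.
unset_before <- is.na(obj@relativeFilePath)

# Set it like any other slot, via the slot accessor.
obj@relativeFilePath <- "raw/2020/sites.csv"
```

The example path `raw/2020/sites.csv` is made up; any path relative to the bag's data/ directory would be set the same way.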

Changes to DataPackage

The DataPackage changes consist of two parts: adding the new information from DataObject to the resource map, and parsing the resource map when exporting as a bag.

Adding Paths to the Resource Map

The first change, c50c797, follows suit with the existing terms and adds the new term to the global constants. This constant is used when inserting the relationship.

The heart of the second change (3202097) is a small addition to DataPackage. When a user adds a new DataObject to the DataPackage, a check is done to see whether relativeFilePath has been set. If so, the provAtLocation relation is added to the resource map.

    if (!is.na(iObj@relativeFilePath)){
        insertRelationship(x, getIdentifier(iObj), iObj@relativeFilePath, provAtLocation)
    }
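To make the effect of that check concrete without depending on datapack, the sketch below models the resource map as a plain data frame of triples. The helper `add_location` and the data-frame representation are hypothetical, and the value assigned to `provAtLocation` is an assumption (the PROV-O `atLocation` term); the PR text only shows the constant's name.

```r
# Assumed value for the constant; the PR only shows its name.
provAtLocation <- "http://www.w3.org/ns/prov#atLocation"

# Hypothetical helper: append a location triple only when a path was set.
add_location <- function(relations, identifier, relative_file_path) {
  if (is.na(relative_file_path)) {
    return(relations)
  }
  rbind(relations,
        data.frame(subject = identifier,
                   predicate = provAtLocation,
                   object = relative_file_path,
                   stringsAsFactors = FALSE))
}

relations <- data.frame(subject = character(),
                        predicate = character(),
                        object = character(),
                        stringsAsFactors = FALSE)

# One object with a path, one without: only the first adds a row.
relations <- add_location(relations, "urn:uuid:a", "raw/sites.csv")
relations <- add_location(relations, "urn:uuid:b", NA_character_)
```

Objects without a relativeFilePath leave the map untouched, which is the same behaviour the `!is.na()` guard gives in the real method.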

Placing Files in Correct Download Location

There isn't too much to this change other than the quick mental gymnastics below:

    # Set the path to the file
    relFile <- paste(bagDir, "/data/", sep="")
    # If the user described the path of the file, use it
    if (!is.na(dataObj@relativeFilePath)) {
        relFile <- paste(relFile, dataObj@relativeFilePath, sep = "")
    } else if (!is.na(dataObj@filename)) {
        # Otherwise, if they specified a filename use that
        relFile <- paste(relFile, dataObj@filename, sep = "")
    } else {
        # If the filename wasn't specified, use the identifier
        relFile <- paste(relFile, getIdentifier(dataObj), sep="")
    }

I re-worked the way that the file name is set (relFile) by moving the logic to the top of the "Data file writing section". This occurs after the resource map is written.

I made the decision to prioritize DataObject::relativeFilePath over DataObject::filename. If the user set relativeFilePath, we use that, since it contains both the relative path and the filename. If the user didn't supply it but gave a filename, the filename is used and the file ends up in the data/ directory. If neither is supplied, the identifier is used as the filename and the file is also placed in the data/ directory.
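The precedence described above can be exercised on its own with a small function. This is a sketch, not the code in the PR: `resolve_bag_path` and its argument names are hypothetical, and it uses `file.path` instead of `paste` to join the pieces.

```r
# Hypothetical helper mirroring the precedence:
# relativeFilePath, then filename, then identifier.
resolve_bag_path <- function(bag_dir, identifier,
                             relative_file_path = NA_character_,
                             filename = NA_character_) {
  data_dir <- file.path(bag_dir, "data")
  if (!is.na(relative_file_path)) {
    # The user described where the file belongs inside data/.
    file.path(data_dir, relative_file_path)
  } else if (!is.na(filename)) {
    # Only a filename was given, so the file sits directly in data/.
    file.path(data_dir, filename)
  } else {
    # Fall back to the identifier as the filename.
    file.path(data_dir, identifier)
  }
}

with_path <- resolve_bag_path("bag", "id-1",
                              relative_file_path = "raw/sites.csv")
with_name <- resolve_bag_path("bag", "id-1", filename = "sites.csv")
with_id   <- resolve_bag_path("bag", "id-1")
```

When the resolved path contains subdirectories, the parent directory has to exist before the file is written; in base R that is `dir.create(dirname(path), recursive = TRUE)`.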

The rest of the method just needed to be refactored because it was concatenating the bag path and relFile (logic I moved to the top), so the refactoring only amounted to replacing variables.

Testing

You can test this in a number of ways:

  1. Create a new DataObject with relativeFilePath. View the object and make sure that you see the record
  2. Add it to a DataPackage and export it. You should have your data in the correct folder(s)
  3. Create an EML document and upload it (use rdataone) along with the file from step 1. You should note that Metacat is fine with it.