ropensci / datapack

An R package to handle data packages
https://docs.ropensci.org/datapack
44 stars 9 forks source link

Use 'native' constructors and S4 slot accessors #121

Open gothub opened 4 years ago

gothub commented 4 years ago

These items were originally mentioned in issue https://github.com/ropensci/datapack/issues/119, raised by @cboettig, but deserve their own issue.

These changes should be applied to the entire package (and potentially rdataone package).

What is the implementation of 'native' constructors for S4 classes?

The Advanced R S4 chapter does describe S4 slot accessors, which should be easy to implement and not break the current API

amoeba commented 4 years ago

I honestly have never done this but I think your assessment of how is right. For every S4 class constructor, a bare R function is provided which does three things: (1) documentation, (2) validation/conversion/processing of inputs, and (3) creates the associated S4 object with new. Something like https://github.com/edzer/sp/blob/master/R/SpatialPointsDataFrame-methods.R seems like a good approach.

mbjones commented 4 years ago

Sounds good to provide a constructor, but let's do that in addition to the initalize() method so that new continues to work. We have an example of a constructor function in dataone::CNode, but in that case I didn't provide initialize when I should have. Basically, in datapack I used initialize, whereas in dataone I used constructors. We should be able to provide both approaches without duplicating the logic if the constructor functions call initialize themselves.

jeanetteclark commented 4 years ago

based on a discussion with Jasmine - there are a couple of features that we'd like to request for these functions, particularly the new_dataObject one. Ideally, the function could try to guess the formatId of the object based on the extension of the file (we have a set of mappings we use for this) if no formatId is specified. Additionally, it would be nice if the function could automatically generate a uuid identifier if none is specified. These features would turn the below code:

new("DataObject", id=paste0("urn:uuid:", UUIDgenerate()), format="text/csv", 
           filename= "../../../visitor/Flanner/sample_data.csv")

into:

new_dataObject(filename= "../../../visitor/Flanner/sample_data.csv")

I might also expect a return message saying something like:

Creating dataObject with formatId text/csv since no formatId was specified

I would not expect the function to be able to guess a formatId for xml files

gothub commented 4 years ago

@jeanetteclark the current DataObject initialization function assigns a urn:uuid:<random id identifier if one is not provided with the id argument, so that will be supported with the new incarnation of the DataObject constructor.

Usually datapack (and dataone) functions are quiet unless the caller has requested that info is printed. If you think this is important enough to add an argument to control message output, please comment. The format id can be checked after creation of the object with getFormatid().