ropensci / datapack

An R package to handle data packages
https://docs.ropensci.org/datapack

Package summary #113

Open gothub opened 4 years ago

gothub commented 4 years ago

@jeanetteclark please review

A way to easily review the contents of a package is needed, as was pointed out at this meeting with OPC and the NCEAS datateam: https://hpad.dataone.org/IoceU2y9SWaUCuCAjLavcg?edit

In particular, it should be easy to review the system metadata for all package members.

Add a function to generate a summary of system metadata for a package. This could be added to the existing DataPackage::show() function or implemented as a separate method. Here is a sample of the current show() output:

d1c <- D1Client("STAGING", "urn:node:mnTestARCTIC")
resourceMapId <- "resource_map_urn:uuid:c6280bfa-cb0d-4d64-9c10-6cfc1bd0065f"
pkg <- getDataPackage(d1c, identifier=resourceMapId, lazyLoad=TRUE, limit="0MB", quiet=FALSE)
Members:

filename                  format      mediaType  size     identifier                        modified local 
Discon_Per_Zone.csv       text/csv    NA         911      urn:uuid:26de48...fb-f6c1fc541df5 n        n     
Aggregated_...classes.xml eml:....1.1 NA         4801     urn:uuid:3a669b...a9-df642e504f0c n        y     
Con_Per_Zone.csv          text/csv    NA         911      urn:uuid:62d4fb...5b-8e1ea794ed82 n        n     

Package identifier: resource_map_urn:uuid:c6280bfa-cb0d-4d64-9c10-6cfc1bd0065f
RightsHolder: http://orcid.org/0000-0002-5846-9296

Relationships:
                           subject           predicate                           object
6 Aggregated_and...ver_classes.xml      cito:documents              Discon_Per_Zone.csv
4 Aggregated_and...ver_classes.xml      cito:documents Aggregated_and...ver_classes.xml
5 Aggregated_and...ver_classes.xml      cito:documents                 Con_Per_Zone.csv
2 Aggregated_and...ver_classes.xml cito:isDocumentedBy Aggregated_and...ver_classes.xml
3                 Con_Per_Zone.csv cito:isDocumentedBy Aggregated_and...ver_classes.xml
1              Discon_Per_Zone.csv cito:isDocumentedBy Aggregated_and...ver_classes.xml

jeanetteclark commented 4 years ago

Yes, I think this is a good summary of the issue. It would be nice if the return value were a list that is navigable/selectable.

In arcticdatautils, the list is organized by formatType.

So something like summary$metadata$sysmeta would let you examine the sysmeta and quickly extract the relevant parts of it. Similarly, summary$data[[1]]$sysmeta would give the sysmeta for the first object with formatType == "DATA". In this use case, objects without a formatType set are put into something like summary$unknown.
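
To make the shape of that return value concrete, here is a rough base R sketch. It is not datapack or arcticdatautils code: the function name summarize_by_format_type, the member records, and their fields (fileName, formatType, sysmeta) are all hypothetical stand-ins for real package members. It also assumes a single metadata object per package, and it treats any formatType other than "DATA" or "METADATA" as unknown.

```r
# Hypothetical stand-ins for package members. In a real package these would be
# objects with system metadata attached; plain lists are used here so the
# sketch runs without any packages installed.
members <- list(
  list(fileName = "metadata.xml", formatType = "METADATA",
       sysmeta = list(formatId = "example-metadata-format", size = 300)),
  list(fileName = "a.csv", formatType = "DATA",
       sysmeta = list(formatId = "text/csv", size = 100)),
  list(fileName = "b.csv", formatType = "DATA",
       sysmeta = list(formatId = "text/csv", size = 200)),
  list(fileName = "notes.txt",  # no formatType set
       sysmeta = list(formatId = "text/plain", size = 50))
)

# Group members into a navigable list keyed by formatType.
summarize_by_format_type <- function(members) {
  # Treat a missing, NA, or empty formatType as "UNKNOWN".
  type_of <- function(m) {
    ft <- m$formatType
    if (is.null(ft) || is.na(ft) || !nzchar(ft)) "UNKNOWN" else toupper(ft)
  }
  types <- vapply(members, type_of, character(1))

  metadata <- members[types == "METADATA"]
  list(
    # Assumption: one metadata object per package, so keep only the first.
    metadata = if (length(metadata) > 0) metadata[[1]] else NULL,
    data     = members[types == "DATA"],
    # Everything that is neither DATA nor METADATA lands here.
    unknown  = members[!(types %in% c("METADATA", "DATA"))]
  )
}

pkg_summary <- summarize_by_format_type(members)

pkg_summary$metadata$sysmeta$formatId   # sysmeta of the metadata object
pkg_summary$data[[1]]$sysmeta$size      # sysmeta of the first DATA object
length(pkg_summary$unknown)             # objects with no usable formatType
```

The result is named pkg_summary rather than summary only to avoid masking base R's summary() function; the access pattern is otherwise the same as described above.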