Open colinpmillar opened 4 years ago
A possible route:
Each repository has a file, exported_data.json (or similar).
This file contains a record for each file (data, text and images) that is to be made available to others.
Only files listed in this way would be accessible outside the internal private git repository, via GitHub or the web application.
Finally, only listed files that fall under the "public" or the general ICES data policy are pushed to GitHub.
One possible format is shown here, just to get a feeling for what we could do. In this example there are two main sections, "exports" and "contracts":
{
  "exports": [
    {
      "access": "public",
      "files": [
        "data/catage.csv",
        "data/datage.csv",
        "data/ibts_index.csv",
        "output/fatage.csv"
      ],
      "folders": [
        "report"
      ]
    },
    {
      "access": "vms",
      "files": [
        "data/effort_map.csv"
      ]
    }
  ],
  "contracts": [
    {
      "format": "SAG",
      "formatID": 126,
      "files": [
        "output/sag.xml"
      ]
    },
    {
      "format": "SAG Catch Scenarios",
      "formatID": 137,
      "files": [
        "output/catchoptions.xml"
      ]
    }
  ]
}
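To make the filtering rule concrete, here is a minimal sketch in base R of how the "exports" section could be used to decide which paths are published. The manifest is written as a nested R list with the same shape as the JSON above, and `exported_paths()` is a hypothetical helper name, not an existing TAF function.

```r
# Manifest as a nested R list, mirroring the "exports" section of the
# JSON example (only the fields needed for filtering are included).
manifest <- list(
  exports = list(
    list(access = "public",
         files = list("data/catage.csv", "data/datage.csv",
                      "data/ibts_index.csv", "output/fatage.csv"),
         folders = list("report")),
    list(access = "vms",
         files = list("data/effort_map.csv"))
  )
)

# Hypothetical helper: collect the files and folders of every export
# record whose access level is listed in `allowed`.
exported_paths <- function(manifest, allowed = "public") {
  keep <- Filter(function(rec) rec$access %in% allowed, manifest$exports)
  paths <- lapply(keep, function(rec) c(unlist(rec$files), unlist(rec$folders)))
  unique(as.character(unlist(paths)))
}

# Paths that would be pushed when only "public" records are allowed
exported_paths(manifest)
```

Anything not returned by such a function, including the "vms" record, would stay inside the private repository.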
Note that we could semi-automate this using an R function, or tags like those used by roxygen:
#' @export data/catage.csv
write.taf(catage, dir = "data")
or more directly (this example exports all csv files currently in the data folder):
write.taf(catage, dir = "data")
write.taf(datage, dir = "data")
write.taf(ibts_index, dir = "data")
# export.taf() is the proposed function; it does not exist yet
export.taf(dir("data", pattern = "\\.csv$", full.names = TRUE), access = "public")
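The tag-based idea could be sketched roughly as follows: scan the lines of a script for `#' @export <path>` comments and collect the paths, which could then be written into the manifest. This is only an illustration in base R; `find_export_tags()` is a hypothetical name, and reusing `@export` for file paths is the proposal in this thread, not how roxygen2 itself interprets that tag.

```r
# Hypothetical tag parser: find lines of the form "#' @export <path>"
# and return the paths they name.
find_export_tags <- function(lines) {
  pattern <- "^#'\\s*@export\\s+(\\S+)\\s*$"
  hits <- grepl(pattern, lines, perl = TRUE)
  sub(pattern, "\\1", lines[hits], perl = TRUE)
}

# Example script content; in practice this would come from readLines()
script <- c(
  "#' @export data/catage.csv",
  "write.taf(catage, dir = \"data\")",
  "# an ordinary comment",
  "#' @export data/datage.csv",
  "write.taf(datage, dir = \"data\")"
)

find_export_tags(script)
```

Ordinary comments and code lines are ignored, so only explicitly tagged outputs would end up in the list of exported files.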
Maybe a naive question, but is there a reason why all data should not pass through bootstrap and require at least some metadata? If that were the case, the data product could default to the most restrictive license of the input data unless explicitly overridden (e.g. you might have high-resolution data with personal details as input and generate a product that is shareable but, as a safety mechanism, you need to change the license for the output before other services will pick it up).
Hi - yes, all data will pass through bootstrap and therefore have a data policy attributed to it - so then it is very sensible to assume that all files produced come under the most restrictive licence unless stated otherwise. That could simplify things a fair bit. Thanks!
I hope it does simplify it - but probably raises some questions too about how to rank licenses and their restrictions :-)
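As a rough illustration of the "most restrictive license wins" default, here is a sketch in base R. The ranking itself is exactly the open question raised above: the order and the label "ices" used here are assumptions made up for the example, and `most_restrictive()` is a hypothetical helper.

```r
# ASSUMED ranking, from least to most restrictive. The labels and their
# order are placeholders; the real ranking would need to be agreed.
policy_rank <- c("public", "ices", "vms")

# Hypothetical helper: return the most restrictive policy among the
# policies of the input data.
most_restrictive <- function(policies, ranking = policy_rank) {
  unknown <- setdiff(policies, ranking)
  if (length(unknown) > 0) {
    stop("unranked policy: ", paste(unknown, collapse = ", "))
  }
  if (length(policies) == 0) {
    # No declared inputs: fail closed and use the most restrictive level
    return(ranking[length(ranking)])
  }
  ranking[max(match(policies, ranking))]
}

# An output built from public and VMS inputs defaults to "vms"
most_restrictive(c("public", "vms"))
```

Stopping on an unranked policy, rather than guessing where it belongs, keeps the safety mechanism from silently publishing data under a policy nobody has classified.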
Summary
To ensure that we do not publish data that falls under a restricted license, we need a method to document (add metadata to) data products created during a TAF assessment. Of specific relevance is a field stating the data policy / license for the data file.
For example, a summary of VMS data records may still fall under the VMS data policy, so it cannot be published and access to it must be restricted.
Currently only data entering through the bootstrap procedure has metadata records.