ices-eg / wg_WGTAFGOV

Working group on TAF Governance
https://community.ices.dk/ExpertGroups/WGTAFGOV/SitePages/HomePage.aspx

Develop metadata so that TAF data products are documented #4

Open colinpmillar opened 4 years ago

colinpmillar commented 4 years ago

Summary

In order to ensure that we do not publish data that falls under a restricted licence, we need a method to document (add metadata to) data products created during a TAF assessment. Of specific relevance is a field stating the data policy / licence for the data file.

For example, a summary of VMS data records may still fall under the VMS data policy, and so cannot be published and must have its access restricted.

Currently only data entering through the bootstrap procedure has metadata records.

colinpmillar commented 4 years ago

A possible route:

One format could be as follows, just to get a feeling for what we could do. In this example there are two main sections:

  1. exports - a list of files and/or folders and the data policy that governs them
  2. contracts - formats that the repository provides. In this example it gives a SAG format and a catch options table format (both of these have definitions):
{
  "exports": [
    {
      "access": "public",
      "files": [
        "data/catage.csv",
        "data/datage.csv",
        "data/ibts_index.csv",
        "output/fatage.csv"
      ],
      "folders": [
        "report"
      ]
    },
    {
      "access": "vms",
      "files": [
        "data/effort_map.csv"
      ]
    }
  ],
  "contracts": [
    {
      "format": "SAG",
      "formatID": 126,
      "files": [
        "output/sag.xml"
      ]
    },
    {
      "format": "SAG Catch Scenarios",
      "formatID": 137,
      "files": [
        "output/catchoptions.xml"
      ]
    }
  ]
}

Note that we could semi-automate this using an R function, or tags like roxygen:

#' @export data/catage.csv
write.taf(catage, dir = "data")

or more directly (this example exports all csv files currently in the data folder):

write.taf(catage, dir = "data")
write.taf(datage, dir = "data")
write.taf(ibts_index, dir = "data")
export.taf(dir("data", pattern = "\\.csv$"), access = "public")

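As a rough illustration only: export.taf is a hypothetical helper here (not part of the real TAF package), and this sketch simply collects entries mirroring the "exports" section of the JSON example above, using base R.

```r
# Hypothetical sketch: accumulate export metadata in an environment,
# mirroring the "exports" section of the JSON example above.
# export.taf() is an assumed helper, not an existing TAF function.
taf.metadata <- new.env()
taf.metadata$exports <- list()

export.taf <- function(files, access = "public") {
  # append one exports entry (access level + file list) for this batch
  taf.metadata$exports <- c(
    taf.metadata$exports,
    list(list(access = access, files = as.list(files)))
  )
  invisible(taf.metadata$exports)
}

# usage, matching the examples above:
# export.taf(dir("data", pattern = "\\.csv$"), access = "public")
# export.taf("data/effort_map.csv", access = "vms")
```

The accumulated list could then be serialised to the JSON metadata file at the end of the script.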
jensr commented 4 years ago

Maybe a naive question, but is there a reason why all data does not pass through bootstrap and so carry at least some metadata? If that were the case, the data product could default to the most restrictive licence of the input data unless explicitly overridden (e.g. you might have high-resolution data with personal details as input and generate a product that is shareable, but as a safety mechanism you would need to change the licence for the output before other services will pick it up?)

colinpmillar commented 4 years ago

Hi - yes, all data will pass through bootstrap and therefore have a data policy attributed to it - so then it is very sensible to assume that all files produced come under the most restrictive licence unless stated otherwise. That could simplify things a fair bit. Thanks!

jensr commented 4 years ago

I hope it does simplify it - but probably raises some questions too about how to rank licenses and their restrictions :-)
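One naive way to rank them, assuming the access levels can be placed in a single linear order from least to most restrictive (the ordering below is invented for illustration, and real policies may not be totally ordered), would be:

```r
# Hypothetical sketch: an assumed ordering of access levels, least
# restrictive first, used to default an output file to the strictest
# licence among its inputs. The levels and their order are assumptions.
access_levels <- c("public", "restricted", "vms", "personal")

most_restrictive <- function(licences) {
  # look up each input licence in the ordering and keep the strictest
  access_levels[max(match(licences, access_levels))]
}

most_restrictive(c("public", "vms"))  # "vms"
```

If the policies turn out not to be totally ordered, this would need something richer than a single ranked vector, e.g. a set of pairwise compatibility rules.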