ncasuk / amf-check-writer

Library to write AMF compliance checks
BSD 3-Clause "New" or "Revised" License

Run workflow for downloading google spreadsheets, creating YAML checks and CVs #8

Closed agstephens closed 3 years ago

agstephens commented 4 years ago

The initial workflow is as follows:

  1. Download the Data Product and related spreadsheets from Google Drive
  2. Convert the cached versions into TSV
  3. Regenerate the CVs
  4. Regenerate the checks

agstephens commented 4 years ago

Simplified workflow: (1) download, (2) make checks, (3) make CVs

Define a temporary output directory and create it:

export DATA_DIR="$PWD/check-data"
mkdir -p "$DATA_DIR"

Set the version of the checks/vocabs to use:

VERSION=v2.0

Download the content of the Google spreadsheet vocabularies/rules into local files:

download-from-drive -v "$VERSION" --regenerate --secrets client-secret.json "$DATA_DIR"

Run a script to create the YAML representation of the checks:

create-yaml-checks -s "$DATA_DIR" -v "$VERSION"

Run a script to create the Controlled Vocabularies (in JSON and PYESSV formats):

create-cvs -s "$DATA_DIR" -v "$VERSION"
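
The three commands above could be chained so that a failure in one step stops the rest. What follows is only a rough sketch: run_step, run_workflow and the DRY_RUN variable are hypothetical helpers invented for illustration and are not part of amf-check-writer. The flags are exactly those used in the commands above.

```shell
# Sketch only: run_step, run_workflow and DRY_RUN are hypothetical,
# not part of amf-check-writer.

run_step() {
    # Print the command, then execute it unless DRY_RUN=1.
    echo "+ $*"
    if [ "${DRY_RUN:-0}" != "1" ]; then
        "$@"
    fi
}

run_workflow() {
    # $1 = output directory, $2 = version of the checks/vocabs (e.g. v2.0)
    # Each step must succeed before the next one runs.
    run_step mkdir -p "$1" || return 1
    run_step download-from-drive -v "$2" --regenerate --secrets client-secret.json "$1" || return 1
    run_step create-yaml-checks -s "$1" -v "$2" || return 1
    run_step create-cvs -s "$1" -v "$2" || return 1
}
```

Calling run_workflow "$DATA_DIR" "$VERSION" with DRY_RUN=1 set only prints the commands, which is a cheap way to check the arguments before touching Google Drive.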

Run an example check (maybe having downloaded the training data):

# Set the pyessv archive directory to use:
export PYESSV_ARCHIVE_HOME=/root/.comp-check-cvs-cache/master/pyessv-archive-eg-cvs

amf-checker --yaml-dir check-data-2021-09-02/v2.0/checks ../NCAS-Data-Project-Training-Data/Data/ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc --version v2.0
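
Because the check depends on an environment variable, a checks directory and a data file all being in place, a quick sanity check before calling amf-checker may save some confusion. This is a minimal sketch; preflight is a hypothetical helper, not something amf-check-writer provides, and it does not call amf-checker itself.

```shell
# Sketch only: preflight is a hypothetical helper, not part of amf-check-writer.
preflight() {
    # $1 = directory holding the YAML checks, $2 = NetCDF file to check
    if [ -z "${PYESSV_ARCHIVE_HOME:-}" ]; then
        echo "PYESSV_ARCHIVE_HOME is not set" >&2
        return 1
    fi
    if [ ! -d "$1" ]; then
        echo "checks directory not found: $1" >&2
        return 1
    fi
    if [ ! -f "$2" ]; then
        echo "data file not found: $2" >&2
        return 1
    fi
    return 0
}
```

It can guard the real command, for example: preflight "$CHECKS" "$FILE" && amf-checker --yaml-dir "$CHECKS" "$FILE" --version v2.0, where CHECKS and FILE are placeholder variables for the two paths used above.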

agstephens commented 4 years ago

@gapintheclouds: This issue contains my personal notes on how to download the data product spreadsheets and create the YAML checks and the controlled vocabularies. Let me know if you have any questions.

agstephens commented 3 years ago

The create-yaml-checks and create-cvs scripts need to be made properly aware of dataset versions.

This should be more explicit, so that you don't have to set the output paths yourself. Everything should be written to and read from a standard location, such as

Maybe each script should take only the output base directory and a separate version parameter, which would determine <version> and put everything in the right place.
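
One way to picture this proposal is a pair of small functions that derive every path from the base directory and the version. This is a sketch under stated assumptions: version_dir and checks_dir are hypothetical names, and the <base>/<version>/checks layout is inferred from the amf-checker example earlier in this thread rather than from any agreed standard.

```shell
# Sketch only: version_dir and checks_dir are hypothetical helpers.
# The <base>/<version>/checks layout is an assumption.

version_dir() {
    # $1 = output base directory, $2 = version (e.g. v2.0)
    if [ -z "${1:-}" ] || [ -z "${2:-}" ]; then
        echo "usage: version_dir <base-dir> <version>" >&2
        return 1
    fi
    # Strip one trailing slash from the base so the result has no "//".
    printf '%s/%s\n' "${1%/}" "$2"
}

checks_dir() {
    # Same arguments as version_dir; fails if version_dir fails.
    vdir=$(version_dir "${1:-}" "${2:-}") || return 1
    printf '%s/checks\n' "$vdir"
}
```

With helpers like these, a caller would pass only the base directory and the version, and would never need to spell out an output path by hand.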