ncasuk / amf-check-writer

Library to write AMF compliance checks
BSD 3-Clause "New" or "Revised" License

Migrate to checksit #80

Open agstephens opened 1 year ago

agstephens commented 1 year ago

Checksit - update plans for October 2022

# Download the installer
wget https://raw.githubusercontent.com/ncasuk/amf-check-writer/main/install-checker-suite.sh

# Run it (with or without --no-conda, depending on whether you already have conda installed)
bash ./install-checker-suite.sh --no-conda

# This installs everything into your conda environment, and creates a setup file, e.g.:
source checks-work-dir/setup-checks-env.sh

# ...now you can run the amf-checker, e.g.
TEST_FILE_NAME=ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc
TEST_FILE_URL="https://github.com/cedadev/compliance-check-lib/blob/main/tests/example_data/nc_file_checks_data/${TEST_FILE_NAME}?raw=true"

wget -O "$TEST_FILE_NAME" "$TEST_FILE_URL"
amf-checker --version $CHECKS_VERSION $TEST_FILE_NAME

This should display some output like this:

$ amf-checker --version $CHECKS_VERSION $TEST_FILE_NAME
[INFO] Running compliance-checker with arguments:
        compliance-checker --yaml /miniconda/envs/amf-checks-env/lib/python3.9/site-packages/amf-checks/AMF_product_mean-winds_land.yml --test product_mean-winds_land_checks:v2.0 --format text ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc
Running Compliance Checker on the datasets from: ['ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc']
2022-10-07 14:17:46.051092 [INFO] :: PYESSV :: Loading vocabularies from /miniconda/envs/amf-checks-env/lib/python3.9/site-packages/amf-pyessv-vocabs ... please wait

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report
                                 Version 5.0.2
                     Report generated 2022-10-07T14:17:46Z
                      product_mean-winds_land_checks:v2.0

--------------------------------------------------------------------------------
                               Corrective Actions
ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc has 35 potential issues

                                 High Priority
--------------------------------------------------------------------------------
Global attribute: Conventions
* Required 'Conventions' global attribute value does not match regex 'CF\-1\.6,\ NCAS\-AMF\-2\.0\.0'.

Global attribute: platform
* Required 'platform' global attribute value is invalid. Check the 'platform:data:platform_id ' vocabularies for the correct value. Value found: 'ral'

Global attribute: platform_altitude
* Required 'platform_altitude' global attribute value does not match regex '-?\d+(\.\d+)? m'.
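
Two of the failures above are regex checks on global attributes. As a rough illustration of what such a check does, here is a minimal Python sketch using the two patterns quoted in the report. The helper name `attr_matches` is hypothetical, and the use of `re.fullmatch` is an assumption; the real check in compliance-check-lib may anchor the pattern differently.

```python
import re

# Patterns copied from the report above.
CONVENTIONS_RE = r"CF\-1\.6,\ NCAS\-AMF\-2\.0\.0"
ALTITUDE_RE = r"-?\d+(\.\d+)? m"


def attr_matches(pattern, value):
    """Return True if the whole attribute value matches the pattern.

    Assumption: the entire value must match (re.fullmatch). The real
    checker may apply the regex in a different way.
    """
    return re.fullmatch(pattern, value) is not None


# A number, a single space, then the unit "m" passes; a missing space fails.
print(attr_matches(ALTITUDE_RE, "150 m"))
print(attr_matches(ALTITUDE_RE, "150m"))
print(attr_matches(CONVENTIONS_RE, "CF-1.6, NCAS-AMF-2.0.0"))
```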

Let's look at what is happening here

The full command-line is:

$ amf-checker --version v2.0 ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc

This translates into a call to the IOOS compliance-checker, as follows:

compliance-checker --yaml <conda_install>/site-packages/amf-checks/AMF_product_mean-winds_land.yml --test product_mean-winds_land_checks:v2.0 --format text ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc

Let's understand the details:

--test product_mean-winds_land_checks:v2.0
 - means that it should run a test suite with that identifier

--yaml <conda_install>/site-packages/amf-checks/AMF_product_mean-winds_land.yml 
 - tells the checker to parse the test suite it finds in this YAML file,
   which happens to be: "product_mean-winds_land_checks:v2.0"

--format text - writes output to the terminal

ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc - is the file to check
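
The translation described above can be sketched in Python. This is not the amf-checker implementation: `build_checker_command` is a hypothetical helper, it assumes the product name is the fourth underscore-separated field of the file name, and it assumes the suite and YAML names follow the pattern seen in this one example.

```python
def build_checker_command(nc_file, version, deployment_mode, checks_dir):
    """Build a compliance-checker argument list for one AMF file.

    Assumptions (inferred from the single example above, not from the
    amf-checker source): the product is field 4 of the file name, and
    suites are named product_<product>_<deployment_mode>.
    """
    file_name = nc_file.rsplit("/", 1)[-1]
    product = file_name.split("_")[3]
    suite = f"product_{product}_{deployment_mode}"
    return [
        "compliance-checker",
        "--yaml", f"{checks_dir}/AMF_{suite}.yml",
        "--test", f"{suite}_checks:{version}",
        "--format", "text",
        nc_file,
    ]


cmd = build_checker_command(
    "ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc",
    version="v2.0",
    deployment_mode="land",
    checks_dir="<conda_install>/site-packages/amf-checks",
)
print(" ".join(cmd))
```

With these inputs the printed command has the same arguments as the compliance-checker call shown above.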

So let's look at the YAML file first:

$ cat /miniconda/envs/amf-checks-env/lib/python3.9/site-packages/amf-checks/AMF_product_mean-winds_land.yml

suite_name: product_mean-winds_land_checks:v2.0

description: Check 'product mean-winds land' in AMF files

checks:

- __INCLUDE__: AMF_file_info.yml

- __INCLUDE__: AMF_file_structure.yml

- __INCLUDE__: AMF_global_attrs.yml

- __INCLUDE__: AMF_product_common_dimension_land.yml

- __INCLUDE__: AMF_product_common_global-attributes_land.yml

- __INCLUDE__: AMF_product_common_variable_land.yml

- __INCLUDE__: AMF_product_mean-winds_variable.yml

This suite includes a number of more specific check files. How did they get there?

Let's group them. Group 1 is a generic set of checks:

- __INCLUDE__: AMF_file_info.yml
- __INCLUDE__: AMF_file_structure.yml
- __INCLUDE__: AMF_global_attrs.yml

These are generic, and are written directly by the amf-check-writer package, see:

https://github.com/ncasuk/amf-check-writer/blob/main/amf_check_writer/spreadsheet_handler.py#L134-L149
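
One way to picture the `__INCLUDE__` entries is as a recursive flattening of check lists. The sketch below works on already-parsed YAML (plain dicts and lists) so that it has no third-party dependencies. `expand_includes` and `load_suite` are hypothetical names, the check IDs marked as made up are placeholders, and the way compliance-check-lib really resolves includes is not shown in this issue.

```python
def expand_includes(checks, load_suite):
    """Flatten a list of check entries, following __INCLUDE__ entries.

    load_suite is a hypothetical callable: it takes a file name and
    returns the parsed suite as a dict with a "checks" list.
    """
    flat = []
    for entry in checks:
        if "__INCLUDE__" in entry:
            included = load_suite(entry["__INCLUDE__"])
            flat.extend(expand_includes(included.get("checks", []), load_suite))
        else:
            flat.append(entry)
    return flat


# In-memory stand-in for the YAML files on disk.
SUITES = {
    "AMF_file_info.yml": {
        "checks": [{"check_id": "example_file_info_check"}],  # made-up ID
    },
    "AMF_product_common_variable_land.yml": {
        "checks": [
            {"check_id": "check_time_variable_attrs"},
            {"check_id": "check_time_variable_type"},
        ],
    },
}

top_level = [
    {"__INCLUDE__": "AMF_file_info.yml"},
    {"__INCLUDE__": "AMF_product_common_variable_land.yml"},
]

all_checks = expand_includes(top_level, SUITES.__getitem__)
print([check["check_id"] for check in all_checks])
```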

Now come the more specific checks:

- __INCLUDE__: AMF_product_common_dimension_land.yml
- __INCLUDE__: AMF_product_common_global-attributes_land.yml <-- NOTE: this is a duplicate
...NOTE...: some products will have product-specific global attributes that will be seen here.
- __INCLUDE__: AMF_product_common_variable_land.yml
...NOTE...: some products will have product-specific dimensions that will be seen here.
- __INCLUDE__: AMF_product_mean-winds_variable.yml

These are compiled from BB's data product descriptions under:

https://drive.google.com/drive/u/1/folders/1Ha-Wt1IXXoBekLdnrQpjwIpPm07E0A4e

These have then been converted into two types of file that our checks currently reference:

1. YAML checks

The YAML check files are:

/miniconda/envs/amf-checks-env/lib/python3.9/site-packages/amf-checks/AMF_product_common_dimension_land.yml
/miniconda/envs/amf-checks-env/lib/python3.9/site-packages/amf-checks/AMF_product_common_variable_land.yml
/miniconda/envs/amf-checks-env/lib/python3.9/site-packages/amf-checks/AMF_product_mean-winds_variable.yml

Let's see some example content:

$ head /miniconda/envs/amf-checks-env/lib/python3.9/site-packages/amf-checks/AMF_product_common_variable_land.yml

suite_name: product_common_variable_land_checks:v2.0

description: Check 'product common variable land' in AMF files

checks:

- check_id: check_time_variable_attrs
  check_name: checklib.register.nc_file_checks_register.NCVariableMetadataCheck
  comments: Checks the variable attributes for 'time'
  parameters:
    pyessv_namespace: product_common_variable_land
    var_id: time
    vocabulary_ref: ncas:amf

- check_id: check_time_variable_type
  check_name: checklib.register.nc_file_checks_register.VariableTypeCheck
  comments: Checks the type of variable 'time'
  parameters:
    dtype: float64
    var_id: time
    vocabulary_ref: ncas:amf

This tells us that the checks need to use classes in the compliance-check-lib repo, e.g.:

https://github.com/cedadev/compliance-check-lib/blob/main/checklib/register/nc_file_checks_register.py#L435
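
The `check_name` values are dotted paths to Python classes. A common way to turn such a string into a class is `importlib`, sketched below. `resolve_check_class` is a hypothetical helper, not the loader that compliance-check-lib uses, and the demonstration uses standard-library classes so that it runs without `checklib` installed.

```python
import importlib


def resolve_check_class(dotted_name):
    """Import the module part of a dotted path and return the named class."""
    module_name, _, class_name = dotted_name.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# With checklib installed, the same call could be made with a check_name
# from the YAML, e.g.
# "checklib.register.nc_file_checks_register.VariableTypeCheck".
cls = resolve_check_class("collections.OrderedDict")
print(cls.__name__)
```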

2. Controlled vocabs

And it also says we need to use vocabularies referenced as:

    pyessv_namespace: product_common_variable_land
    var_id: time
    vocabulary_ref: ncas:amf

This points to a vocabulary term, copied from BB's spreadsheets into a JSON file:

$ head /miniconda/envs/amf-checks-env/lib/python3.9/site-packages/amf-pyessv-vocabs/ncas/amf/product-common-variable-land/time
{
    "_type": "term",
    "canonical_name": "time",
    "create_date": "2018-07-09 13:09:00+00:00",
    "data": {
        "axis": "T",
        "calendar": "standard",
        "dimension": "time",
        "long_name": "Time (seconds since 1970-01-01 00:00:00)",
        "standard_name": "time",
...

So, the checks and controlled vocabs (CVs) are constructed from the spreadsheets.
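
To make the link between a vocabulary term and a check concrete, here is a minimal sketch that compares a variable's attributes with the `data` block of a term. It uses only fields visible in the truncated listing above. `find_mismatches` is a hypothetical helper and the `actual` attributes are invented for illustration; the real comparison lives in `NCVariableMetadataCheck` and may differ.

```python
import json

# Subset of the term shown above (the original listing is truncated).
TERM_JSON = """
{
    "_type": "term",
    "canonical_name": "time",
    "data": {
        "axis": "T",
        "calendar": "standard",
        "long_name": "Time (seconds since 1970-01-01 00:00:00)",
        "standard_name": "time"
    }
}
"""


def find_mismatches(expected, actual):
    """Return one message per expected attribute that is missing or different."""
    problems = []
    for name, value in expected.items():
        if name not in actual:
            problems.append(f"missing attribute: {name}")
        elif actual[name] != value:
            problems.append(f"wrong value for {name}: {actual[name]!r}")
    return problems


term = json.loads(TERM_JSON)

# Invented attributes for a 'time' variable, with a deliberately wrong long_name.
actual = {
    "axis": "T",
    "calendar": "standard",
    "long_name": "time",
    "standard_name": "time",
}

for problem in find_mismatches(term["data"], actual):
    print(problem)
```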

Note that there are some specific rules we will need to incorporate into checksit, e.g.:

3. Checks need to know deployment mode

Inside the netCDF files, there should be something like:

$ ncdump -h ncas-anemometer-1_ral_29001225_mean-winds_v0.1.nc | grep -i deployment_mode
        :deployment_mode = "land" ;

This needs to be read by the amf-checker tool in order to work out which suite of checks must be run.

In the example case, it was "land" and so the check suites that were referenced were land-based, and were derived from the "...-land" tabs in the "_common" spreadsheet, see:

https://docs.google.com/spreadsheets/d/1xiLTn0-MbhkdQ7wwq3vNOA9UBPxK6STi2mj81BGtYN8/edit#gid=37445012
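
For checksit, reading the deployment mode could look something like the sketch below, which parses the text printed by `ncdump -h` using only the standard library. `read_global_attr` is a hypothetical helper, and the header text is a cut-down example containing just the attribute shown above. A real implementation would more likely read the attribute directly with a netCDF library.

```python
import re


def read_global_attr(cdl_header, name):
    """Return the value of a string global attribute from ncdump -h output.

    Returns None if the attribute is not present.
    """
    pattern = r'^\s*:' + re.escape(name) + r'\s*=\s*"([^"]*)"\s*;'
    match = re.search(pattern, cdl_header, flags=re.MULTILINE)
    return match.group(1) if match else None


# Cut-down header: only the line shown in the ncdump output above.
header = '''
// global attributes:
        :deployment_mode = "land" ;
'''

mode = read_global_attr(header, "deployment_mode")
print(mode)

# Assumed naming pattern, taken from the single example in this issue.
print(f"product_mean-winds_{mode}_checks:v2.0")
```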

Recap on connections between google sheets and check code

A reminder on how it all fits together:

How can we start using checksit?

Here is a plan for migrating across to checksit: