ioos / ioos-code-sprint

Information about IOOS Code Sprint activities.
https://ioos.github.io/ioos-code-sprint/
MIT License

[Project Proposal]: Various IOOS Compliance Checker topics #40

Open jcermauwedu opened 7 months ago

jcermauwedu commented 7 months ago

Project Description

At the IOOS DMAC, it was generally agreed that the IOOS Compliance Checker would benefit from further work. Other IOOS toolsets may also benefit from related work.

General topics:

Standards

Test Suite

Solicit participation in creating example datasets that apply the OG-1.0 data format, based on the published document as it stands.

The example datasets will also need to be assessed for interoperability issues with CF, ACDD and NCEI.

A personal goal for this project is to continue work on acoustic type datasets with focus on the OG-1 data format and resolve or create additional issues for the IOOS Compliance Checker.

As time permits, examine how adopting the OG-1.0 data format affects glider processing packages.

Community Engagement

GOAL: Increase community involvement in this and other IOOS toolsets.

NOTE: These topics could also serve as templates for other IOOS toolsets.

Features

Expected Outcomes

Community Engagement

Code and Documentation

Standards

OG-1.0 Examples

Skills required

It would be useful to have a working knowledge of Python and of the netCDF4, xarray and pytest packages.

Expertise

Novice

Topic Lead(s)

@jcermauwedu

It would be great to have co-leaders to share experiences with related issues.

Relevant links

Discussion and Issues

https://github.com/OceanGlidersCommunity/OG-format-user-manual/discussions/165
https://github.com/OceanGlidersCommunity/OG-format-user-manual/discussions/92
https://github.com/OceanGlidersCommunity/OG-format-user-manual/pull/172

Software

https://docs.python.org/3/
https://docs.xarray.dev/en/stable/
https://unidata.github.io/netcdf4-python/
https://docs.pytest.org/en/8.0.x/
https://github.com/ioos/compliance-checker
https://github.com/pyoceans/pocean-core
https://github.com/ERDDAP/erddap

Example datasets and templates

https://www.ncei.noaa.gov/netcdf-templates
https://github.com/ERDDAP/erddapTest
http://test.opendap.org/

callumrollo commented 7 months ago

I'd be interested in contributing to this alongside #34

benjwadams commented 7 months ago

I'm a maintainer on IOOS Compliance Checker and would be happy to help with this.

ChrisBarker-NOAA commented 7 months ago

Perhaps one focused sub-topic could be the NOS OFSs, which, as a rule, are not CF compliant.

ChrisBarker-NOAA commented 7 months ago

UGRID has recently been added to CF, as of 1.11.

I don't think the compliance checker(s) have kept up. There are a couple out there for UGRID, but I'm not sure of the status:

https://github.com/pp-mo/ugrid-checks

https://github.com/ioos/cc-plugin-ugrid

jcermauwedu commented 7 months ago

Thank you for all the feedback. I will continue to iterate on this proposal as additional feedback rolls in.

ocefpaf commented 6 months ago

@jcermauwedu I'd like to experiment with OG standards as a compliance-checker plugin to check its feasibility during the code sprint. If we can leverage the existing cc-glider-plugin, that may help; if not, maybe we can cross this idea out and move to the next one.

jcermauwedu commented 6 months ago

I think this should be fairly straightforward to clone the glider plugin to create a cc-og-plugin.

jcermauwedu commented 6 months ago

@callumrollo @ocefpaf As was mentioned today, there are three areas of interest: (1) OG plugin, (2) CF, (3) improving community engagement. We can bootstrap the OG plugin with one or more tests and prepare it for the eventual release of information later in June 2024. There is plenty to do.

MathewBiddle commented 6 months ago

Thank you for taking the time to propose this topic! From the Code Sprint topic survey, this has garnered a lot of interest.

Following the contributing guidelines on selecting a code sprint topic, I have assigned this topic to @jcermauwedu. Unless indicated otherwise, the assignee will be responsible for identifying a plan for the code sprint topic, establishing a team, and taking the lead on executing said plan. The first action for the lead is to:

jcermauwedu commented 6 months ago

@MathewBiddle The code of conduct link on Contributing: Ground Rules gives a 404.

MathewBiddle commented 6 months ago

Webpage https://ioos.github.io/ioos-code-sprint/2024/topics/02-compliance-checker-topics.html

Thanks for the heads up on the Code of Conduct. We are discussing what an organization-wide one should be in this issue: https://github.com/ioos/.github/issues/10

jcermauwedu commented 6 months ago

I think this should be fairly straightforward to clone the glider plugin to create a cc-og-plugin.

I used some boilerplate framework from the glider and ugrid plugins to create the OG plugin. @ocefpaf : Would you create a new IOOS repo with the Apache 2 license? cc-plugin-og? I will copy the boilerplate code over to it. REF: https://github.com/uw-farlab/cc-plugin-og

The basic operation seems functional. It just needs to be populated with proper content.

$ compliance-checker -l
IOOS compliance checker available checker suites:
 - OG:1.0
 - UGRID:2.0
 - acdd:1.1
 - acdd:1.3
 - cf:1.6
...

$ compliance-checker -t OG -D
====
 OG 
====
- check_basic_requirements

  Check basic OG stated conventions.
   * Format follows the CF 1.8 convention.
   * Format follows the ACDD 1.3 convention.
   * Variables are identified in capital letters.
   * Attributes are identified in lower case.

$ pytest
=========================================================== test session starts ============================================================
platform linux -- Python 3.11.9, pytest-8.2.0, pluggy-1.5.0
rootdir: /home/portal/src/cc-plugin-og
configfile: pyproject.toml
plugins: flake8-1.1.1, requests-mock-1.12.1, time-machine-2.14.1
collected 1 item                                                                                                                           

tests/test_basicchecks.py .                                                                                                          [100%]

============================================================ 1 passed in 0.16s =============================================================
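
The two naming rules listed by check_basic_requirements can be sketched in plain Python. This is only an illustration, not the plugin's code: the function name, the message wording, and the exemption for netCDF's reserved _FillValue attribute are assumptions, and plain lists and dicts stand in for a real dataset.

```python
def check_naming(global_attrs, variables, exempt=("_FillValue",)):
    """Return messages for names that break the two naming rules.

    global_attrs: iterable of global attribute names
    variables:    dict mapping variable name -> iterable of attribute names
    exempt:       attribute names to skip (an assumption, not taken from OG-1.0)
    """
    messages = []
    for attr in global_attrs:
        if attr not in exempt and attr != attr.lower():
            messages.append(
                f"Global attribute {attr} should be lowercase: {attr.lower()}"
            )
    for var, attrs in variables.items():
        # Rule: variables are identified in capital letters.
        if var != var.upper():
            messages.append(f"Variable {var} should be uppercase: {var.upper()}")
        # Rule: attributes are identified in lower case.
        for attr in attrs:
            if attr not in exempt and attr != attr.lower():
                messages.append(
                    f"Variable {var} attribute {attr} should be lowercase: {attr.lower()}"
                )
    return messages


if __name__ == "__main__":
    # Hypothetical stand-in for a dataset's names.
    problems = check_naming(
        ["Metadata_Conventions", "title"],
        {"CNDC": ["URI", "units", "_FillValue"], "temp": ["units"]},
    )
    for line in problems:
        print(line)
```

A real check would read the names from the open netCDF dataset and wrap the messages in the checker's result type.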

ocefpaf commented 6 months ago

@ocefpaf : Would you create a new IOOS repo with the Apache 2 license? cc-plugin-og? I will copy the boilerplate code over to it. REF: https://github.com/uw-farlab/cc-plugin-og

I don't have admin privileges to create repos. While I do believe that we should move it to IOOS at some point, it is nice to keep it under your own account, where you (we?) have more control. When the project is reasonably mature we can move it to IOOS. What do you think @MathewBiddle?

MathewBiddle commented 6 months ago

I agree with @ocefpaf. Once ready, feel free to submit a "New IOOS Repository Request" using the issue form linked at https://github.com/ioos/governance/issues/new/choose

jcermauwedu commented 5 months ago

Perhaps one focused sub-topic could be the NOS OFSs, which, as a rule, are not CF compliant.

  • Run them through the compliance checker(s)
  • And by hand
  • Document what ways they are not compliant, and what needs to be done to bring them into compliance.

@ChrisBarker-NOAA @dpsnowden Is there a URL to source some of these datasets for checking? Where should the feedback go?

ChrisBarker-NOAA commented 5 months ago

@jcermauwedu: Yes please!

The OFSs are all served up via TDS servers and also on AWS.

The AWS ones are here:

https://noaa-nos-ofs-pds.s3.amazonaws.com/index.html

It would be nice to have a complete list, maybe a utility to download a set, or ...

Where might that go?

In the compliance checker repo?

Maybe a new repo specifically for OFS compliance?

jcermauwedu commented 5 months ago

Quite a hunting expedition to find an unstructured grid example. There is a ton of data out there. Finally located an example.

$ wget https://noaa-nos-ofs-pds.s3.amazonaws.com/sfbofs/netcdf/202405/nos.sfbofs.fields.n006.20240522.t03z.nc
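
For a small download utility, the URL above suggests a naming pattern that could be generated rather than hunted for. The sketch below reproduces only that one example; the function name, the parameter names, and the reading of "n006" and "t03z" as a step number and a cycle hour are assumptions, and other OFS models or file types in the bucket may be laid out differently.

```python
from datetime import date

# Bucket root taken from the wget example in this thread.
BUCKET = "https://noaa-nos-ofs-pds.s3.amazonaws.com"


def ofs_fields_url(model, day, cycle_hour, step):
    """Build a 'fields' file URL following the single example seen so far.

    model:      OFS identifier, e.g. "sfbofs"
    day:        datetime.date of the model run
    cycle_hour: assumed meaning of the "tHHz" part
    step:       assumed meaning of the "nNNN" part
    """
    return (
        f"{BUCKET}/{model}/netcdf/{day:%Y%m}/"
        f"nos.{model}.fields.n{step:03d}.{day:%Y%m%d}.t{cycle_hour:02d}z.nc"
    )


if __name__ == "__main__":
    print(ofs_fields_url("sfbofs", date(2024, 5, 22), 3, 6))
```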

$ ugrid-checker nos.sfbofs.fields.n006.20240522.t03z.nc 

UGRID conformance checks complete.

List of checker messages :
  *** FAIL R502 : Mesh data variable "u" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "u" has dimensions ('time', 'siglay', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "v" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "v" has dimensions ('time', 'siglay', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "tauc" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "tauc" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "temp" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "temp" has dimensions ('time', 'siglay', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "salinity" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "salinity" has dimensions ('time', 'siglay', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "short_wave" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "short_wave" has dimensions ('time', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "net_heat_flux" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "net_heat_flux" has dimensions ('time', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "uwind_speed" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "uwind_speed" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "vwind_speed" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "vwind_speed" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "wet_nodes" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "wet_nodes" has dimensions ('time', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "wet_cells" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "wet_cells" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "wet_nodes_prev_int" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "wet_nodes_prev_int" has dimensions ('time', 'node'), of which 0 are mesh dimensions, instead of 1.
  *** FAIL R502 : Mesh data variable "wet_cells_prev_int" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "wet_cells_prev_int" has dimensions ('time', 'nele'), of which 0 are mesh dimensions, instead of 1.
  ... WARN A903 : dataset has Conventions="CF-1.0", which does not contain a UGRID convention statement of the form "UGRID-<major>.<minor>".

Total of 27 problems logged :
  26 Rxxx requirement failures
  1 Axxx advisory recommendation warnings

Done.

Error codes and conformance documentation for the ugrid-checks code: https://ugrid-conventions.github.io/ugrid-conventions/conformance/

REF: https://github.com/pp-mo/ugrid-checks
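
To document how a file falls short, the checker log above can be condensed into a tally per error code. The following is a rough sketch that parses the text format shown in this comment; the regular expressions are inferred from that output alone, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches log lines such as:
#   *** FAIL R502 : Mesh data variable "u" has mesh=...
#   ... WARN A903 : dataset has Conventions=...
LINE_RE = re.compile(r"^\s*(?:\*\*\*|\.\.\.)\s+(FAIL|WARN)\s+([RA]\d{3})\s+:\s+(.*)$")
VAR_RE = re.compile(r'variable "([^"]+)"')

SAMPLE = """
  *** FAIL R502 : Mesh data variable "u" has mesh="fvcom_mesh", which is not a variable in the dataset.
  *** FAIL R509 : Mesh data variable "u" has dimensions ('time', 'siglay', 'nele'), of which 0 are mesh dimensions, instead of 1.
  ... WARN A903 : dataset has Conventions="CF-1.0", which does not contain a UGRID convention statement of the form "UGRID-<major>.<minor>".
"""


def summarize(log_text):
    """Return (count per code, sorted names of the variables mentioned)."""
    codes = Counter()
    variables = set()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # banner, blank, and summary lines are ignored
        _level, code, message = match.groups()
        codes[code] += 1
        named = VAR_RE.search(message)
        if named:
            variables.add(named.group(1))
    return codes, sorted(variables)


if __name__ == "__main__":
    counts, names = summarize(SAMPLE)
    print(dict(counts), names)
```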

Compliance Checker UGRID:2.0 response:

$ compliance-checker -t UGRID:2.0 nos.sfbofs.fields.n006.20240522.t03z.nc 
Running Compliance Checker on the datasets from: ['nos.sfbofs.fields.n006.20240522.t03z.nc']

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                          Version 5.1.2.dev30+gf543e4f                          
                     Report generated 2024-05-22T21:25:29Z                      
                                   UGRID:2.0                                    
                    https://github.com/ioos/cc-plugin-ugrid                     
--------------------------------------------------------------------------------
                               Corrective Actions                               
nos.sfbofs.fields.n006.20240522.t03z.nc has 1 potential issue

                               Highly Recommended                               
--------------------------------------------------------------------------------
Run UGRID checks if mesh variables are present in the data
* No mesh variables are detected in the data; all checks fail.

ChrisBarker-NOAA commented 5 months ago

and lots of problems with it ;-)

If you want some smaller examples, you can use the OFS Subsetter:

SFBOFS and NGOFS2 are ugrids (and the great lakes ones, I think)

Not a good UI, but it works.

jcermauwedu commented 5 months ago

It looks like for those, the mesh/grid is not contained within the netCDF file. They are external in the "OFS_Grid_Datum" directory? Which is why the packages are not detecting a mesh variable?

$ head -5 sfbofs.2dm 
MESH2D
MESHNAME SFBOFS              
E3T       1     93      1      2      1
E3T       2     93     92      1      1
E3T       3      3     93      2      1
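
The R502 failures reported earlier say that each data variable's mesh attribute names a variable that is absent from the file. A minimal sketch of that lookup follows, using a plain dict of attributes in place of an open dataset. The function name and the sample dicts are hypothetical; only the mesh="fvcom_mesh" value comes from the log, and cf_role="mesh_topology" is the marker UGRID uses for a topology variable.

```python
def find_mesh_problems(variables):
    """Inspect a dict of {variable name: {attribute: value}}.

    Returns (topologies, dangling), where topologies lists the variables
    flagged as a UGRID mesh topology and dangling maps each data variable
    to a mesh name that is not present in the file.
    """
    topologies = sorted(
        name
        for name, attrs in variables.items()
        if attrs.get("cf_role") == "mesh_topology"
    )
    dangling = {
        name: attrs["mesh"]
        for name, attrs in variables.items()
        if "mesh" in attrs and attrs["mesh"] not in variables
    }
    return topologies, dangling


# Hypothetical stand-in for the file checked above: data variables point
# at "fvcom_mesh", but no variable with that name exists.
incomplete = {
    "u": {"mesh": "fvcom_mesh"},
    "temp": {"mesh": "fvcom_mesh"},
    "lon": {"units": "degrees_east"},
}

# The same layout with a topology variable added.
complete = dict(incomplete, fvcom_mesh={"cf_role": "mesh_topology"})

if __name__ == "__main__":
    print(find_mesh_problems(incomplete))
    print(find_mesh_problems(complete))
```

With netCDF4, the input dict could be built from each variable's ncattrs() and getncattr() calls.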

jcermauwedu commented 5 months ago

There was an initial plan to automatically call the CF tests if the OG format was called on for testing. Why not just enable it on the command line:

$ compliance-checker -v -t og:1.0 -t cf:1.8 ~/src/upstream/data/sea076_20230906T0852_R.nc
Running Compliance Checker on the datasets from: ['/home/portal/src/upstream/data/sea076_20230906T0852_R.nc']
Using cached standard name table v49 from /home/portal/.local/share/compliance-checker/cf-standard-name-table-test-49.xml

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                          Version 5.1.2.dev30+gf543e4f                          
                     Report generated 2024-05-23T01:16:17Z                      
                                     cf:1.8                                     
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html
--------------------------------------------------------------------------------
All tests passed!

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                          Version 5.1.2.dev30+gf543e4f                          
                     Report generated 2024-05-23T01:16:17Z                      
                                     og:1.0                                     
  https://oceangliderscommunity.github.io/OG-format-user-manual/OG_Format.html  
--------------------------------------------------------------------------------
                               Corrective Actions                               
sea076_20230906T0852_R.nc has 2 potential issues

                                   Mandatory                                    
--------------------------------------------------------------------------------
Check that all attribute names are lowercase.
* Global attribute Metadata_Conventions should be lowercase: metadata_conventions
* Variable CNDC attribute URI should be lowercase: uri
* Variable DOXY attribute URI should be lowercase: uri
* Variable PRES attribute URI should be lowercase: uri
* Variable PSAL attribute URI should be lowercase: uri
* Variable TEMP attribute URI should be lowercase: uri
* Variable DENSITY attribute URI should be lowercase: uri
* Variable TIME attribute URI should be lowercase: uri
* Variable TIME_GPS attribute URI should be lowercase: uri

Missing mandatory variables.
* Variable PLATFORM_SERIAL_NUMBER is missing
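
If the suites are chosen on the command line, the list of -t options could be derived from a file's Conventions attribute instead of being typed by hand. This is a sketch under assumptions: the helper name is invented, the example string "CF-1.8, ACDD-1.3, OG-1.0" is hypothetical, and it presumes that each convention token maps directly onto a checker suite name, which will not hold for every convention.

```python
import re

TOKEN_RE = re.compile(r"([A-Za-z]+)-(\d+(?:\.\d+)*)")


def suites_from_conventions(conventions):
    """Turn a Conventions string into compliance-checker -t arguments.

    "CF-1.8, OG-1.0" becomes ["-t", "cf:1.8", "-t", "og:1.0"].
    Tokens that do not look like NAME-VERSION are skipped.
    """
    args = []
    for token in re.split(r"[,\s]+", conventions.strip()):
        match = TOKEN_RE.fullmatch(token)
        if match:
            name, version = match.groups()
            args += ["-t", f"{name.lower()}:{version}"]
    return args


if __name__ == "__main__":
    # Hypothetical attribute value, not read from the file tested above.
    print(suites_from_conventions("CF-1.8, ACDD-1.3, OG-1.0"))
```

The resulting list could be placed between the compliance-checker command and the dataset path when building a subprocess call.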

The work on CF-1.9 is not fully complete, but it is complete enough for use in testing the OG 1.0 requirements at the CF-1.9 and CF-1.10 level. The class just needs to be enabled in the compliance checker. It was mentioned that once the CF-1.9 work is completed, we can also enable it for CF-1.10 as there is not much difference between the two versions. UGRID will be included for CF-1.11.

The above test is now testing four distinct rulesets for Ocean Gliders 1.0:

ChrisBarker-NOAA commented 5 months ago

“It looks like for those, the mesh/grid is not contained within the netCDF file.”

Something is off - by “those” do you mean from the OFS subsetter? They have always been complete for me.

I’ll try to get you one.

jcermauwedu commented 5 months ago

Something else to trudge through Thursday. Instrument representation appears to be different for OG 1.0 vs CF. The IOOS checker looks at each variable for an instrument attribute that attaches it to an instrument, or to a package that records multiple variables (such as a CTD). The OG 1.0 format does the reverse: a list of instruments is mapped to the list of variables, as shown below. This has caused an issue for the CF checker. The use of the instrument attribute is first defined in the IOOS Glider DAC netCDF 2.0 format specification, under the dimensionless container variable types.

global attribute:

string :instrument = "WET Labs {Sea-Bird WETLabs} ECO Puck Triplet BBFL2-IRB scattering fluorescence sensor",
    "Oxygen optode 4831", "Unpumped CT sail CTD", "Seaglider M1 Glider data logger" ;

variables:

PARAMETER = "TEMP_CPU_CHLA", "FLUORESCENCE_CHLA", "DPHASE_DOXY", 
    "MOLAR_DOXY", "TPHASE_DOXY", "TEMP_DOXY", "OXYSAT_DOXY", "PRES", 
    "SIGMA_T", "CNDC", "TEMP", "PSAL", "LATITUDE_GPS", "LONGITUDE_GPS", 
    "GLIDER_ROLL", "LATITUDE", "GLIDER_PITCH", "LONGITUDE", "GLIDER_DEPTH" ;

 PARAMETER_SENSOR = 
    "WET Labs {Sea-Bird WETLabs} ECO Puck Triplet BBFL2-IRB scattering fluorescence sensor", 
    "WET Labs {Sea-Bird WETLabs} ECO Puck Triplet BBFL2-IRB scattering fluorescence sensor", 
    "Oxygen optode 4831", "Oxygen optode 4831", "Oxygen optode 4831", 
    "Oxygen optode 4831", "Oxygen optode 4831", "Unpumped CT sail CTD", 
    "Unpumped CT sail CTD", "Unpumped CT sail CTD", "Unpumped CT sail CTD", 
    "Unpumped CT sail CTD", "Seaglider M1 Glider data logger", 
    "Seaglider M1 Glider data logger", "Seaglider M1 Glider data logger", 
    "Seaglider M1 Glider data logger", "Seaglider M1 Glider data logger", 
    "Seaglider M1 Glider data logger", "Seaglider M1 Glider data logger" ;