ioos / glider-dac

The IOOS Glider DAC site/scripts/tools
http://gliders.ioos.us/providers/

Implement aggregate QC flags to meet NDBC GTS Ingest requirements #277

Open kbailey-noaa opened 1 year ago

kbailey-noaa commented 1 year ago

NDBC currently does not read any glider QC flags as part of the GDAC harvest of real-time T and S data that they deliver to the GTS.

In order for NDBC to read QC flags, the GDAC must implement the QARTOD “Aggregate/Rollup” flag variable, following the IOOS Metadata Profile requirements: https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html#quality-controlqartod and https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html#requirements-for-the-qartod-aggregaterollup-flag

NDBC will only examine the contents of the aggregate_quality_flag variable for filtering purposes for GTS harvest. NDBC will not (and should not) read detailed QC flags.

Rules governing the ‘Aggregate/Rollup’ flag variable:

  1. The value of the variable follows the UNESCO/QARTOD v2 convention: 9 = Missing, 2 = Not Eval, 1 = Pass, 3 = Suspect, 4 = Fail.
  2. The value of the rollup flag should be the worst result out of all of the individual tests. It's called a "Summary Flag" in the QARTOD Data Flags manual (pg 3). Here's how it's done in the ioos_qc library with the corresponding test.
  3. The variable should have a standard name attribute of aggregate_quality_flag. See the QARTOD section for more information.
  4. NDBC should exclude any values that are QC fail (and missing) but include everything else (not eval, pass, suspect).
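A minimal plain-Python sketch of the rollup rule and the NDBC filter described above. It assumes the severity order MISSING < NOT_EVALUATED < PASS < SUSPECT < FAIL as our reading of "worst result"; we believe this matches the ordering used by `qartod_compare` in ioos_qc, but that function should be checked before relying on it. `rollup` and `ndbc_keep` are hypothetical helper names, not part of glider_qc or ioos_qc.

```python
# Flag values from the UNESCO/QARTOD v2 convention quoted above.
PASS, NOT_EVALUATED, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9

# Assumed severity order, least to most severe; the rollup keeps the most severe.
SEVERITY = {MISSING: 0, NOT_EVALUATED: 1, PASS: 2, SUSPECT: 3, FAIL: 4}


def rollup(*test_flags):
    """Combine per-test flag lists (equal length) into one summary flag per point."""
    lengths = {len(flags) for flags in test_flags}
    if len(lengths) != 1:
        raise ValueError("all test flag lists must have the same length")
    return [max(point, key=SEVERITY.__getitem__) for point in zip(*test_flags)]


def ndbc_keep(summary_flags):
    """True where NDBC would keep a point: drop FAIL and MISSING, keep the rest."""
    return [flag not in (FAIL, MISSING) for flag in summary_flags]


# Hypothetical per-test results for five points.
gross_range = [1, 1, 3, 9, 1]
spike = [1, 4, 1, 9, 2]
flat_line = [1, 1, 1, 9, 2]

summary = rollup(gross_range, spike, flat_line)
print(summary)             # [1, 4, 3, 9, 1]
print(ndbc_keep(summary))  # [True, False, True, False, True]
```

Because the filter only looks at the summary flag, a point that one test marks SUSPECT is still delivered, while a single FAIL from any test removes it.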

Please implement these aggregate flags at least for real-time T and S, as these are the only variables that NDBC is harvesting and delivering to the GTS.

NDBC POCs: Dawn Petraitis and Bill Smith
IOOS DMAC POC: Micah Wengren

benjwadams commented 1 year ago

The variable already exists in the underlying NetCDF dataset as qartod_%(name)s_primary_flag: https://github.com/ioos/glider-dac/blob/main/glider_qc/glider_qc.py#L338-L366

It needs to be added to the ERDDAP datasets.xml by modifying the scripts/build_erddap_catalog.py script.

leilabbb commented 1 year ago

The variable qartod_%(name)s_primary_flag does exist in the ERDDAP datasets.xml, so there is no need to modify the build_erddap_catalog.py script.

Modifications need to be applied to glider_qc.py: the flag_meanings and standard_name attributes need to be replaced to adhere to the IOOS metadata requirements.
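One way to confirm the flag_meanings and standard_name changes is a small validator over the flag variable's attributes. This is a hypothetical sketch: `check_primary_flag_attrs` is not part of glider_qc.py, it checks only the rules quoted in this issue (standard name, and one QARTOD meaning per flag value), and the expected meaning names are taken from the ru39 example posted later in this thread rather than from the full IOOS Metadata Profile.

```python
# Expected value -> meaning pairs, assumed from the ru39 example in this thread.
EXPECTED_MEANINGS = {1: "PASS", 2: "NOT_EVALUATED", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}


def check_primary_flag_attrs(attrs):
    """Return a list of problems found in an aggregate flag variable's attributes."""
    problems = []
    if attrs.get("standard_name") != "aggregate_quality_flag":
        problems.append("standard_name should be 'aggregate_quality_flag'")
    values = [int(v) for v in attrs.get("flag_values", [])]
    meanings = str(attrs.get("flag_meanings", "")).split()
    if len(values) != len(meanings):
        problems.append("flag_values and flag_meanings have different lengths")
    elif dict(zip(values, meanings)) != EXPECTED_MEANINGS:
        problems.append("flag_values/flag_meanings do not match the QARTOD convention")
    return problems


# Attributes copied from the qartod_temperature_primary_flag example in this thread.
example = {
    "standard_name": "aggregate_quality_flag",
    "flag_values": [1, 2, 3, 4, 9],
    "flag_meanings": "PASS NOT_EVALUATED SUSPECT FAIL MISSING",
}
print(check_primary_flag_attrs(example))  # []
```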

kerfoot commented 1 year ago

I still do not understand what is going on with this. Here is the link to all data sets that have been updated since September 1, 2023:

Data Sets with min_time set to 2023-09-01

49 data sets are returned. Now, if I leave the date as September 1, 2023 AND add that the data set must contain the qartod_temperature_primary_flag:

Data Sets with min_time = 2023-09-01 AND contains qartod_temperature_primary_flag

The number of data sets returned goes from 49 to 1.

Question: why do the majority of real-time data sets not include the qartod* flags?
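Searches like the two above can be built with ERDDAP's advanced search. The sketch below only constructs the URLs; the parameter names (`searchFor`, `variableName`, `minTime`, `page`, `itemsPerPage`) are ERDDAP advanced-search parameters as we understand them, and the exact links used in the comment above are not reproduced, so treat this as an approximation of that query. `advanced_search_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

ERDDAP = "https://gliders.ioos.us/erddap"


def advanced_search_url(min_time, variable_name=None, fmt="csv"):
    """Build an ERDDAP advanced-search URL.

    Assumed semantics: minTime restricts results to datasets with data at or after
    min_time; variableName restricts them to datasets containing that variable.
    """
    params = {"page": 1, "itemsPerPage": 1000, "searchFor": "", "minTime": min_time}
    if variable_name is not None:
        params["variableName"] = variable_name
    return f"{ERDDAP}/search/advanced.{fmt}?{urlencode(params)}"


print(advanced_search_url("2023-09-01T00:00:00Z"))
print(advanced_search_url("2023-09-01T00:00:00Z", "qartod_temperature_primary_flag"))
```

Fetching each URL and counting result rows would give the before/after comparison (49 vs. 1) in a scriptable form.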

leilabbb commented 1 year ago

Update on progress made:

see pull request #289

leilabbb commented 1 year ago

Update: The qc tests were exercised further, and the expected results were verified. Additional conditional statements need to be added to the qc method to handle special cases of data arrays.

kerfoot commented 1 year ago

The details of this were discussed at the 2023-10-12 tech meeting. Many of the qc tests assume that the time data array is monotonically increasing. This means:

  1. There are no invalid timestamps (e.g., t == 0, or a t that falls before or after the deployment)
  2. There are no duplicate timestamps
  3. The timestamps are monotonically ascending

If these assumptions are not met, then many of the tests (flat line, spike, etc.) cannot provide an accurate qc assessment. There are many examples from previously submitted data sets in which this assumption is not met.
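The three assumptions can be checked independently, which makes it easy to report which one a file violates. This is a hypothetical sketch, not code from glider_qc: it assumes timestamps in epoch seconds and that the deployment start and end times are known.

```python
def check_time_axis(times, deployment_start, deployment_end):
    """Report which of the three time-axis assumptions hold.

    times: sequence of epoch-second timestamps.
    deployment_start/deployment_end: epoch seconds bounding the deployment.
    """
    valid = all(t != 0 and deployment_start <= t <= deployment_end for t in times)
    no_duplicates = len(set(times)) == len(times)
    ascending = all(a <= b for a, b in zip(times, times[1:]))
    return {
        "valid_timestamps": valid,
        "no_duplicates": no_duplicates,
        "ascending": ascending,
        "all_ok": valid and no_duplicates and ascending,
    }


# A zero timestamp, a duplicate and a step backwards all fail their checks.
print(check_time_axis([100.0, 110.0, 110.0, 105.0, 0.0], 50.0, 200.0))
```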

We discussed implementing a new test, run prior to all other qc tests, that checks for a monotonically ascending time array. If any of the conditions listed above are not met, there are a couple of options:

  1. Adding a global attribute that specifies whether this condition has been met, i.e., NC_GLOBAL:valid_times = "True" or NC_GLOBAL:valid_times = "False"
  2. Removing the offending file(s) from the aggregation

If the attribute in option 1 is "True", we proceed with all subsequent qc tests. This option is much easier to implement from a processing perspective. We decided to evaluate these options further and make a decision during the following tech call.
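Both options can be computed from the same pre-check. In this hypothetical sketch, `prepare_for_qc` derives the `valid_times` global attribute (option 1) and also splits the files into kept and excluded lists (option 2), leaving the caller to pick one. It assumes each file's timestamps are epoch seconds and checks each file on its own, not the aggregation as a whole.

```python
def prepare_for_qc(files, deployment_start, deployment_end):
    """Pre-check time axes before running the time-dependent qc tests.

    files: mapping of file name -> sequence of epoch-second timestamps.
    Returns (global_attrs, kept, excluded): global_attrs carries the option 1
    valid_times flag; kept/excluded implement option 2.
    """
    def times_ok(times):
        in_deployment = all(t != 0 and deployment_start <= t <= deployment_end for t in times)
        # Strictly ascending rules out both duplicates and out-of-order samples.
        strictly_ascending = all(a < b for a, b in zip(times, times[1:]))
        return in_deployment and strictly_ascending

    kept = [name for name, times in files.items() if times_ok(times)]
    excluded = [name for name in files if name not in kept]
    global_attrs = {"valid_times": "True" if not excluded else "False"}
    return global_attrs, kept, excluded


files = {"a.nc": [100.0, 110.0], "b.nc": [120.0, 120.0], "c.nc": [0.0, 130.0]}
print(prepare_for_qc(files, 50.0, 200.0))
```

With option 1 the qc tests would run only when `valid_times` is "True"; with option 2 they would run on the `kept` files only.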

leilabbb commented 11 months ago

To add to the above discussion, there is also the case of geophysical variables with incorrect standard names. This case needs to be addressed, because it explains why QARTOD is not run on those variables. Example: https://gliders.ioos.us/erddap/info/bass-20150827T1909/index.html
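A sketch of how an incorrect standard name can silently disable QARTOD, assuming (as this comment implies) that variables are selected for qc by their standard_name. `QC_STANDARD_NAMES` and `select_for_qc` are hypothetical; the real selection logic lives in glider_qc.py and may differ. Returning the skipped names gives something to report back to data providers.

```python
# Hypothetical set of standard names that have a QARTOD configuration.
QC_STANDARD_NAMES = {"sea_water_temperature", "sea_water_salinity"}


def select_for_qc(variables):
    """Split geophysical variables into those qc will run on and those it skips.

    variables: mapping of variable name -> attribute dict.
    """
    selected, skipped = [], []
    for name, attrs in variables.items():
        if attrs.get("standard_name") in QC_STANDARD_NAMES:
            selected.append(name)
        else:
            skipped.append(name)
    return selected, skipped


# Hypothetical file: salinity carries a non-CF standard name, so it is skipped.
selected, skipped = select_for_qc({
    "temperature": {"standard_name": "sea_water_temperature"},
    "salinity": {"standard_name": "salinity"},
})
print(selected, skipped)  # ['temperature'] ['salinity']
```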

kbailey-noaa commented 9 months ago

Can someone please provide a status update for this issue? We (IOOS) have a meeting with NDBC reps next week, and this will probably come up.

leilabbb commented 9 months ago

#277 should be marked as complete.

Creating a new issue to address challenges encountered in implementing the QC test on the data arrays. See #318

kbailey-noaa commented 9 months ago

@leilabbb Closing this issue means that the aggregate QC flags are implemented for T and S, following the IOOS Metadata Profile requirements, and that we are ready for NDBC to test. Is this the case?

leilabbb commented 9 months ago

Yes. The aggregate QC flag is created for T and S unless the variable does not exist in the file or its array is all NaN or fill values.
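The condition in this reply can be written as a small predicate. This is a hypothetical sketch over plain Python lists; in the real files the data arrive as netCDF4 masked arrays, where fill values would normally be handled through the mask rather than by comparing against `fill_value`.

```python
import math


def can_build_primary_flag(values, fill_value):
    """True if a primary (aggregate) flag can be built for this variable.

    values is None when the variable is absent from the file; otherwise at least
    one value must be neither NaN nor the fill value.
    """
    if values is None:
        return False
    return any(not math.isnan(v) and v != fill_value for v in values)


print(can_build_primary_flag([float("nan"), -999.0], -999.0))  # False
print(can_build_primary_flag([float("nan"), 12.5], -999.0))    # True
```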

Example: variable "qartod_temperature_primary_flag" in file "ru39_20231203T182905Z_rt.nc":

<class 'netCDF4._netCDF4.Variable'>
int8 qartod_temperature_primary_flag(time)
    valid_min: 1
    valid_max: 9
    _FillValue: 9
    units: 1
    flag_meanings: PASS NOT_EVALUATED SUSPECT FAIL MISSING
    flag_values: [1 2 3 4 9]
    ioos_category: Quality
    qartod_test: 'qc_rollup'
    standard_name: aggregate_quality_flag
    long_name: QARTOD Primary Flag for sea_water_temperature
    qartod_package: https://github.com/ioos/ioos_qc/blob/main/ioos_qc/qartod.py
    references: http://gliders.ioos.us/static/pdf/Manual-for-QC-of-Glider-Data_05_09_16.pdf
    qartod_config: {'gross_range_test': {'suspect_span': [0, 35], 'fail_span': [-2, 40]}, 'spike_test': {'suspect_threshold': 0.02396099641919136, 'fail_threshold': 0.04792199283838272}, 'rate_of_change_test': {'threshold': 0.03594149462878704}, 'flat_line_test': {'tolerance': 1, 'suspect_threshold': 3600, 'fail_threshold': 9000}}

The values of the flags: masked_array(data=[1, --, --, 1, --, --, 1, --, 1, --, --, --, 1, --, 1, --, --, 1, --, --, 1, --, 1, --, 1, --, --, 1, --, --, --, 1, --, --, 1, --, --, 1, --, 1, --, --, 1, --, --, --, --, 1, --, 1, --, --, 1, --, --, 1, --, 1, --, --, 1, --, --, --, 1, --, 1], mask=[False, True, True, False, True, True, False, True, False, True, True, True, False, True, False, True, True, False, True, True, False, True, False, True, False, True, True, False, True, True, True, False, True, True, False, True, True, False, True, False, True, True, False, True, True, True, True, False, True, False, True, True, False, True, True, False, True, False, True, True, False, True, True, True, False, True, False], fill_value=9, dtype=int8)
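To make the qartod_config attribute in the example above concrete, here is a plain-Python sketch of the gross range test using its temperature suspect_span and fail_span. It illustrates the QARTOD gross range logic; it is not the ioos_qc implementation (which operates on masked NumPy arrays), and treating the span bounds as inclusive is our assumption.

```python
import math

# gross_range_test settings from the qartod_config attribute in the example above.
SUSPECT_SPAN = (0, 35)
FAIL_SPAN = (-2, 40)


def gross_range_flags(values, suspect_span=SUSPECT_SPAN, fail_span=FAIL_SPAN):
    """Flag each value 1 (pass), 3 (suspect), 4 (fail) or 9 (missing)."""
    flags = []
    for v in values:
        if v is None or math.isnan(v):
            flags.append(9)
        elif not fail_span[0] <= v <= fail_span[1]:
            flags.append(4)
        elif not suspect_span[0] <= v <= suspect_span[1]:
            flags.append(3)
        else:
            flags.append(1)
    return flags


print(gross_range_flags([12.3, -1.0, 41.0, float("nan")]))  # [1, 3, 4, 9]
```

This per-test result would be one of the inputs rolled up into qartod_temperature_primary_flag.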

kbailey-noaa commented 8 months ago

@mwengren can you please review?

kbailey-noaa commented 8 months ago

@leilabbb can you provide a link to a dataset where this is implemented?

leilabbb commented 2 months ago

This data set has the primary_flag implemented.

kbailey-noaa commented 2 months ago

Thanks. @mwengren Can you pls review this? Does the variable name matter, or does NDBC just look at the standard name?

I thought we'd expect to see a variable name of, for example, sea_water_temperature_qc_agg, but instead I see qartod_temperature_primary_flag. But the standard name is aggregate_quality_flag...

leilabbb commented 2 months ago

I think this was decided early on and was already in the system before I started working on QC. There is this document that has the *_primary_flag variable, which I believe won't break the GTS workflow.

mwengren commented 2 months ago

The IOOS Metadata Profile rules are based on using CF ancillary variables (so the data variable includes the name of the aggregate QC variable in its ancillary_variables attribute), and the aggregate variable also has standard name aggregate_quality_flag.

At a quick glance, it looks like this is in line with those rules. That should make it easier for NDBC to read glider files the same way they read IOOS Metadata Profile files, though they would probably have to confirm that.
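Going by the description above, a reader would locate the aggregate flag through the data variable's ancillary_variables attribute and then match standard_name aggregate_quality_flag, which is why the flag variable's own name (qartod_temperature_primary_flag vs. something like sea_water_temperature_qc_agg) should not matter. A minimal sketch with attributes as plain dicts; `find_aggregate_flag` is hypothetical, NDBC's actual reader may work differently, and the example attribute values are illustrative rather than copied from a GDAC file.

```python
def find_aggregate_flag(variables, data_var):
    """Return the name of data_var's aggregate QC variable, or None.

    variables: mapping of variable name -> attribute dict. The lookup follows
    data_var's ancillary_variables attribute and matches on standard_name only.
    """
    ancillary = variables.get(data_var, {}).get("ancillary_variables", "").split()
    for name in ancillary:
        if variables.get(name, {}).get("standard_name") == "aggregate_quality_flag":
            return name
    return None


# Illustrative attributes (hypothetical ancillary_variables value).
variables = {
    "temperature": {
        "standard_name": "sea_water_temperature",
        "ancillary_variables": "temperature_qc qartod_temperature_primary_flag",
    },
    "temperature_qc": {"standard_name": "sea_water_temperature status_flag"},
    "qartod_temperature_primary_flag": {"standard_name": "aggregate_quality_flag"},
}
print(find_aggregate_flag(variables, "temperature"))  # qartod_temperature_primary_flag
```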

kbailey-noaa commented 2 months ago

Thanks all! @leilabbb It sounds like we're ready to proceed with an NDBC test. Can you pls email Bill Smith and Dawn Petraitis (cc: GDAC team) this dataset with a request to test?

leilabbb commented 2 months ago

email sent

sarinamann-noaa commented 1 month ago

@sarinamann-noaa will request feedback from Bill Smith and Dawn Petraitis and can schedule a meeting to get information on their needs.