Open kbailey-noaa opened 1 year ago
Variable already exists in underlying NetCDF dataset, called qartod_%(name)s_primary_flag: https://github.com/ioos/glider-dac/blob/main/glider_qc/glider_qc.py#L338-L366
Needs to be added to ERDDAP datasets.xml via modification of the scripts/build_erddap_catalog.py script.
The variable qartod_%(name)s_primary_flag already exists in the ERDDAP datasets.xml, so there is no need to modify the build_erddap_catalog.py script.
Modifications need to be applied to glider_qc.py: the flag_meanings and standard_name attributes need to be replaced to adhere to the IOOS metadata requirements.
I still do not understand what is going on with this. Here is the link to all data sets that have been updated since September 1, 2023:
Data Sets with min_time set to 2023-09-01
49 data sets are returned. Now, if I leave the date as September 1, 2023 AND add that the data set must contain the qartod_temperature_primary_flag:
Data Sets with min_time = 2023-09-01 AND contains qartod_temperature_primary_flag
The number of data sets returned goes from 49 to 1.
Question: why do the majority of real-time data sets not include the qartod* flags?
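For anyone who wants to reproduce the two searches above programmatically, here is a minimal sketch that builds the corresponding ERDDAP Advanced Search URLs. The host is taken from the dataset link elsewhere in this thread; the query parameter names (minTime, variableName, page, itemsPerPage) are assumptions based on the ERDDAP Advanced Search form and should be checked against the server, and advanced_search_url is a hypothetical helper, not part of any glider-dac script.

```python
from urllib.parse import urlencode

# Host taken from the dataset link in this thread.
ERDDAP_BASE = "https://gliders.ioos.us/erddap"


def advanced_search_url(min_time, variable_name=None, fmt="json"):
    """Build an ERDDAP Advanced Search URL (parameter names are assumptions)."""
    params = {
        "page": 1,
        "itemsPerPage": 1000,
        "minTime": min_time,
    }
    if variable_name is not None:
        # Restrict the search to data sets that contain this variable.
        params["variableName"] = variable_name
    return f"{ERDDAP_BASE}/search/advanced.{fmt}?{urlencode(params)}"


all_recent = advanced_search_url("2023-09-01")
with_flag = advanced_search_url("2023-09-01", "qartod_temperature_primary_flag")
print(all_recent)
print(with_flag)
```

Fetching both URLs and comparing the number of rows returned would give the same 49 versus 1 comparison without clicking through the search form.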
Update on progress made:
The glider_qc.py script uses the ioos-qc library to create QARTOD test results for the geophysical / legacy variables [pressure, conductivity, temperature, salinity, density].
A QARTOD rollup qc variable named "qartod_[legacy variable]_primary_flag", with standard_name "aggregate_quality_flag", is created for every legacy variable present in the NetCDF glider file.
The aggregate qc flag uses the ioos-qc 'aggregate' and 'qartod_compare' methods to build the rollup qc array from all tests run. For each data point, the rollup array takes the worst flag across the tests, in the following order of increasing precedence, with 4 = Fail being the worst: 9 = Missing, 2 = Not Eval, 1 = Pass, 3 = Suspect, 4 = Fail.
The qc tests implemented and working are:
'gross_range_test'
'qc_rollup'
The qc tests implemented that need more testing are: 'spike_test', 'rate_of_change_test'
The qc test implemented but not working is: 'flat_line_test'
see pull request #289
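To make the "worst flag wins" rollup in this update concrete, here is a minimal NumPy-only sketch of the precedence order listed there (9, 2, 1, 3, 4). It is an illustration of the idea, not the ioos-qc implementation; the rollup function and the sample flag arrays are hypothetical.

```python
import numpy as np

# Flag precedence from lowest to highest, as listed in this thread:
# MISSING, NOT_EVALUATED, PASS, SUSPECT, FAIL
PRECEDENCE = [9, 2, 1, 3, 4]


def rollup(test_results):
    """Combine per-test flag arrays, keeping the worst flag per data point."""
    arrays = [np.asarray(r, dtype="int8") for r in test_results]
    if len({a.shape for a in arrays}) != 1:
        raise ValueError("all flag arrays must have the same shape")
    # Start from MISSING everywhere.
    result = np.full(arrays[0].shape, PRECEDENCE[0], dtype="int8")
    # Walk the flags in increasing precedence so worse flags overwrite better ones.
    for flag in PRECEDENCE:
        for arr in arrays:
            result[arr == flag] = flag
    return result


gross_range = [1, 1, 3, 4, 9]  # hypothetical gross_range_test results
spike = [1, 2, 1, 1, 9]        # hypothetical spike_test results
print(rollup([gross_range, spike]))  # [1 1 3 4 9]
```

Note that a point flagged Pass by one test and Not Eval by another rolls up to Pass, because Pass ranks above Not Eval in this order.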
Update: The qc tests were tested further and the expected results were verified. Additional conditional statements need to be added to the qc method to handle special cases of data arrays.
The details of this were discussed on the 2023-10-12 tech meeting. Many of the qc tests are based on the assumption that the time data array is monotonically increasing. This means:
If these assumptions are not met, then many of the tests (flat line, spike, etc.) cannot provide an accurate qc assessment. There are many examples from previously submitted data sets in which this assumption is not met.
We discussed implementing a new test, run prior to all other qc tests, that checks for a monotonically ascending time array. If any of the conditions listed above are not met, there are a couple of options:
NC_GLOBAL:valid_times = "True"
or:
NC_GLOBAL:valid_times = "False"
If the value for the attribute in 1 is True, then we proceed with all subsequent qc tests. This option is much easier to implement from a processing perspective. The decision was made to evaluate these options further and then make a decision during the following tech call.
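A minimal sketch of the pre-check discussed above, assuming that "valid" means every timestamp is present and strictly greater than the previous one (the exact conditions agreed on the call are not reproduced here). The function names are hypothetical; only the valid_times attribute name and its "True"/"False" values come from this comment.

```python
import numpy as np


def time_is_strictly_increasing(time_values):
    """Return True if no timestamp is missing, repeated, or out of order."""
    t = np.ma.masked_invalid(np.ma.asarray(time_values, dtype="float64"))
    # An empty array or any masked/NaN timestamp counts as invalid (assumption).
    if t.size == 0 or np.ma.is_masked(t):
        return False
    return bool(np.all(np.diff(t.compressed()) > 0))


def valid_times_attribute(time_values):
    """Value for the proposed NC_GLOBAL:valid_times attribute."""
    return "True" if time_is_strictly_increasing(time_values) else "False"


print(valid_times_attribute([0.0, 1.0, 2.5]))  # True
print(valid_times_attribute([0.0, 2.0, 1.0]))  # False
```

With netCDF4, the resulting string could be written as a global attribute via Dataset.setncattr("valid_times", value) before deciding whether to run the remaining qc tests.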
To add to the above discussion, there is also the case of geophysical variables associated with incorrect standard names that need to be addressed to explain why QARTOD is not run on these variables. Example: https://gliders.ioos.us/erddap/info/bass-20150827T1909/index.html
Can someone please provide a status update for this issue? We (IOOS) have a meeting with NDBC reps next week, and this will probably come up.
Creating a new issue to address challenges encountered in implementing the QC test on the data arrays. See #318
@leilabbb Closing this issue means that the aggregate QC flags are implemented for T and S, following the IOOS Metadata Profile requirements, and that we are ready for NDBC to test. Is this the case?
Yes. The aggregate QC flag is created for T and S unless the variable does not exist in the file or its array is all NaN or Fill Values.
Example: variable "qartod_temperature_primary_flag" in file "ru39_20231203T182905Z_rt.nc":
<class 'netCDF4._netCDF4.Variable'>
int8 qartod_temperature_primary_flag(time)
valid_min: 1
valid_max: 9
_FillValue: 9
units: 1
flag_meanings: PASS NOT_EVALUATED SUSPECT FAIL MISSING
flag_values: [1 2 3 4 9]
ioos_category: Quality
qartod_test: 'qc_rollup'
standard_name: aggregate_quality_flag
long_name: QARTOD Primary Flag for sea_water_temperature
qartod_package: https://github.com/ioos/ioos_qc/blob/main/ioos_qc/qartod.py
references: http://gliders.ioos.us/static/pdf/Manual-for-QC-of-Glider-Data_05_09_16.pdf
qartod_config: {'gross_range_test': {'suspect_span': [0, 35], 'fail_span': [-2, 40]}, 'spike_test': {'suspect_threshold': 0.02396099641919136, 'fail_threshold': 0.04792199283838272}, 'rate_of_change_test': {'threshold': 0.03594149462878704}, 'flat_line_test': {'tolerance': 1, 'suspect_threshold': 3600, 'fail_threshold': 9000}}
The values of the flags: masked_array(data=[1, --, --, 1, --, --, 1, --, 1, --, --, --, 1, --, 1, --, --, 1, --, --, 1, --, 1, --, 1, --, --, 1, --, --, --, 1, --, --, 1, --, --, 1, --, 1, --, --, 1, --, --, --, --, 1, --, 1, --, --, 1, --, --, 1, --, 1, --, --, 1, --, --, --, 1, --, 1], mask=[False, True, True, False, True, True, False, True, False, True, True, True, False, True, False, True, True, False, True, True, False, True, False, True, False, True, True, False, True, True, True, False, True, True, False, True, True, False, True, False, True, True, False, True, True, True, True, False, True, False, True, True, False, True, True, False, True, False, True, True, False, True, True, True, False, True, False], fill_value=9, dtype=int8)
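A masked array like the one above is easier to read as counts. Here is a minimal sketch, assuming the flag_values / flag_meanings pairing shown in the variable attributes; summarize_flags is a hypothetical helper and the sample array is a shortened stand-in for the real data.

```python
import numpy as np

# Pairing taken from the flag_values / flag_meanings attributes above.
FLAG_MEANINGS = {1: "PASS", 2: "NOT_EVALUATED", 3: "SUSPECT", 4: "FAIL", 9: "MISSING"}


def summarize_flags(flags):
    """Count each flag among unmasked points, plus the number of masked points."""
    flags = np.ma.asarray(flags)
    values, counts = np.unique(flags.compressed(), return_counts=True)
    summary = {
        FLAG_MEANINGS.get(int(v), str(int(v))): int(c)
        for v, c in zip(values, counts)
    }
    summary["masked"] = int(np.ma.count_masked(flags))
    return summary


# Shortened, hypothetical stand-in for the array printed above.
sample = np.ma.masked_array(
    data=[1, 9, 9, 1, 9, 9, 1],
    mask=[False, True, True, False, True, True, False],
    fill_value=9,
    dtype="int8",
)
print(summarize_flags(sample))  # {'PASS': 3, 'masked': 4}
```

The masked entries (printed as --) are the points stored as _FillValue 9 in the file.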
@mwengren can you please review?
@leilabbb can you provide a link to a dataset where this is implemented?
This data set has the primary_flag implemented.
Thanks. @mwengren Can you pls review this? Does the variable name matter, or does NDBC just look at the standard name?
I thought we'd expect to see a variable name of, for example, sea_water_temperature_qc_agg but instead I see qartod_temperature_primary_flag. But the standard name is aggregate_quality_flag...
I think this was decided early on and was already in the system before I started working on QC. There is this document that has the *_primary_flag variable, which I believe won't break the GTS workflow.
The IOOS Metadata Profile rules are based on using CF ancillary variables (so the data variable includes the name of the aggregate QC variable in its ancillary_variables attribute), and the aggregate variable also has standard name aggregate_quality_flag.
At a quick glance, it looks like this is in line with those rules. That should make it easier for NDBC to read Glider files in the same way they do the IOOS Metadata Profile files, but they'd have to confirm that probably.
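To illustrate why the variable name itself should not matter under those rules, here is a minimal sketch of resolving the aggregate flag through ancillary_variables and standard_name. It uses plain dictionaries in place of a NetCDF file so it runs on its own; the "temperature" and "qartod_temperature_gross_range_flag" entries and their attribute values are illustrative assumptions, and find_aggregate_flag is a hypothetical helper, not NDBC's reader.

```python
def find_aggregate_flag(variables, data_var):
    """Return the name of the aggregate QC variable linked to data_var, or None."""
    # ancillary_variables is a space-separated list of variable names (CF convention).
    ancillary = variables.get(data_var, {}).get("ancillary_variables", "")
    for name in ancillary.split():
        attrs = variables.get(name, {})
        if attrs.get("standard_name") == "aggregate_quality_flag":
            return name
    return None


# Illustrative attribute tables; only qartod_temperature_primary_flag and
# its standard_name come from this thread.
variables = {
    "temperature": {
        "standard_name": "sea_water_temperature",
        "ancillary_variables": (
            "qartod_temperature_gross_range_flag qartod_temperature_primary_flag"
        ),
    },
    "qartod_temperature_gross_range_flag": {
        "standard_name": "gross_range_test_quality_flag",
    },
    "qartod_temperature_primary_flag": {
        "standard_name": "aggregate_quality_flag",
    },
}

print(find_aggregate_flag(variables, "temperature"))
```

A reader written this way would find qartod_temperature_primary_flag and sea_water_temperature_qc_agg equally well, as long as the attributes are set correctly.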
Thanks all! @leilabbb It sounds like we're ready to proceed with an NDBC test. Can you pls email Bill Smith and Dawn Petraitis (cc: GDAC team) this dataset with a request to test?
email sent
@sarinamann-noaa will request feedback from Bill Smith and Dawn Petraitis, and can schedule a meeting to get information on their needs.
NDBC currently does not read any glider QC flags as part of the GDAC harvest of real-time T and S data that they deliver to the GTS.
In order for NDBC to read QC flags, the GDAC must implement the QARTOD “Aggregate/Rollup” flag variable, following the IOOS Metadata Profile requirements: https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html#quality-controlqartod and https://ioos.github.io/ioos-metadata/ioos-metadata-profile-v1-2.html#requirements-for-the-qartod-aggregaterollup-flag
NDBC will only examine contents of the aggregate_quality_flag variable for filtering purposes for GTS harvest. NDBC will not (and should not) read detailed QC flags.
Please implement these aggregate flags at least for real-time T and S, as these are the only variables that NDBC is harvesting and delivering to the GTS.
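As a rough illustration of filtering on the aggregate flag alone, here is a minimal sketch that masks every value whose aggregate flag is not in an accepted set. Which flags NDBC would actually accept is not stated in this issue; accepting only 1 (PASS) is an assumption, and keep_passing is a hypothetical helper.

```python
import numpy as np


def keep_passing(values, aggregate_flags, accepted=(1,)):
    """Mask values whose aggregate flag is not accepted (default: PASS only)."""
    values = np.ma.masked_invalid(np.ma.asarray(values, dtype="float64"))
    # Treat masked flags as 9 (MISSING), matching the _FillValue used by the GDAC.
    flags = np.ma.asarray(aggregate_flags).filled(9)
    return np.ma.masked_where(~np.isin(flags, accepted), values)


temperature = [10.1, 10.2, 55.0, float("nan")]  # hypothetical values
flags = [1, 3, 4, 1]                            # hypothetical aggregate flags
print(keep_passing(temperature, flags).compressed())
```

Passing accepted=(1, 3) would also let SUSPECT values through, if that turns out to be the preferred policy.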
NDBC POCs: Dawn Petraitis and Bill Smith
IOOS DMAC POC: Micah Wengren