ioos / glider-dac

The IOOS Glider DAC site/scripts/tools
http://gliders.ioos.us/providers/

End to End QC Implementation and Documentation #228

kerfoot closed this issue 1 year ago

kerfoot commented 1 year ago

Questions:

  1. Is the DAC applying any QARTOD QC tests on the real-time data sets?
  2. Where are the log files for the QC process located, if it is running?
  3. Where is the file containing the bounds located?
  4. What tests are being run and where are the results (names of variables) being stored?
  5. What are the criteria for determining whether DAC-applied QC tests will be run on incoming data sets?
  6. Where is the documentation on what exactly is being done at the DAC with respect to the application of QARTOD QC tests?
  7. How do we know if/when the QC process is performed on incoming data sets submitted by our users?
  8. How do the above affect our ability to create aggregated qc flags since it appears that we are not running any type of QC?
  9. If we are running tests on information contained (or not contained) in submitted data files, why has this data set not been QC’d?

benjwadams commented 1 year ago

  1. Yes.
  2. The Docker logs for glider_qartod show this info.
  3. https://github.com/ioos/glider-dac/blob/master/data/qc_config.yml
  4. Currently gross range, flat line, rate of change, and spike. The aggregate flag too, if that counts.
  5. Do any geophysical variables have linked ancillary variables with standard names ending in quality_flag or status_flag?
  6. The docs are incomplete.
  7. Currently there is a user.qc_run Linux xattr. However, it records whether QC has either been run by us or been detected in the file.
  8. TBD, but I don't think aggregate flags are computed for a variable if QC variables are detected. The reason is that some institutions may use their own criteria for the rollup/aggregate flag, rather than taking the highest level of failure for each flag position within the array.
  9. It looks like it has since been QC'd. Jobs are run on a queue, so QC is not always run on time.
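To make answers 4 and 8 concrete, here is a minimal sketch of a gross range test and of a rollup that takes the highest level of failure at each flag position. It is written in plain Python for illustration and is not the DAC's code or the ioos_qc API. The flag values (1 good, 2 unknown, 3 suspect, 4 fail, 9 missing) follow the QARTOD convention; the spans, the sample values, and the precedence order used in the rollup are assumptions, and the real bounds live in qc_config.yml.

```python
# QARTOD flag values.
GOOD, UNKNOWN, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9

# Assumed precedence, lowest to highest; later entries win the rollup.
PRECEDENCE = [MISSING, UNKNOWN, GOOD, SUSPECT, FAIL]


def gross_range_flags(values, fail_span, suspect_span):
    """Flag each value against a fail span and a narrower suspect span."""
    flags = []
    for v in values:
        if v is None:
            flags.append(MISSING)
        elif not fail_span[0] <= v <= fail_span[1]:
            flags.append(FAIL)
        elif not suspect_span[0] <= v <= suspect_span[1]:
            flags.append(SUSPECT)
        else:
            flags.append(GOOD)
    return flags


def rollup(*flag_arrays):
    """Per position, keep the flag with the highest precedence."""
    if len({len(a) for a in flag_arrays}) != 1:
        raise ValueError("flag arrays must have the same length")
    return [max(column, key=PRECEDENCE.index) for column in zip(*flag_arrays)]


# Hypothetical temperature samples and spans, for illustration only.
temperature = [12.1, 45.0, None, 29.5]
gross = gross_range_flags(temperature, fail_span=(-2, 40), suspect_span=(0, 28))
other = [GOOD, GOOD, UNKNOWN, GOOD]  # stand-in for a second test's flags
aggregate = rollup(gross, other)
```

A provider with different rollup criteria would swap in their own precedence or logic, which is the reason given in answer 8 for not computing an aggregate flag when provider QC variables are present.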

kerfoot commented 1 year ago

Regarding answers 1, 4, 5, and 8: There are no QC flags on this real-time dataset: https://gliders.ioos.us/erddap/tabledap/electa-20230523T1947.html. There are no user-supplied QC variables in the submitted NetCDF files in /data/submission/rutgers/electa-20230523T1947, and there are no geophysical variables whose ancillary_variables attribute references a quality_flag or status_flag variable. The dataset XML element appears to have added some _qc variables (e.g., density_qc, temperature_qc, etc.), but the arrays are all _FillValues and the variables have no standard_names.
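The check described in answer 5 and repeated above can be sketched as follows. This is an illustration only, not the DAC's implementation: the function name has_provider_qc and the dict-of-attributes layout are hypothetical stand-ins for metadata that would normally be read from a NetCDF file. It assumes ancillary_variables is a blank-separated list of variable names, as in the CF conventions.

```python
def has_provider_qc(variables):
    """Return True if any variable links to an ancillary QC flag variable.

    `variables` maps a variable name to a dict of its attributes.
    """
    for attrs in variables.values():
        for name in attrs.get("ancillary_variables", "").split():
            standard_name = variables.get(name, {}).get("standard_name", "")
            if standard_name.endswith(("quality_flag", "status_flag")):
                return True
    return False


# Hypothetical file with provider-supplied QC linked to temperature.
with_qc = {
    "temperature": {
        "standard_name": "sea_water_temperature",
        "ancillary_variables": "temperature_qc",
    },
    "temperature_qc": {"standard_name": "sea_water_temperature status_flag"},
}

# Hypothetical file resembling the case above: a _qc variable exists,
# but nothing links to it and it has no standard_name.
without_qc = {
    "temperature": {"standard_name": "sea_water_temperature"},
    "temperature_qc": {},
}
```

With these inputs, has_provider_qc(with_qc) is true and has_provider_qc(without_qc) is false.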

kerfoot commented 1 year ago

Added documentation and a proposed process for finding files that need to be QC'd:

https://github.com/ioos/ioosngdac/wiki/Internal-DAC-Administration-Space#proposed-qc-process

Wrote a shell script to create the list of provider-submitted NetCDF files that need to be QC'd. The script can be found in:

/home/glider/qc/bin/build_deployment_qc_queue.sh

The script searches all active real-time data sets in:

/data/data/priv_erddap

and creates a list of the files that need DAC-supplied QC applied to them. These list files are located in:

/home/glider/qc/queue

The script is run as user glider.

In testing, the script is currently monitoring 34 active real-time data sets. These data sets are processed (the file queue lists created) in under 60 seconds.
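For readers without access to the server, the idea of building a per-deployment queue can be sketched in Python. This is not the contents of build_deployment_qc_queue.sh, which is not shown here. The sketch assumes that a file counts as already processed when it carries the user.qc_run extended attribute mentioned earlier in the thread, and the function names are hypothetical. os.getxattr is available on Linux only.

```python
import os
from pathlib import Path


def qc_already_run(path):
    """Assumed check: the file carries a user.qc_run extended attribute."""
    try:
        os.getxattr(path, "user.qc_run")
        return True
    except OSError:
        # Attribute absent, or the filesystem does not support xattrs.
        return False


def build_qc_queue(deployment_dir, is_done=qc_already_run):
    """List NetCDF files under deployment_dir that still need QC, sorted."""
    return sorted(
        str(p) for p in Path(deployment_dir).rglob("*.nc") if not is_done(p)
    )
```

The is_done parameter lets a different criterion be substituted, for example a check on file contents, without changing the directory walk.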

These files can be used as inputs to the ioos_qc processing pipeline. Depending on the performance of ioos_qc, this should allow us to apply QC significantly more often.

Additional documentation is available here

kerfoot commented 1 year ago

Closed as OBE (overtaken by events). Will be refiled as a new, more focused issue.