ioos / ioos_qc

:ballot_box_with_check: :ocean: IOOS QARTOD and other Quality Control tests implemented in Python
https://ioos.github.io/ioos_qc/
Apache License 2.0

flat_line_test doesn't work when threshold window mostly contains missing values (priority: low) #68

Open lgarzio opened 2 years ago

lgarzio commented 2 years ago

The flat_line_test doesn't appear to work correctly on datasets with many missing values (ioos_qc version 2.0.1).

Example netCDF file here

Example configuration file: test_flatline.txt

import xarray as xr
from ioos_qc.config import Config
from ioos_qc.streams import XarrayStream
from ioos_qc.results import collect_results

f = 'maracoos_02_20210716T190208Z_dbd.nc'
config_file = 'test_flatline.txt'
ds = xr.open_dataset(f)
c = Config(config_file)
xs = XarrayStream(ds, time='time', lat='latitude', lon='longitude')
qc_results = xs.run(c)
collected_list = collect_results(qc_results, how='list')

# Print the array of QC flags produced for each collected test result
for cl in collected_list:
    flag_results = cl.results.data
    print(repr(flag_results))

array([1, 9, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 3, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 1, 9, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9,
       1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9,
       9, 1, 9, 1, 9, 9, 1, 9, 1])

ds.conductivity.values
array([0.     ,     nan, 0.     ,     nan,     nan,     nan,     nan,
           nan,     nan,     nan,     nan,     nan,     nan,     nan,
           nan,     nan,     nan,     nan,     nan, 4.45602,     nan,
       4.45214,     nan,     nan, 4.45171,     nan, 4.45178,     nan,
           nan, 4.45147,     nan, 4.45046,     nan,     nan, 4.45   ,
           nan, 4.45106,     nan,     nan, 4.45116,     nan, 4.45054,
           nan,     nan, 4.45027,     nan, 4.45089, 4.45019,     nan,
           nan, 4.45109,     nan,     nan, 4.4514 ,     nan, 4.45154,
           nan,     nan, 4.45173,     nan, 4.45156,     nan,     nan,
       4.45145,     nan, 4.45162,     nan, 4.4511 ,     nan,     nan,
       4.45092,     nan, 4.45045,     nan,     nan, 4.45007,     nan,
       4.44995,     nan,     nan, 4.44954,     nan, 4.4495 ,     nan,
           nan, 4.44886,     nan, 4.44779,     nan,     nan, 4.44765,
           nan, 4.44805,     nan,     nan, 4.44685,     nan, 4.44496,
           nan,     nan, 4.43886,     nan, 4.43323,     nan,     nan,
       4.43035,     nan, 4.46671,     nan,     nan, 4.52998,     nan,
       4.53362,     nan,     nan, 4.66421,     nan, 4.66618,     nan,
           nan, 4.61894,     nan, 4.54442,     nan,     nan, 4.51362,
           nan, 4.47128,     nan,     nan, 4.38806,     nan, 4.28966,
           nan,     nan, 4.23655,     nan, 4.23101,     nan,     nan,
       4.23322,     nan, 4.2036 ,     nan,     nan, 4.17473,     nan,
       4.16556,     nan,     nan, 4.16569,     nan, 4.16743,     nan,
           nan, 4.15847,     nan, 4.15089,     nan,     nan, 4.14448,
           nan, 4.14326], dtype=float32)

In this example, it looks like when the threshold window contains only one conductivity value surrounded by missing values, the test flags that one valid value as suspect (3). Here that is the value 4.45602 at index 19, which follows a long run of NaNs.
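The mechanism can be illustrated in isolation. This is a minimal sketch, not the ioos_qc code: the five-element window and the tolerance of 0.001 are assumptions chosen for illustration, since the real values depend on test_flatline.txt. It shows that a window holding a single valid value has a range of exactly zero, which is below any positive tolerance.

```python
import numpy as np

# Hypothetical window: four missing observations followed by the lone
# valid conductivity value from index 19 of the series above.
window = np.ma.masked_invalid(
    np.array([np.nan, np.nan, np.nan, np.nan, 4.45602], dtype=np.float32)
)

tolerance = 0.001  # assumed value, not taken from test_flatline.txt

# With one unmasked value, max and min are the same number.
data_range = np.ma.max(window) - np.ma.min(window)
valid_count = np.ma.count(window)

# A range-only check therefore reports the window as "flat".
looks_flat = bool(data_range < tolerance)
```

Counting the unmasked values (`valid_count`) is what lets a test distinguish this case from a sensor that is genuinely stuck.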

We think we have a fix: a modification to qartod.py that requires at least 3 valid data points in the test window before a point can be flagged:

add line 663:

data_count = np.ma.count(window, 1)

modify line 665 (now 666):

test_results = np.ma.filled(np.logical_and(data_range < tolerance, data_count > 2), fill_value=False)

When these lines are added/modified in qartod.py, the result for this example becomes the following (the value at index 19 is now flagged 1 instead of 3):

array([1, 9, 1, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 1, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 1, 9, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1,
       9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9,
       1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9,
       1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9, 9, 1, 9, 1, 9,
       9, 1, 9, 1, 9, 9, 1, 9, 1])
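The effect of the valid-count guard can be sketched as a standalone function. This is an illustration under stated assumptions, not the implementation in qartod.py: the function name `flat_line_flags`, its parameters `n` and `min_valid`, the use of `sliding_window_view`, and the sample series are all hypothetical. Only the `np.ma.count` and `np.logical_and` combination mirrors the change proposed above.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def flat_line_flags(values, n, tolerance, min_valid=3):
    """Toy flat-line check (hypothetical helper, not the ioos_qc API).

    A point is flagged suspect (3) when the n observations ending at it
    vary by less than `tolerance` and at least `min_valid` of them are
    not missing. Missing points are flagged 9, everything else 1.
    """
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)
    flags = np.where(missing, 9, 1)
    if values.size < n:
        return flags  # not enough data to form a single window

    # One row per window; NaNs are masked so they do not affect min/max.
    window = np.ma.masked_invalid(sliding_window_view(values, n))
    data_range = np.ma.max(window, axis=1) - np.ma.min(window, axis=1)
    data_count = np.ma.count(window, axis=1)  # valid values per window

    flat = np.ma.filled(
        np.logical_and(data_range < tolerance, data_count >= min_valid),
        fill_value=False,
    )
    # Window i ends at observation i + n - 1; earlier points are never flagged.
    flat = np.concatenate([np.zeros(n - 1, dtype=bool), flat])
    flags[flat & ~missing] = 3
    return flags


# Sparse sample series, loosely modelled on the conductivity data above.
series = [np.nan, np.nan, np.nan, np.nan, 4.456,
          np.nan, 4.452, np.nan, np.nan, 4.4517]
without_guard = flat_line_flags(series, n=5, tolerance=0.001, min_valid=1)
with_guard = flat_line_flags(series, n=5, tolerance=0.001, min_valid=3)
```

With `min_valid=1` the sketch reproduces the reported behaviour, flagging valid values that sit in windows made up mostly of NaNs. With `min_valid=3` those values pass, while a run of identical valid values is still flagged.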