ioos / ioos_qc

:ballot_box_with_check: :ocean: IOOS QARTOD and other Quality Control tests implemented in Python
https://ioos.github.io/ioos_qc/
Apache License 2.0

Inconsistencies in data rendering and in QARTOD flag creation #106

Open leilabbb opened 3 months ago

leilabbb commented 3 months ago

I ran into issues when using the qartod package to QC the Glider DAC files. I have seen flags being applied inconsistently, which is concerning. Before filing an issue about that, I ran into a new problem when reading the data, which I describe here to get some feedback.

Issues: (1) The salinity data in the netCDF file give different results when read with different packages. (2) The spike_test and rate_of_change_test appear to apply the MISSING flag and the SUSPECT flag to existing, non-erroneous data.

Description: The notebook is a walk-through that reproduces the issues above. Files needed to run the notebook: netCDF file, configuration file, notebook file.

Debugging info for Issue (1):

When using netCDF4 to read the salinity array, all values are masked (printed as --) [this is the method I use in the Glider DAC QC process]:

from netCDF4 import Dataset

# netCDF4 returns a numpy masked array by default; masked elements print as --
with Dataset(nc_path, 'r') as nc:
    salinity = nc.variables['salinity'][:]
    print('number of data points', len(salinity), '\n', salinity)
number of data points 36 
 [-- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- -- --
 -- -- -- -- -- -- -- -- -- -- -- --]

When using xarray to read the salinity array, only some values are NaN [this is the method used by the XarrayStream class in ioos_qc.streams]:

import xarray as xr

# mask_and_scale=True replaces _FillValue/missing_value entries with NaN
nd = xr.open_dataset(
    nc_path,
    decode_cf=True,
    decode_coords=True,
    decode_times=True,
    mask_and_scale=True,
)
print('number of data points', len(nd['salinity'].values), '\n', nd['salinity'].values)
number of data points 36
 [32.80403       nan       nan       nan 32.771416 32.67339  32.671986
       nan       nan 32.636482 32.141727 32.114475       nan       nan
 32.103085       nan 32.122192       nan       nan 32.09773        nan
       nan 32.050667       nan 32.04136        nan       nan 32.010563
       nan 31.924274       nan 31.775118       nan       nan       nan
 31.886553]
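One possible explanation for the difference between these two reads, offered as an unverified hypothesis: as I understand their documentation, netCDF4-python's automatic masking (on by default, controlled with Variable.set_auto_mask) masks values equal to _FillValue or missing_value and also values outside valid_min/valid_max/valid_range, while xarray's mask_and_scale only uses _FillValue and missing_value and ignores the valid range. If the salinity variable carried an inconsistent valid range, netCDF4 could mask every value while xarray shows data. The stdlib sketch below emulates the two rules on made-up numbers; the function names and the attribute values are hypothetical. On the real file, printing nc.variables['salinity'].ncattrs() and re-reading after set_auto_mask(False) would confirm or rule this out.

```python
from typing import List, Optional, Sequence


def netcdf4_style_mask(raw: Sequence[float],
                       fill_value: Optional[float] = None,
                       missing_value: Optional[float] = None,
                       valid_min: Optional[float] = None,
                       valid_max: Optional[float] = None) -> List[bool]:
    """Simplified emulation of netCDF4-python auto-masking (True = masked).

    Masks _FillValue / missing_value entries AND values outside
    [valid_min, valid_max].
    """
    mask = []
    for v in raw:
        is_fill = fill_value is not None and v == fill_value
        is_missing = missing_value is not None and v == missing_value
        too_low = valid_min is not None and v < valid_min
        too_high = valid_max is not None and v > valid_max
        mask.append(is_fill or is_missing or too_low or too_high)
    return mask


def fill_only_mask(raw: Sequence[float],
                   fill_value: Optional[float] = None,
                   missing_value: Optional[float] = None) -> List[bool]:
    """Simplified emulation of xarray's mask_and_scale: ignores the valid range."""
    return [(fill_value is not None and v == fill_value)
            or (missing_value is not None and v == missing_value)
            for v in raw]


# Hypothetical attributes: a valid_max that is far too small for salinity.
raw = [32.80403, -999.0, 32.771416]
print(netcdf4_style_mask(raw, fill_value=-999.0, valid_min=0.0, valid_max=10.0))
# [True, True, True]
print(fill_only_mask(raw, fill_value=-999.0))
# [False, True, False]
```

With a plausible valid range (for example valid_max=45.0) both rules would agree, which is why the attribute values are the first thing worth checking.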

When using ncdump to print the salinity array, only some values are masked (shown as _):

!ncdump -v salinity '/Users/leilabelabassi/Downloads/ru34_20220529T202312Z_rt.nc'
salinity = 32.80403, _, _, _, 32.77142, 32.67339, 32.67199, _, _, 32.63648, 
    32.14173, 32.11448, _, _, 32.10308, _, 32.12219, _, _, 32.09773, _, _, 
    32.05067, _, 32.04136, _, _, 32.01056, _, 31.92427, _, 31.77512, _, _, _, 
    31.88655 ;
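Because ncdump prints fewer digits than xarray (compare 32.771416 and 32.77142 above), comparing values across readers is noisy; comparing which positions each reader treats as missing is more telling. The stdlib helpers below are a hypothetical sketch for that check: one parses the "name = v1, v2, _, ... ;" text printed by ncdump -v, the other lists the indices where two readers disagree on missingness.

```python
import math
from typing import List, Optional, Sequence


def parse_ncdump_values(cdl: str) -> List[Optional[float]]:
    """Parse 'name = v1, v2, _, ... ;' from ncdump -v output; '_' -> None."""
    body = cdl.split("=", 1)[1].split(";", 1)[0]
    # float() tolerates the surrounding spaces and line breaks ncdump emits
    return [None if tok.strip() == "_" else float(tok) for tok in body.split(",")]


def mask_mismatches(ncdump_vals: Sequence[Optional[float]],
                    other_vals: Sequence[float]) -> List[int]:
    """Indices where one reader reports missing (None/NaN) and the other does not."""
    if len(ncdump_vals) != len(other_vals):
        raise ValueError("arrays have different lengths")
    return [i for i, (a, b) in enumerate(zip(ncdump_vals, other_vals))
            if (a is None) != math.isnan(b)]
```

Feeding it the ncdump text above with the xarray values (converted with .tolist()) should return an empty list if the two readers agree on which points are missing; feeding it the netCDF4 read with masked points replaced by NaN would list every index that netCDF4 masks but ncdump does not.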

Debugging info for Issue (2):

The spike_test flags show the MISSING flag (9) applied to data points that have values.

Flag key (left column): 1 = GOOD, 2 = UNKNOWN, 9 = MISSING

    salinity
2  32.804031
9        NaN
9        NaN
9        NaN
9  32.771416
1  32.673389
9  32.671986
9        NaN
9        NaN
9  32.636482
1  32.141727
9  32.114475
9        NaN
9        NaN
9  32.103085
9        NaN
9  32.122192
9        NaN
9        NaN
9  32.097729
9        NaN
9        NaN
9  32.050667
9        NaN
9  32.041359
9        NaN
9        NaN
9  32.010563
9        NaN
9  31.924274
9        NaN
9  31.775118
9        NaN
9        NaN
9        NaN
2  31.886553
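The pattern in this table is consistent with a spike test that compares each point with the average of its two neighbours: when the point or either neighbour is NaN, no reference value exists, so the point is flagged MISSING even though it has a value. Only the 6th and 11th points (32.673389 and 32.141727) have valid values on both sides, and they are the only GOOD flags. The sketch below is a simplified re-implementation written for illustration, not the ioos_qc source; the thresholds 0.5 and 1.0 are hypothetical placeholders for whatever the configuration file uses. Run on the 36 values above, it reproduces the table's flags.

```python
import math
from typing import List, Sequence

GOOD, UNKNOWN, SUSPECT, FAIL, MISSING = 1, 2, 3, 4, 9  # QARTOD flag values


def spike_flags(values: Sequence[float], suspect_threshold: float,
                fail_threshold: float) -> List[int]:
    """Simplified spike test against the mean of the two neighbouring points."""
    n = len(values)
    flags = [GOOD] * n
    for i in range(1, n - 1):
        window = values[i - 1:i + 2]
        # Any NaN in the 3-point window means no reference can be computed,
        # so the point is flagged MISSING even when it has a value itself.
        if any(math.isnan(v) for v in window):
            flags[i] = MISSING
            continue
        diff = abs(values[i] - (values[i - 1] + values[i + 1]) / 2)
        if diff > fail_threshold:
            flags[i] = FAIL
        elif diff > suspect_threshold:
            flags[i] = SUSPECT
    if n:
        # The first and last points have only one neighbour.
        flags[0] = flags[-1] = UNKNOWN
    return flags


nan = math.nan
salinity = [
    32.804031, nan, nan, nan, 32.771416, 32.673389, 32.671986, nan, nan,
    32.636482, 32.141727, 32.114475, nan, nan, 32.103085, nan, 32.122192,
    nan, nan, 32.097729, nan, nan, 32.050667, nan, 32.041359, nan, nan,
    32.010563, nan, 31.924274, nan, 31.775118, nan, nan, nan, 31.886553,
]
# Flags copied from the spike_test table above.
expected = [2] + [9] * 4 + [1] + [9] * 4 + [1] + [9] * 24 + [2]
print(spike_flags(salinity, suspect_threshold=0.5, fail_threshold=1.0) == expected)
# True
```

If this is indeed what the library does, the MISSING flags on existing values are a consequence of the gaps in the profile rather than of the values themselves.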

The rate_of_change_test flags show the SUSPECT flag (3) applied to data points that look good.

Flag key (left column): 1 = GOOD, 2 = UNKNOWN, 3 = SUSPECT, 9 = MISSING

    salinity
1  32.804031
9        NaN
9        NaN
9        NaN
1  32.771416
3  32.673389
3  32.671986
9        NaN
9        NaN
1  32.636482
3  32.141727
3  32.114475
9        NaN
9        NaN
1  32.103085
9        NaN
1  32.122192
9        NaN
9        NaN
1  32.097729
9        NaN
9        NaN
1  32.050667
9        NaN
1  32.041359
9        NaN
9        NaN
1  32.010563
9        NaN
1  31.924274
9        NaN
1  31.775118
9        NaN
9        NaN
9        NaN
1  31.886553
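Rate-of-change results depend on the time step between samples, which is not shown in this table, so the table alone cannot tell whether the SUSPECT flags are justified. What it does show is consistent with a test that only compares each point with the immediately preceding sample: a point that follows a NaN (for example 32.771416) has no computable rate, so it stays GOOD rather than being compared with the last valid value. Below is a simplified sketch of that logic for illustration, not the ioos_qc implementation; the function name, the units-per-second threshold and the timestamps in the example are hypothetical.

```python
import math
from typing import List, Sequence

GOOD, SUSPECT, MISSING = 1, 3, 9  # QARTOD flag values


def rate_of_change_flags(values: Sequence[float],
                         times_s: Sequence[float],
                         threshold: float) -> List[int]:
    """Simplified rate-of-change test (threshold in data units per second).

    Assumes strictly increasing timestamps in seconds.
    """
    flags = [GOOD] * len(values)
    for i in range(1, len(values)):
        dt = times_s[i] - times_s[i - 1]
        rate = abs(values[i] - values[i - 1]) / dt
        # If either sample is NaN the rate is NaN and NaN > threshold is False,
        # so a point right after a gap is never compared with the last valid value.
        if rate > threshold:
            flags[i] = SUSPECT
    for i, v in enumerate(values):
        if math.isnan(v):
            flags[i] = MISSING
    return flags


# Hypothetical samples 10 s apart and a hypothetical threshold of 0.05 units/s.
print(rate_of_change_flags([10.0, 10.1, math.nan, 12.0, 13.0],
                           [0, 10, 20, 30, 40], 0.05))
# [1, 1, 9, 1, 3]
```

Note that the jump from 10.1 to 12.0 across the gap is never evaluated, while the 1.0 change over 10 s at the end is flagged.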

Clarification: The SUSPECT flags may be acceptable, but I do not understand why existing data are labeled with the MISSING flag. The differences between the three reading methods may also be unrelated to the flagging, but any recommendations or best practices for working with the ioos_qc package would be appreciated.
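On the best-practice question, one option to consider (an assumption, not a documented ioos_qc recommendation) is to drop the NaN samples before running neighbour-based tests such as the spike and rate-of-change tests, then map the flags back and mark the dropped positions MISSING. A valid point is then judged against its nearest valid neighbours instead of being flagged MISSING because of a gap. The trade-off is that those neighbours may be far apart in time or depth, so thresholds tuned for evenly spaced data may not fit. The helper name below is hypothetical, and a time-based test would also need its timestamps compacted with the same indices.

```python
import math
from typing import Callable, List, Sequence

MISSING = 9  # QARTOD flag for missing data


def apply_on_valid(values: Sequence[float],
                   qc_test: Callable[[List[float]], List[int]]) -> List[int]:
    """Run qc_test on the non-NaN values only and scatter its flags back."""
    valid_idx = [i for i, v in enumerate(values) if not math.isnan(v)]
    flags = [MISSING] * len(values)
    compact_flags = qc_test([values[i] for i in valid_idx])
    for i, flag in zip(valid_idx, compact_flags):
        flags[i] = flag
    return flags


# Usage with a toy test that marks every point it receives as GOOD (1).
print(apply_on_valid([32.80, math.nan, 32.77, 32.67], lambda xs: [1] * len(xs)))
# [1, 9, 1, 1]
```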

leilabbb commented 3 months ago

@ocefpaf could you look at the issues I am seeing with the QARTOD application? Thanks

ocefpaf commented 3 months ago

@leilabbb I'm not too familiar with the code and QARTOD, but I will take a look as soon as possible.

Can you send the version of the ioos_qc library that you are using?

leilabbb commented 3 months ago

@ocefpaf Here is the commit SHA: 0b1406b52872e564885ccff5808315cf88ae28ee

ocefpaf commented 3 months ago

@leilabbb can you try the latest commit? @iwensu0313 made some improvements and fixed a few bugs that may impact your example. Please let us know what you find.

leilabbb commented 3 months ago

I reran the code using the latest commit and do not see any change in the results. The ioos_qc qartod module produced the same flags.

ocefpaf commented 3 months ago

I guess we need to wait for #108.

iwensu0313 commented 1 month ago

#108 is merged now. @leilabbb, do you want to try it and see if that fixes the problem?

leilabbb commented 4 weeks ago

I am not sure what is not working, but merging #108 did not fix it. @iwensu0313, could you use this netCDF file and see whether you get a different result?