ioos / ioos_qc

:ballot_box_with_check: :ocean: IOOS QARTOD and other Quality Control tests implemented in Python
https://ioos.github.io/ioos_qc/
Apache License 2.0
42 stars 27 forks source link

write silently drops nc attributes #91

Closed callumrollo closed 1 year ago

callumrollo commented 1 year ago

I'm trying to set up QARTOD QC for glider trajectory netCDF files, similar to the Issue #58 raised by Kerfoot. I would like to read in a netCDF, apply QARTOD flags and write out the netCDF.

The examples with netCDF silently drop all the variables from the variables being flagged, and also drop attributes from the Coordinates. In the example QartodTestExample_netCDF.ipynb the input nc has quite comprehensive atributes:

>> pco2["pco2_seawater"].attrs

{'comment': 'Partial Pressure of CO2 in Seawater provides a measure of the amount of CO2 and HCO3 in seawater. Specifically, it refers to the pressure that would be exerted by CO2 if all other gases were removed. Partial pressure of a gas dissolved in seawater is understood as the partial pressure in air that the gas would exert in a hypothetical air volume in equilibrium with that seawater. NOTE: the following calibration coefficients in the parameterfunctionmap are deprecated: "ea434", "eb434", "ea620", "eb620". Those arguments are still present in the function declaration, but they are completely unused and can be supplied with an arbitrary fill value.',
 'long_name': 'pCO2 Seawater',
 'precision': 4,
 'data_product_identifier': 'PCO2WAT_L1',
 'units': 'µatm',
 'ancillary_variables': 'absorbance_blank_434,thermistor_raw,light_measurements,record_type,absorbance_blank_620'}

After QC is applied, and the result is saved to nc, these attributes have been erased

>> out_c["pco2_seawater"].attrs

{}

It seems odd, as the flag variables are saved with metadata

>>out_c["qartod_qc_rollup"].attrs

{'standard_name': 'aggregate_quality_flag',
 'long_name': 'Aggregate Quality Flag',
 'flag_values': array([1, 2, 3, 4, 9], dtype=int8),
 'flag_meanings': 'GOOD UNKNOWN SUSPECT FAIL MISSING',
 'valid_min': 1,
 'valid_max': 9,
 'ioos_qc_module': 'qartod',
 'ioos_qc_test': 'qc_rollup',
 'ioos_qc_target': ''}

Is this desired behaviour? Is there a way to retain attributes in variables when saving to nc?

ocefpaf commented 1 year ago

@callumrollo the part in the code that writes the data is in https://github.com/ioos/ioos_qc/blob/121620a2ac8955885db0c7b0a06e8b6b95a41ac3/ioos_qc/stores.py#L157-L240

However, as I read it, I believe that the API there is sane enough b/c you are creating a new file with new old+data and new attributes. It would be the the end user responsibility to re-write the original attributes that still apply into that dict. Maybe Kyle disagrees, let's wait for his input.

With that said, the first option, to add the QA/QC to an xarray Dataset, would preserve everything and you can save a netCDF file from it. Would that work for you?

PS: IMO we should not even have saving mechanism when libraries like xarray exists. We should outsource functionality to it and only return the QA/QC results.

ocefpaf commented 1 year ago

Also, looks like at the end the xarray from_dataframe method is used to pass the attributes but that method doesn't take an attribute (at least not in the latest xarray version). So all these attributes are not passed anyway :-/


Edit: Nope, it is from pocean-core.

callumrollo commented 1 year ago

Thanks Filipe. I've opted to just use Option A to run the QC routines manually and handle the netCDF writing via xarray. imo the example notebooks should not recommend the .write method in Option C.

ocefpaf commented 1 year ago

mo the example notebooks should not recommend the .write method in Option C.

Indeed those notebooks require a much an overhaul. We need users, like you, and a doc sprint to start addressing that.

callumrollo commented 1 year ago

Should've seen that one coming :P I expect I'll end up addressing #58 in the next few weeks. I can write it up into a nice notebook

ocefpaf commented 1 year ago

Should've seen that one coming :P I expect I'll end up addressing #58 in the next few weeks. I can write it up into a nice notebook

If we ever meet in person I'll pay in :beer: or :coffee:.

kwilcox commented 1 year ago

Also, looks like at the end the xarray from_dataframe method is used to pass the attributes but that method doesn't take an attribute (at least not in the latest xarray version). So all these attributes are not passed anyway :-/

That isn't an xarray.from_dataframe, it's using a DSG classes' from_dataframe method from pocean. The class you want is passed in as an argument to the save method. These do take an NCO-JSON compliant dict and applies it to netCDF file.

kwilcox commented 1 year ago

I'm open for the API to be whatever users want it to be! The reason the code was split into streams and stores is to we can access data from anywhere and save it anywhere... we just need to implement the classes as they become use-cases. I will say that I use PandasStore for collecting the QC results and manually write the data out to different formats as needed.

CFNetCDFStore was meant to write to a new file NetcdfStore was meant to append/update an existing file

I can see a use for XarrayStore that returns an in-memory (or lazy) xarray.Dataset of the QC results that a user can then choose to combine with another xarray.Dataset of the original data. If they have the same coords it's a one-liner in xarray would be a really clean API.

ocefpaf commented 1 year ago

I can see a use for XarrayStore that returns an in-memory (or lazy) xarray.Dataset of the QC results that a user can then choose to combine with another xarray.Dataset of the original data. If they have the same coords it's a one-liner in xarray would be a really clean API.

I like this idea.

That isn't an xarray.from_dataframe, it's using a DSG classes' from_dataframe method from pocean. The class you want is passed in as an argument to the save method. These do take an NCO-JSON compliant dict and applies it to netCDF file.

I saw that right after I wrote the comment.