ioos / ioos_qc

:ballot_box_with_check: :ocean: IOOS QARTOD and other Quality Control tests implemented in Python
https://ioos.github.io/ioos_qc/
Apache License 2.0
46 stars 27 forks source link

Netcdf improvements #27

Closed jessicaaustin closed 4 years ago

jessicaaustin commented 4 years ago

note: i based this off of gen_agg_take2, so that would be merged into master before this PR. and if the approach for generating the aggregate flag changes, then this branch would need updates as well

review-notebook-app[bot] commented 4 years ago

Check out this pull request on  ReviewNB

You'll be able to see Jupyter notebook diff and discuss changes. Powered by ReviewNB.

jessicaaustin commented 4 years ago

note: i based this off of gen_agg_take2, so that would be merged into master before this PR. and if the approach for generating the aggregate flag changes, then this branch would need updates as well

jessicaaustin commented 4 years ago

@kwilcox could you please review?

In particular, these are some changes I'd love to get feedback on:

kwilcox commented 4 years ago

I updated the NcQcConfig constructor to add arguments to indicate the name of the dimensions in the netcdf file, this is useful so you don't have to extract that data out yourself when calling run() -- it can just use the data from the proper variables in the netcdf file.

I like it, wish this could be represented in the NcQcConfig file as well, but for now this is helpful. Would also be nice to default to the CF convention (extract the time and z axes if they are not provided).

In save_to_netcdf, I added a fill value attribute to the qc variable. Seems reasonable?

There was some discussion if a flag value of UNKNOWN should be filled/masked. Do you remember where the convo happened and what the outcome was? Would you assume that if the quality flag was UNKNOWN it should be masked? The opposing side would be to only mask the quality flag when the data was masked (we do that now). Right now: data input mask and the quality flag mask are always be equal. After PR: the quality flag mask would include input data mask and all UNKNOWN).

In save_to_netcdf, I removed valid_min and valid_max attributes. Does it make sense to have those for a list of flags anyway? (as opposed to a data variable which is continuous)

Yeah, this is still useful. Some tools use it for color bar bounds and other things. They could pull the info from flag_values, but this makes it more generic. I'll add back in unless you object.

To store the proper standard_name with the variable, I added an annotation, add_metadata, to each of the test functions

This is cool, works great for the purpose.

jessicaaustin commented 4 years ago

I updated the NcQcConfig constructor to add arguments to indicate the name of the dimensions in the netcdf file, this is useful so you don't have to extract that data out yourself when calling run() -- it can just use the data from the proper variables in the netcdf file.

I like it, wish this could be represented in the NcQcConfig file as well, but for now this is helpful. Would also be nice to default to the CF convention (extract the time and z axes if they are not provided).

Agreed. I'll add an issue for that to track it.

In save_to_netcdf, I added a fill value attribute to the qc variable. Seems reasonable?

There was some discussion if a flag value of UNKNOWN should be filled/masked. Do you remember where the convo happened and what the outcome was? Would you assume that if the quality flag was UNKNOWN it should be masked? The opposing side would be to only mask the quality flag when the data was masked (we do that now). Right now: data input mask and the quality flag mask are always be equal. After PR: the quality flag mask would include input data mask and all UNKNOWN).

Hmmmm I see what you mean. When you put it that way, I think "data input mask and the quality flag mask are always be equal" is the correct behavior. So I will revert my changes.

In save_to_netcdf, I removed valid_min and valid_max attributes. Does it make sense to have those for a list of flags anyway? (as opposed to a data variable which is continuous)

Yeah, this is still useful. Some tools use it for color bar bounds and other things. They could pull the info from flag_values, but this makes it more generic. I'll add back in unless you object.

Makes sense, thanks for the explanation.