ioos / compliance-checker

Python tool to check your datasets against compliance standards
http://ioos.github.io/compliance-checker/
Apache License 2.0
109 stars 58 forks source link

Compliance checker is not handling CF 1.9 files correctly, and is complaining about CF 1.8 issues that are not correct #1093

Closed mjbrodzik closed 3 months ago

mjbrodzik commented 4 months ago

Version of compliance checker running: 5.0.0

Describe the checker this affects: CF

Attach a minimal CDL or NetCDF file which is able to reproduce the issue

To Reproduce: Uploaded attached file, and ran the checker NSIDC0630_GRD_EASE2_N25km_GCOMW1_AMSR2_E_10.7H_20230627_v2.0.nc.gz using CF-1.8, since it is the most recent one to select.

Describe the issue below: A number of the reported issues are complaining about issues that should be updated for CF 1.9, specifically:

1) "??2.2 Data Types

CF 1.9 now allows unsigned types.

2) "??3.5 Flags

We only included a single flag value, and we are not sure why the checker is requiring this to be a list and not allowing it to be a single value. (This doesn't appear to be a difference between CF 1.8 and CF 1.9)

3) "??5.6 Horizontal Coordinate Reference Systems, Grid Mappings, Projections

We note that CF1.8 and CF1.8 both say: "To geo-reference data horizontally with respect to
the Earth, a grid mapping variable may be provided by the data variable, using
the grid_mapping attribute. If the coordinate variables for a horizontal grid
are not longitude and latitude, then a grid_mapping variable provides the
information required to derive longitude and latitude values for each grid
location. If no grid mapping variable is referenced by a data variable, then
longitude and latitude coordinate values shall be supplied in addition to the
required coordinates. For example, the Cartesian coordinates of a map projection
may be supplied as coordinate variables and, in addition, two-dimensional
latitude and longitude variables may be supplied via the coordinates attribute
on a data variable."

Since our data are projected, we interpreted this to mean that we could include x
and y as the coordinate variables (in projected meters), and together with the
"grid_mapping" attribute on each variable, this precluded the requirement to
also include 2D latitude/longitude coordinate variables. We think the checker is
not correctly interpreting the convention in this case.

4) "??2.6 Attributes

Our references global attribute contains a non-empty string, so we think this
might be a bug in the checker.

5) "??4.1 Latitude Coordinate

We think this is a further misunderstanding on the checker's understanding of the convention on horizontal
geo-referencing. Since the data are projected, the y(x) variable units have nothing to do with latitude(longitude) units.

6) "??8.1 Packed Data

We think this is a problem with the checker expecting not to find unsigned
types, which is legitimate for CF1.8. See above, CF1.9 allows unsigned types.

hot007 commented 4 months ago

Related to this, I found https://github.com/ioos/compliance-checker/issues/972 which indicated support for CF1-9 and 1-10 was on the horizon back in March 2023 but the current checker version (v5.1.1) still stops at CF1-8, can any of the devs give us a hint when support for the newer CF versions might be available, please? We have some files being written now to CF1-10 and even 1-11 standard and we can't check them.
Thanks!

benjwadams commented 4 months ago

Related to this, I found https://github.com/ioos/compliance-checker/issues/972 which indicated support for CF1-9 and 1-10 was on the horizon back in March 2023 but the current checker version (v5.1.1) still stops at CF1-8, can any of the devs give us a hint when support for the newer CF versions might be available, please? We have some files being written now to CF1-10 and even 1-11 standard and we can't check them. Thanks!

Per recent interest in continuing development, and realignment of project funds the answer is soon. CF 1.9 and 1.10 don't add a tremendous number of features versus some earlier releases. If you are interested in assisting the process, please create a separate issue with a file and detail what you'd expect to have checked in 1.9. Certain sections such as https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#compression-by-coordinate-subsampling currently have a dearth of existing files and examples we can check against, complicating interpretation of that section.

benjwadams commented 4 months ago

Regarding OP's post here, this mostly looks to be a data type issue which should be fairly straightforward to fix.

benjwadams commented 4 months ago

We do actually check for unsigned int support here:

https://github.com/ioos/compliance-checker/blob/develop/compliance_checker/tests/test_cf.py#L3141-L3148

I'll dig to figure out what's going on.

mwengren commented 4 months ago

Related to this, I found https://github.com/ioos/compliance-checker/issues/972 which indicated support for CF1-9 and 1-10 was on the horizon back in March 2023 but the current checker version (v5.1.1) still stops at CF1-8, can any of the devs give us a hint when support for the newer CF versions might be available, please? We have some files being written now to CF1-10 and even 1-11 standard and we can't check them. Thanks!

Per recent interest in continuing development, and realignment of project funds the answer is soon. CF 1.9 and 1.10 don't add a tremendous number of features versus some earlier releases. If you are interested in assisting the process, please create a separate issue with a file and detail what you'd expect to have checked in 1.9. Certain sections such as https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#compression-by-coordinate-subsampling currently have a dearth of existing files and examples we can check against, complicating interpretation of that section.

@benjwadams in https://github.com/ioos/compliance-checker/issues/972 you indicated that CF 1.9 was supported as of a recent release at that time, but the Compliance Checker website still lists 1.8 as the latest CF release supported.

We should make sure those projects are in alignment and if we don't already have CF 1.9 supported in the latest CC release, issue a new release in the near future that does support it, even if the compression section isn't implemented (assuming most everything else in 1.9 already is).

ocefpaf commented 4 months ago

@benjwadams in #972 you indicated that CF 1.9 was supported as of a recent release at that time, but the Compliance Checker website still lists 1.8 as the latest CF release supported.

Micah, this page is built using the README and that will be updated when #1098 is merged.

benjwadams commented 4 months ago

Compliance checker web is also not linked to the latest release of Compliance Checker via GitHub Actions, so it needs to be built against Compliance Checker manually.

benjwadams commented 4 months ago

??2.6.2 references global attribute should be a non-empty string "

Hmm, you have an array of strings:

>>> ds.references
['Long, D. G. and M. J. Brodzik. 2016. Optimum Image Formation for Spaceborne Microwave Radiometer Products. IEEE Trans. Geosci. Remote Sensing, 54(5):2763-2779, doi:10.1109/TGRS.2015.2505677.\n', 'Algorithm Theoretical Basis Document:  https://nsidc.org/sites/nsidc.org/files/technical-references/MEaSUREs_CETB_ATBD.pdf\n']

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#description-of-file-contents mentions:

The attribute values are all character strings.

I'll have to clarify this with CF group. We don't usually get descriptive values in global attributes with >= 2ary array elements.

mwengren commented 4 months ago

@benjwadams in #972 you indicated that CF 1.9 was supported as of a recent release at that time, but the Compliance Checker website still lists 1.8 as the latest CF release supported.

Micah, this page is built using the README and that will be updated when #1098 is merged.

Thanks @ocefpaf. In my reply above, I had been looking at the Compliance Checker Web website (https://compliance.ioos.us), but as you point out our README needs an update as well. Looks good!

ocefpaf commented 4 months ago

Thanks @ocefpaf. In my reply above, I had been looking at the Compliance Checker Web website (https://compliance.ioos.us), but as you point out our README needs an update as well. Looks good!

The web checker probably needs to be updated and redeployed before a new release is issued.

Back when we still used setup.py we did not had an entrypoint for CF 1.9. See https://github.com/ioos/compliance-checker/pull/1024/files#diff-60f61ab7a8d1910d86d9fda2261620314edcae5894d5aaa236b821c7256badd7L58

Since then we moved to pyproject.toml and #1097 added the CF 1.9 entrypoiint. That is why we need a new release first.

@benjwadams is that something you can do?

benjwadams commented 4 months ago

Yes, I will tag a new release.

mjbrodzik commented 4 months ago

??2.6.2 references global attribute should be a non-empty string "

Hmm, you have an array of strings:

>>> ds.references
['Long, D. G. and M. J. Brodzik. 2016. Optimum Image Formation for Spaceborne Microwave Radiometer Products. IEEE Trans. Geosci. Remote Sensing, 54(5):2763-2779, doi:10.1109/TGRS.2015.2505677.\n', 'Algorithm Theoretical Basis Document:  https://nsidc.org/sites/nsidc.org/files/technical-references/MEaSUREs_CETB_ATBD.pdf\n']

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#description-of-file-contents mentions:

The attribute values are all character strings.

I'll have to clarify this with CF group. We don't usually get descriptive values in global attributes with >= 2ary array elements.

Hi, @benjwadams , it's so great to see the action to address my issues, thank you very much. Have you received any clarification on this question? I've been using this approach (making the attribute a list of strings when there are multiple references that are to be included) for many years (several ongoing data sets), and I'd like to know if I need to change it. Just last week on a new data set, a reviewer asked me if it is a legitimate approach. So I'd be interested in the answer. I may only be able to correct new data sets going forward, but I can also notify colleagues at the NSIDC DAAC to keep alert for this potential problem.

benjwadams commented 3 months ago

We only included a single flag value, and we are not sure why the checker is requiring this to be a list and not allowing it to be a single value

This has come up a few times and in fact I thought this was addressed for flag_values. NetCDF seems to represent underlying attributes as arrays, but NetCDF-Python will store a single valued array as a scalar value if I recall correctly.

benjwadams commented 3 months ago

Version of compliance checker running: 5.0.0

You are running an older version of compliance checker.

"??3.5 Flags TB_num_samples's flag_values must be an array of values not <class 'numpy.uint8'>" We only included a single flag value, and we are not sure why the checker is requiring this to be a list and not allowing it to be a single value. (This doesn't appear to be a difference between CF 1.8 and CF 1.9)

This behavior was fixed in 6ee802e3afec1ed72976d2867fac01820d0f3775 . It has to do with how netCDF4-Python treats single element arrays due to the netCDF data model as far as I can tell -- assigning a single element array or scalar to an attribute returns a scalar when the attribute is queried.

benjwadams commented 3 months ago

Please abridge and/or strikeout superfluous issues after running against newer version of compliance checker.

benjwadams commented 3 months ago

https://cfconventions.org/Data/cf-conventions/cf-conventions-1.8/cf-conventions.html#longitude-coordinate mentions

Optionally, the longitude type may be indicated additionally by providing the standard_name attribute with the value longitude, and/or the axis attribute with the value X.

A similar case follows for axis "Y" being treated as longitude variables.

This is why these are being detected as lon/lat variables:

https://github.com/ioos/compliance-checker/blob/develop/compliance_checker/cfutil.py#L549-L563

benjwadams commented 3 months ago

I have to trace a little more, but "true" latitude/longitude variables are detected using the method above, and thus are likely related to the errors being raised of the form:

Incidence_angle's coordinate variable "y" is not one of the variables identifying true latitude/longitude and its dimensions are not a subset of Incidence_angle's dimensions

benjwadams commented 3 months ago

https://github.com/cf-convention/discuss/issues/340 Current discussion seems to have arrived at the consensus that global attributes taking string should be one string only. However, I could put a warning in that arrays of string in NetCDF4-Python should instead be a single string.

mjbrodzik commented 3 months ago

Thanks for this discussion, I'll pass it along to NSIDC DAAC and hopefully they can get the recommendation out further afield.

benjwadams commented 3 months ago

Closing issue. Please abridge remaining points into separate issues.

mjbrodzik commented 3 months ago

I'm a little confused @benjwadams . I was assuming that I'd be able to go online and get a pull-down for CF 1.9, then try my file again and see what may be left. But when I go online, I'm still only seeing 1.8 and prior versions. So I'm not sure how to verify what changes have been made. Please let me know what to expect for next steps to verify the changes.