xarray-contrib / cf-xarray

an accessor for xarray objects that interprets CF attributes
https://cf-xarray.readthedocs.io/
Apache License 2.0

A cf-xarray compliance checker? #366

Open kthyng opened 1 year ago

kthyng commented 1 year ago

Would something like this be in scope for cf-xarray? It would need to be fairly loosely defined, but a minimum might be that a Dataset has its axes and coordinates all defined. Variables would presumably need standard_names, though some variables don't usually have standard names, for example "angle" on a ROMS grid.

dcherian commented 1 year ago

A number of these exist:

so I don't think we should reinvent it. It would be nice if we could run a checker on a Dataset using something like ds.cf.check(checker="ioos"), for example.

cc @ocefpaf

malmans2 commented 1 year ago

I was looking at CF checkers last week for another project, and it looks like most of the options are command-line tools meant to check NetCDF files.

It would be great if cf-xarray could check any format supported by xarray, as well as datasets that have not been written to disk. I also think it would be great to use other checkers as the backend, but it looks like that would first require changes in compliance-checker and cf-checker (i.e., the checkers only accept paths right now; they would have to accept xarray datasets as well).

dcherian commented 1 year ago

It'd be nice to build an API connection, but worst case we can write a tiny dataset with all attributes to /tmp/check.nc and run that, and print the output to screen.
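The temporary-file fallback could look roughly like the sketch below. It is only an illustration under stated assumptions: `check_via_tempfile` is a hypothetical helper, not part of cf-xarray, and the checker is treated as an arbitrary command-line program that takes a file path as its last argument.

```python
import subprocess
import tempfile
from pathlib import Path


def check_via_tempfile(write_fn, command, filename="check.nc"):
    """Write a dataset to a temporary file, run a checker on it, return its output.

    write_fn: callable that takes a path and writes the dataset there
              (assumption: for an xarray Dataset this could be `ds.to_netcdf`)
    command:  checker command as a list of arguments; the file path is appended
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        path = Path(tmpdir) / filename
        write_fn(path)
        # Run the external checker and capture everything it prints.
        result = subprocess.run(
            [*command, str(path)],
            capture_output=True,
            text=True,
            check=False,  # checkers may exit non-zero when they find issues
        )
    return result.returncode, result.stdout + result.stderr
```

With xarray and compliance-checker installed, a call might be `code, output = check_via_tempfile(ds.to_netcdf, ["compliance-checker", "--test", "cf"])` followed by `print(output)`. The exact command-line flags are an assumption here and should be checked against the checker's own documentation.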

ocefpaf commented 1 year ago

I have mixed feelings. While I don't want to overload cf-xarray with functionality that exists elsewhere, this could be a nice idea because:

  1. what @malmans2 said above
  2. compliance-checker is super verbose and sometimes you don't want a full CF check, just a bare bones "what is missing so I can plot this automatically, or load this data into analysis X." In a way, iris used to be like that but has become more and more restrictive with time.

I guess that, instead of becoming a compliance checker, cf-xarray could have a "verbose mode" where all the compliance issues would be printed when loading a dataset.

dcherian commented 1 year ago

> "what is missing so I can plot this automatically, or load this data into analysis X."

This is hard to define!

ocefpaf commented 1 year ago

> This is hard to define!

Indeed! That is why cc is super verbose, kind of all or nothing. However, @kthyng's suggestion above looks like a nice start:

  1. axes and coordinates
  2. valid standard_names
  3. enough variables defined to compute, say, z

Going beyond that would get us into the weeds of CF, but those three checks cover almost everything needed for plotting with labels.

kthyng commented 1 year ago

I wrote some tests for a package: https://github.com/NOAA-ORR-ERD/model_catalogs/blob/main/model_catalogs/tests/test_catalogs.py#L326-L369

When the models are read in with the package, they should be usable by cf-xarray in a basic way. I am finding I need this functionality again, which is why I thought it could be useful in cf-xarray itself. It could warn a user if no axes or coordinates are known for a Dataset/DataArray, and list which data_vars do not have standard_names. I also like the connection @ocefpaf made to being able to calculate z.
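A minimal sketch of that kind of warning is below. It is an assumption-laden illustration, not cf-xarray's behavior: `basic_cf_report` and `warn_basic_cf` are hypothetical names, the input is a plain mapping of variable name to attribute dict rather than a Dataset, and axes are detected only through the CF `axis` attribute, which is much narrower than the heuristics cf-xarray itself applies.

```python
import warnings

# Values CF allows for the `axis` attribute.
AXIS_VALUES = {"X", "Y", "Z", "T"}


def basic_cf_report(var_attrs, data_vars):
    """Return a dict describing what a loose CF check could not find.

    var_attrs: mapping of variable name -> attribute dict
    data_vars: names of the variables that hold data (not coordinates)
    """
    # Variables that declare an axis through the CF `axis` attribute.
    axes = {
        name: attrs["axis"]
        for name, attrs in var_attrs.items()
        if attrs.get("axis") in AXIS_VALUES
    }
    # Data variables with no `standard_name` attribute at all.
    missing_standard_name = [
        name for name in data_vars if "standard_name" not in var_attrs.get(name, {})
    ]
    return {"axes": axes, "missing_standard_name": missing_standard_name}


def warn_basic_cf(var_attrs, data_vars):
    """Emit warnings for the gaps found by basic_cf_report and return the report."""
    report = basic_cf_report(var_attrs, data_vars)
    if not report["axes"]:
        warnings.warn("No variable declares a CF axis (X, Y, Z or T).")
    for name in report["missing_standard_name"]:
        warnings.warn(f"Data variable {name!r} has no standard_name.")
    return report
```

For an xarray Dataset `ds`, the inputs could be built with `{name: dict(var.attrs) for name, var in ds.variables.items()}` and `list(ds.data_vars)`.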

dcherian commented 3 months ago

NASA-specific compliance checker: https://github.com/eugenegesdisc/diwg-data-compliance-test

DWesl commented 2 months ago

> This is hard to define!
>
> Indeed! That is why cc is super verbose, kind of all or nothing. However, @kthyng's suggestion above looks like a nice start:
>
>   1. axes and coordinates
>   2. valid standard_names

I'd suggest allowing long_names as an option, for those variables that aren't in the standard name table yet. You can add a warning pointing to the forum for adding standard names if you want to discourage long_name without standard_name.

>   3. enough variables defined to compute say z for example

Everything mentioned in formula_terms or similar, at a guess? Or do you want enough information to convert from the model vertical coordinate to a geometric vertical coordinate?

> More than that we would get into the weeds of CF but those 3 lines ensure almost all of plotting with labels.

I'd suggest a fourth check for units: it's possible to guess them from the values, but I like having them stated explicitly.
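The suggestions in this comment (accepting long_name as a fallback, checking that everything named in formula_terms exists, and checking for units) could be sketched as follows. The function names and the report layout are hypothetical, and the input is again a plain mapping of variable name to attribute dict. The only CF detail relied on is that formula_terms is a whitespace-separated string of `term: variable` pairs.

```python
def parse_formula_terms(value):
    """Split a CF formula_terms string like 's: s_rho eta: zeta' into a dict."""
    tokens = value.split()
    # Tokens alternate between 'term:' and a variable name.
    return {term.rstrip(":"): name for term, name in zip(tokens[0::2], tokens[1::2])}


def extended_cf_report(var_attrs, data_vars):
    """Report missing names, missing units and dangling formula_terms references."""
    report = {"no_name": [], "long_name_only": [], "no_units": [], "missing_terms": {}}
    for name in data_vars:
        attrs = var_attrs.get(name, {})
        if "standard_name" not in attrs:
            # long_name is accepted as a fallback but reported separately.
            key = "long_name_only" if "long_name" in attrs else "no_name"
            report[key].append(name)
        if "units" not in attrs:
            report["no_units"].append(name)
    for name, attrs in var_attrs.items():
        if "formula_terms" in attrs:
            terms = parse_formula_terms(attrs["formula_terms"])
            # Variables referenced by the formula but absent from the dataset.
            absent = sorted(v for v in terms.values() if v not in var_attrs)
            if absent:
                report["missing_terms"][name] = absent
    return report
```

Variables listed under `long_name_only` are the ones where a warning pointing to the standard-name forum might apply. This sketch only checks that the referenced variables exist; whether they are enough to convert to a geometric vertical coordinate is a separate question.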