ioos / compliance-checker

Python tool to check your datasets against compliance standards
http://ioos.github.io/compliance-checker/
Apache License 2.0

CF-Checker Plugin: Recognize variables with standard_names that enforce an additional dimension #832

Open neumannd opened 4 years ago

neumannd commented 4 years ago

Some CF standard names require an additional dimension beyond the temporal and spatial ones. These standard names follow one of these construction rules (see Guidelines for Construction of CF Standard Names):

Rule | Units | Meaning
... | ... | ...
histogram_of_X[_over_Z] | 1 | histogram (i.e. number of counts for each range of X) of variations (over Z) of X. The data variable should have an axis for X.
integral_of_Y_wrt_X | [X]*[Y] | int Y dX. The data variable should have an axis for X specifying the limits of the integral as bounds.
... | ... | ...
probability_distribution_of_X[_over_Z] | 1 | probability distribution (i.e. a number in the range 0.0-1.0 for each range of X) of variations (over Z) of X. The data variable should have an axis for X.
probability_density_function_of_X[_over_Z] | 1/[X] | PDF for variations (over Z) of X. The data variable should have an axis for X.
... | ... | ...
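To make the rules above concrete, here is a minimal sketch of how a checker could tell that a standard name needs an extra axis. The dictionary `EXTRA_AXIS_RULES` and the function `extra_axis_rule` are hypothetical (they are not part of compliance-checker), and the regexes only paraphrase the four rows shown; the real guidelines table contains more rules.

```python
import re

# Hypothetical lookup (not compliance-checker code): regexes paraphrasing the
# construction rules in the table above. X, Y and Z stand for other
# standard-name fragments; X is the quantity that needs its own axis.
EXTRA_AXIS_RULES = {
    "histogram_of_X[_over_Z]":
        re.compile(r"^histogram_of_(?P<x>.+?)(?:_over_(?P<z>.+))?$"),
    "integral_of_Y_wrt_X":
        re.compile(r"^integral_of_(?P<y>.+)_wrt_(?P<x>.+)$"),
    "probability_distribution_of_X[_over_Z]":
        re.compile(r"^probability_distribution_of_(?P<x>.+?)(?:_over_(?P<z>.+))?$"),
    "probability_density_function_of_X[_over_Z]":
        re.compile(r"^probability_density_function_of_(?P<x>.+?)(?:_over_(?P<z>.+))?$"),
}


def extra_axis_rule(standard_name):
    """Return (rule, X) if the name follows a rule requiring an axis for X, else None."""
    for rule, pattern in EXTRA_AXIS_RULES.items():
        match = pattern.match(standard_name)
        if match:
            return rule, match.group("x")
    return None


print(extra_axis_rule("histogram_of_column_over_some_parameter"))
# ('histogram_of_X[_over_Z]', 'column')
print(extra_axis_rule("air_temperature"))  # None
```

The lazy `.+?` stops X at the first `_over_`, so a name containing several `_over_` fragments would be split at the earliest one; that is a simplification of this sketch.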

Here is an example header using one of these standard names:

netcdf test {
dimensions:
        time = UNLIMITED ; // (248 currently)
        lon = 5 ;
        lat = 5 ;
        mlev = 4 ;
        column = 2 ;
variables:
        double time(time) ;
                time:standard_name = "time" ;
                time:units = "days since 1999-10-01 00:00:00" ;
                time:calendar = "proleptic_gregorian" ;
                time:axis = "T" ;
        double lon(lon) ;
                lon:standard_name = "longitude" ;
                lon:long_name = "longitude" ;
                lon:units = "degrees_east" ;
                lon:axis = "X" ;
        double lat(lat) ;
                lat:standard_name = "latitude" ;
                lat:long_name = "latitude" ;
                lat:units = "degrees_north" ;
                lat:axis = "Y" ;
        double mlev(mlev) ;
                mlev:long_name = "level number" ;
                mlev:units = "1" ;
                mlev:axis = "Z" ;
                mlev:positive = "down" ;
        float histogram_of_column_over_some_parameter(time, mlev, lat, lon, column) ;
                histogram_of_column_over_some_parameter:long_name = "Cloud type (subcolumn)" ;
                histogram_of_column_over_some_parameter:units = "1" ;

// global attributes:
                :Conventions = "CF-1.7" ;
                :history = "test" ;
                :title = "test" ;
}

If we run the IOOS Compliance Checker CF plugin on this header, we get:

--------------------------------------------------------------------------------
                         IOOS Compliance Checker Report                         
                                     cf:1.7                                     
http://cfconventions.org/Data/cf-conventions/cf-conventions-1.7/cf-conventions.html
--------------------------------------------------------------------------------
                               Corrective Actions                               
histogram_bad_feature_error_2.cdl has 2 potential issues

                                     Errors                                     
--------------------------------------------------------------------------------
§9.1 Features and feature types
* Unidentifiable feature for variable histogram_of_column_over_some_parameter

                                    Warnings                                    
--------------------------------------------------------------------------------
§2.4 Dimensions
* histogram_of_column_over_some_parameter's dimensions are not in the recommended order T, Z, Y, X. They are time (Unlimited), mlev, lat, lon, column

I don't have a fix yet. If I find some time early next week, I will try to provide a PR. After that, I will be unavailable for two months.

neumannd commented 4 years ago

The CF Conventions merely recommend (§2.4 Dimensions):

All other dimensions should, whenever possible, be placed to the left of the spatiotemporal dimensions.

However, the IOOS CC CF plugin checks this as a requirement rather than a recommendation. Its regex (regx = regex.compile(r"^[^TZYX]*T?Z?Y?X?$")) accepts extra dimensions only to the left of T, Z, Y, X:

    def _dims_in_order(self, dimension_order):
        """
        :param list dimension_order: A list of axes
        :rtype: bool
        :return: Returns True if the dimensions are in order U*, T, Z, Y, X,
                 False otherwise
        """
        regx = regex.compile(r"^[^TZYX]*T?Z?Y?X?$")
        dimension_string = "".join(dimension_order)
        return regx.match(dimension_string) is not None
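The following standalone sketch reproduces that check for the example header. It uses the stdlib `re` module instead of `regex` (the pattern uses no `regex`-specific features), and it assumes, as the `U*` in the docstring suggests, that a dimension without a recognized axis is represented by `U`. The axis mapping `AXES` is read off the `axis` attributes in the header; `axis_string` and `dims_in_order` are illustrative names, not the plugin's internals.

```python
import re

# Same pattern as the plugin's _dims_in_order
STRICT = re.compile(r"^[^TZYX]*T?Z?Y?X?$")

# Axis letters from the `axis` attributes in the example header.
# Assumption: any other dimension (here: column) maps to "U".
AXES = {"time": "T", "mlev": "Z", "lat": "Y", "lon": "X"}


def axis_string(dims, axes=AXES):
    """Translate dimension names into a string of axis letters."""
    return "".join(axes.get(d, "U") for d in dims)


def dims_in_order(dims, pattern=STRICT):
    """True if the axis string matches U*, T, Z, Y, X."""
    return pattern.match(axis_string(dims)) is not None


dims = ["time", "mlev", "lat", "lon", "column"]
print(axis_string(dims))                     # TZYXU
print(dims_in_order(dims))                   # False
print(dims_in_order(["column"] + dims[:4]))  # True
```

The trailing `U` for `column` is what makes the pattern fail, which matches the §2.4 warning in the report.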

I would suggest checking against "^[^TZYX]*T?Z?Y?X?[^TZYX]*$" instead. However, this might introduce new problems, because some errors would no longer be caught: if spatiotemporal dimensions are not recognized as such, they could be placed in any order without being flagged.
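A quick comparison of the current pattern and the proposed relaxed one on a few axis strings (T/Z/Y/X for recognized axes, U for anything else, as assumed in the sketch of `_dims_in_order`). The test cases are made up for illustration.

```python
import re

# Current check from _dims_in_order and the relaxed variant proposed above
STRICT = re.compile(r"^[^TZYX]*T?Z?Y?X?$")
RELAXED = re.compile(r"^[^TZYX]*T?Z?Y?X?[^TZYX]*$")

cases = {
    "UTZYX": "extra dimension on the left",
    "TZYXU": "extra dimension on the right (the histogram example)",
    "YXU": "time not recognized, placed last",
    "TYZX": "Z and Y swapped",
}

for axes, note in cases.items():
    strict_ok = STRICT.match(axes) is not None
    relaxed_ok = RELAXED.match(axes) is not None
    print(f"{axes:6} strict={strict_ok!s:5} relaxed={relaxed_ok!s:5} {note}")
```

Both patterns reject `TYZX` and accept `UTZYX`. The relaxed pattern accepts `TZYXU`, as intended, but also `YXU`: if the time dimension is not recognized, it ends up as a `U` after Y and X and is no longer reported, which is exactly the concern raised above.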

neumannd commented 4 years ago

Correction: for histogram_ standard names, the additional dimension is supposed to go to the left of T, Z, Y, X, as this example from a mailing list post by Jonathan Gregory (Fri Oct 14 03:27:22 MDT 2016) indicates.

// source variable
  float tair(time,altitude,latitude,longitude);
    tair:units="K";
    tair:standard_name="air_temperature";
    tair:cell_methods="altitude: mean area: mean time: mean";

// resulting probability_density_function_ variable
  float pair(tair,time,altitude);
    pair:standard_name="probability_density_function_of_air_temperature";
    pair:units="K-1";
    pair:cell_methods="altitude: mean time: mean area: sum tair: mean";
    pair:coordinates="latitude longitude"; // to record the ranges
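Following that convention, the histogram variable from the first post would pass the existing check if `column` came first. The helper below is a hypothetical sketch (not compliance-checker code) that reorders dimension names into U*, T, Z, Y, X. The axis mappings are assumptions: the first comes from the example header, and for the mailing-list example `altitude` is assumed to be the Z axis.

```python
# Hypothetical helper (not part of compliance-checker): reorder dimensions
# into the recommended U*, T, Z, Y, X layout.
RANK = {"T": 1, "Z": 2, "Y": 3, "X": 4}


def recommended_order(dims, axes):
    """Put dimensions without a recognized axis (rank 0) first, then T, Z, Y, X.

    sorted() is stable, so dimensions sharing a rank keep their original order.
    """
    return sorted(dims, key=lambda d: RANK.get(axes.get(d, "U"), 0))


# The histogram example from the first post
print(recommended_order(
    ["time", "mlev", "lat", "lon", "column"],
    {"time": "T", "mlev": "Z", "lat": "Y", "lon": "X"},
))  # ['column', 'time', 'mlev', 'lat', 'lon']

# The mailing-list example is already in this order
print(recommended_order(["tair", "time", "altitude"], {"time": "T", "altitude": "Z"}))
# ['tair', 'time', 'altitude']
```

This only reorders names; the data array itself would have to be transposed to match when the file is written.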

This leaves only the taxon names concept of CF 1.8 as a special case, as example 6.1.2 suggests:

  float abundance(time,taxon) ;
    abundance:standard_name = "number_concentration_of_organisms_in_taxon_in_sea_water" ;
    abundance:coordinates = "taxon_lsid taxon_name" ;
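Here `taxon` sits to the right of `time`, so the strict pattern rejects the axis string `TU`. One way such a special case could be handled is sketched below; the function `dims_ok` and, in particular, the substring test on `_in_taxon` are hypothetical assumptions for illustration, not an existing compliance-checker rule or a CF requirement.

```python
import re

STRICT = re.compile(r"^[^TZYX]*T?Z?Y?X?$")


def dims_ok(axis_string, standard_name):
    """Strict U*TZYX check with a hypothetical exemption for taxon-based names."""
    if STRICT.match(axis_string):
        return True
    # Assumption: trailing non-spatiotemporal dimensions are tolerated only when
    # the standard name refers to a taxon (crude substring test, for illustration)
    if "_in_taxon" in standard_name:
        return STRICT.match(axis_string.rstrip("U")) is not None
    return False


# abundance(time, taxon): time -> T, taxon -> U
print(dims_ok("TU", "number_concentration_of_organisms_in_taxon_in_sea_water"))  # True
print(dims_ok("TZYXU", "histogram_of_column_over_some_parameter"))  # False
```

A real implementation would more likely identify the taxon dimension through its coordinate variables (here `taxon_lsid` and `taxon_name`) than through the standard name text.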