unifhy-org / unifhy

A Unified Framework for Hydrology
https://unifhy-org.github.io/unifhy
BSD 3-Clause "New" or "Revised" License

Aggregate in `DataSet` for fields with no standard_name fails #52

Closed ThibHlln closed 2 years ago

ThibHlln commented 2 years ago

The automatic aggregation performed by cf.read relies on the standard_name/id attributes of the fields, so fields with only a long_name attribute are not aggregated as expected. This results in an error in cm4twc saying that the timedomain (or spacedomain) of the dataset is not compatible with the timedomain (or spacedomain) of the component, even when they legitimately are.

This is because cf.read in DataSet returns separate fields for a variable featuring only a long_name attribute, and each of these fields overwrites the previous one assigned to the same key (i.e. the long_name) in DataSet. As a result, only a subset of the timedomain and/or spacedomain remains mapped in DataSet, corresponding to a single file (the last one listed in the cf.FieldList with the given long_name).
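To illustrate the overwriting described above, here is a minimal sketch in plain Python. It does not use cf-python or the actual DataSet class: the tuples are hypothetical stand-ins for unaggregated fields, and the variable name and year ranges are made up for the example.

```python
# Hypothetical stand-ins for the separate fields returned when aggregation
# fails: each tuple is (key used in the mapping, time span covered by one file).
unaggregated_fields = [
    ("long_name=soil moisture", (2000, 2004)),
    ("long_name=soil moisture", (2005, 2009)),
    ("long_name=soil moisture", (2010, 2014)),
]

dataset = {}
for identity, time_span in unaggregated_fields:
    # The key is identical for every field, so each assignment
    # replaces the entry stored by the previous iteration.
    dataset[identity] = time_span

print(dataset)  # {'long_name=soil moisture': (2010, 2014)}
```

Only the span of the last file survives, which is why the domain check against the component then fails.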

In an ideal world, all fields used by cm4twc components would have an entry in the CF standard name list (and so a standard_name), but this is not the case yet, so we need to accommodate such situations.

There is a simple fix to this problem readily available in cf-python: the relaxed_identities parameter of cf.aggregate (https://github.com/NCAS-CMS/cf-python/blob/53dba98531e5255275da4f54cea1fcc5c291aff4/cf/aggregate.py#L1470-L1473), which can be passed through from cf.read using a dictionary for its aggregate argument.
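A minimal sketch of what this could look like, assuming cf-python is installed. The helper name read_with_relaxed_identities and the file pattern in the usage comment are hypothetical, and this is not necessarily how the fix ends up being implemented in DataSet.

```python
# Keyword arguments that cf.read forwards to cf.aggregate when its
# `aggregate` argument is given as a dictionary.
AGGREGATE_OPTIONS = {"relaxed_identities": True}


def read_with_relaxed_identities(files):
    """Read files, letting fields without a standard_name be identified
    by their long_name (or netCDF variable name) during aggregation."""
    # Imported lazily so this sketch can be loaded without cf-python.
    import cf

    return cf.read(files, aggregate=AGGREGATE_OPTIONS)


# Hypothetical usage; the path below is illustrative only:
# fields = read_with_relaxed_identities("data/soil_moisture_*.nc")
```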