unifhy-org / unifhy

A Unified Framework for Hydrology
https://unifhy-org.github.io/unifhy
BSD 3-Clause "New" or "Revised" License

Gather 'driving'/'ancillary' data into a single 'inputs' item #15

Closed ThibHlln closed 3 years ago

ThibHlln commented 3 years ago

resolve #7

The definition of a Component used to distinguish between 'driving_data' and 'ancillary_data', but this distinction was rather ambiguous (where would climatology data fit in?), community-specific (mostly UM world?), and limited (no time dimension allowed for ancillary).

A component is now defined by just one item, 'inputs', for the data given to it. Each input must be given a 'units' metadata (as was already the case) and a 'kind' metadata (newly added). The 'kind' can be 'dynamic', 'static', or 'climatologic'; a 'climatologic' input also takes a 'frequency' metadata.

The definition of a component's inputs would look like this:

inputs_info = {
    'rainfall': {
        'units': 'kg m-2 s-1',
        'kind': 'dynamic'
    },
    'elevation': {
        'units': 'm above sea level',
        'kind': 'static'
    },
    'leaf_area_index': {
        'units': '1',
        'kind': 'climatologic',
        'frequency': 'monthly'
    }
}

The distinction of 'inputs' into kinds allows for some checks on the compatibility between the data given and what the component needs. For 'dynamic' a full space and time check can be done, for 'static' a space check can be done (a time dimension may or may not exist, but if it does, it must be of size one), and for 'climatologic' a space check can be done alongside a check on the length of the time dimension compared to the expectation.
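As a rough illustration of the time-dimension part of these checks, here is a minimal sketch. The helper name `check_time_length` and the `EXPECTED_LENGTHS` table are hypothetical and are not the framework's actual API; the space check and the full time check for 'dynamic' inputs are left out.

```python
# Hypothetical sketch, not unifhy's actual implementation.
# Expected number of time steps per climatology frequency
# (4 for 'seasonal' and 12 for 'monthly', as discussed in this thread).
EXPECTED_LENGTHS = {'seasonal': 4, 'monthly': 12}


def check_time_length(kind, time_length, frequency=None):
    """Return True if the length of the time dimension suits the input kind.

    `time_length` is None when the data has no time dimension.
    """
    if kind == 'dynamic':
        # a full time check would happen elsewhere; here we only
        # require that a time dimension exists
        return time_length is not None
    if kind == 'static':
        # a time dimension is optional, but must be of size one if present
        return time_length is None or time_length == 1
    if kind == 'climatologic':
        # the length must match what the frequency implies
        return time_length == EXPECTED_LENGTHS[frequency]
    raise ValueError(f"unknown kind: {kind!r}")
```

For instance, `check_time_length('static', 1)` passes, while `check_time_length('climatologic', 12, 'seasonal')` does not.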

ThibHlln commented 3 years ago

Thank you @rich-HJ

I wonder whether we need to implement tighter time checks for 'climatologic' kind of data? At the moment, it checks for the right number of values available along the time dimension, but nothing else.

For example, when 'frequency': 'seasonal', it will check for 4 values available, but it does not check if it is MAM-JJA-SON-DJF, or another order. Likewise, when 'frequency': 'monthly', it will check for 12 values, but it does not check if a calendar year, a meteorological year, or a hydrological year is considered.

The alternative to tighter checks would be to choose a standard (i.e. default order/start of the year) for seasonal/monthly for HJ, and document it as a requirement somewhere. Maybe this is a non-problem and datasets out there are always following the same order for the seasons, and the same start for a year of climatology? As far as I could see in the CF-conventions, nothing is enforced in that regard, but since 'time' and 'time_bounds' are required for the climatology data, they do not need a standard.

rich-HJ commented 3 years ago

I think it is fine to assume seasonal is the meteorological definition: DJF, MAM, JJA and SON. If people want a different definition, they should have the expertise to implement it.

rich-HJ commented 3 years ago

As for the calendar: do we need to allow meteorological and hydrological years? I would stay with the calendar year to begin with and add options if there is great demand.

ThibHlln commented 3 years ago

This sounds reasonable to me (i.e. expecting meteorological seasons and calendar year). I am going to document that in the docstring of Component for the argument dataset.

ThibHlln commented 3 years ago

Another question.

To support other frequencies (e.g. the MODIS 10-day LAI), I've added support for a datetime.timedelta in frequency. This infers the length of the time dimension by using the floor division of 366 days by this timedelta (giving the number of full sub-periods of length timedelta), and then by adding one if the remainder of the division is not 0 (to cover the last sub-period of length less than timedelta).
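The inference described above can be sketched as follows. The function name `expected_length` is made up for illustration, and this is only a sketch of the described arithmetic, not the code that was committed.

```python
from datetime import timedelta


def expected_length(frequency, year=timedelta(days=366)):
    """Number of sub-periods of length `frequency` needed to cover a year.

    Hypothetical helper: floor-divide the year by the timedelta, then add
    one if a shorter sub-period is left over at the end.
    """
    # divmod on two timedeltas returns (int, timedelta)
    full_periods, remainder = divmod(year, frequency)
    # a zero timedelta is falsy, so this adds one only for a real remainder
    return full_periods + 1 if remainder else full_periods
```

With a 10-day frequency, 366 days split into 36 full sub-periods plus a 6-day remainder, so `expected_length(timedelta(days=10))` gives 37.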

But this whole process assumes a 'gregorian' calendar (because this is what datetime is based on). But the TimeDomain of the component could be in another calendar, which is not very consistent.

So maybe asking for an integer in place of a timedelta is better? But e.g. timedelta(days=10) should be 37 in a gregorian calendar, but only 36 in a 360-day calendar.
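To make the inconsistency concrete, here is a small sketch of the same count done on integer days for two calendars. The `YEAR_LENGTH_DAYS` table and the function name are assumptions for illustration only; the calendar labels follow the CF naming.

```python
# Hypothetical table: longest possible year, in days, for each calendar.
YEAR_LENGTH_DAYS = {'gregorian': 366, '360_day': 360}


def expected_length(period_days, calendar):
    """Number of sub-periods of `period_days` days covering one year."""
    full_periods, remainder = divmod(YEAR_LENGTH_DAYS[calendar], period_days)
    # one extra, shorter sub-period if the year is not an exact multiple
    return full_periods + (1 if remainder else 0)
```

Here `expected_length(10, 'gregorian')` is 37 but `expected_length(10, '360_day')` is 36, so a single integer cannot be right for both calendars.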

Not sure what is best here, and what we should support.

ThibHlln commented 3 years ago

I dropped the support for timedelta for now. I replaced it with support for an integer value if 'seasonal', 'monthly', or 'day_of_year' are not enough. The framework will check that the length of the time dimension in the dataset corresponds to this integer value.
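A minimal sketch of what such a check could look like, assuming a hypothetical helper `climatology_length_matches`; only the lengths stated in this thread (4 for 'seasonal', 12 for 'monthly') are filled in, and the length expected for 'day_of_year' is deliberately left out because it is not stated here.

```python
# Hypothetical sketch, not the framework's actual code.
NAMED_FREQUENCIES = {'seasonal': 4, 'monthly': 12}


def climatology_length_matches(frequency, time_length):
    """Compare the dataset's time dimension length with the expected one.

    `frequency` is either a named frequency or an integer giving the
    expected length directly.
    """
    if isinstance(frequency, int):
        expected = frequency
    else:
        expected = NAMED_FREQUENCIES[frequency]
    return time_length == expected
```

A 10-day product such as the MODIS LAI mentioned earlier could then declare `'frequency': 37` (or 36 in a 360-day calendar), leaving the choice to the user.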