Modularisation: the Submodel concept

hydpy-dev / hydpy

A framework for the development and application of hydrological models based on Python

GNU Lesser General Public License v3.0

36 stars 16 forks source link

Modularisation: the Submodel concept #90

Open tyralla opened 2 years ago

tyralla commented 2 years ago

We currently aim to support LARSIM-like simulations with HydPy-L that better take infiltration excess into account via a Green & Ampt (GA) infiltration approach. However, HydPy-L is already very complex, and a GA approach suitable for continuous simulations is not too simple either. So, combining the equations of both models is something we should avoid. Instead, we strive for an independently usable GA model that allows for coupling with HydPy-L (and, at best, other "complete" hydrological models like HydPy-H). Unfortunately, this is beyond HydPy's current modularisation functionalities. Extending these functionalities will be non-trivial, but we think it will pay off very soon due to ways to keep our application models smaller and increase the users' flexibility in combining different model modules.

The necessary work will touch on many topics and require many decisions. Hence, we split the discussion into three issues that separately deal with the GA methodology itself (#89), the general approach to couple "complete" application models like HydPy-L with "submodels" like GA (#90), and the specific coupling interface for HydPy-L and GA (#91).

tyralla commented 2 years ago

The application model ga_garto is now implemented as a stand-alone model. We discussed the general aspects of how to couple it with other models before starting with ga_garto. I hope the following short list contains all relevant discussed points:

Technically speaking, we follow the strategy pattern.
We speak of "submodels", a term that conflicts with the existing Submodel class. However, the new submodel concept we target here is far more relevant from the users' perspective. So we decided to leave this name conflict for now as is. Maybe, we need to rename the existing Submodel class and the related documentation later.
When speaking about submodels, "main model" would mean a model controlling a submodel. I think we need this term just for the documentation, not for the source code.
Principally, a submodel can control another submodel and so be a main model and a submodel simultaneously. We do not put any effort into explicitly supporting this, as we are unsure if it will ever be required. However, we also do not implement things in an unnecessarily restricting way.
It should be possible to define the main model and its submodel(s) within a single control file and its initial conditions in a single condition file. Our first idea is to use separate "with" blocks for individual models, which should result in a readable design and help deal with possible name conflicts for the parameters or sequences of the different models involved.
We still need to think about shared information. For example, HydPy-L and HydPy-GA both require soil depth data. For now, users must define it twice, but (if the submodel concept proves successful), we should somehow circumvent this redundancy later.
We postpone any discussion on configuring submodels via the XML support for now.
There is no way around clear interfaces here. And these must be implemented both in Python and (manually or automatically?) in Cython. This should be the first (and possibly hardest) point to solve.

tyralla commented 2 years ago

At the time of writing, #91 defines a SoilInterface prototype semantically. Here, we discuss the general technical implementation of this and other submodel interfaces and their concrete realisations.

Like for "normal" models, we always require two implementations: pure Python classes and Cython extension classes. At least, we should support the automatic generation of Cython realisations based on Python code. Preferably, we should also support the automatic generation of Cython interfaces based on Python code. I start with the optimistic assumption complete automatisation is possible.

tyralla commented 2 years ago

If fully automatic generation is possible, we do not need to write any pyx or pxd files manually. Hence, there is no good reason for defining interfaces in the cythons subpackage. I would prefer to create a new subpackage, called interfaces, for this purpose. That should also help to keep the package structure and the documentation structure in sync. (For users, knowing the available interfaces is much more relevant than the very specific functionalities implemented in the cythons subpackage.)

tyralla commented 2 years ago

Within the interfaces subpackage, there could be modules for different "topics" (for now, only soil.py). The assignment of each interface to a single topic may not always be clear. Still, it currently seems better than collecting all interfaces in a single module (in case the submodel concept is successful and we will implement many interfaces later).

tyralla commented 2 years ago

I prefer to write "real" python modules (.py) of over stub files (.pyi) because we want to load them at least for converting their interfaces to Cython.

tyralla commented 2 years ago

Reusing the "normal" model conversion mechanisms seems favourable. So far, we have the interfaces subpackage including a soil module defining the following interface:

class SoilInterface:
    def set_initial_surface_water(self, k: int, v: float) -> None:
        ...

    def set_actual_surface_water(self, k: int, v: float) -> None:
        ...

    def set_soil_water_supply(self, k: int, v: float) -> None:
        ...

    def set_soil_water_demand(self, k: int, v: float) -> None:
        ...

    def execute_infiltration(self, k: int) -> None:
        ...

    def add_soil_water(self, k: int) -> None:
        ...

    def remove_soil_water(self, k: int) -> None:
        ...

    def get_infiltration(self, k: int) -> float:
        ...

    def get_percolation(self, k: int) -> float:
        ...

    def get_soil_water_addition(self, k: int) -> float:
        ...

    def get_soil_water_removal(self, k: int) -> float:
        ...

    def get_soil_water_content(self, k: int) -> float:
        ...

Speaking of the pure Python mode, we just need to define the corresponding methods in the usual way (derived from modeltools.Method) and select them as "additional methods" (ADD_METHODS). However, adding a new tuple for selecting methods should be the tidier solution if going this route. Therefore, I suggest INTERFACE_METHODS.

Following this idea, we would need to create a GARTO-like GA submodel designed to follow a specific interface. But it is not directly clear whether such a submodel follows the interface (no inheritance). Theoretically, we could think about making the interface class a Protocol. At least, this would hint at its actual usage, but I am unsure if the protocol mechanism (issubclass, issintance) works for cythonized methods.

Maybe we can should a mandatory marker for each interface. We could name our first soil interface SoilInterface1 instead of SoilInterface (perhaps a good idea anyhow) and define it so:

from typing import Literal
class SoilInterface1:
    mark: Literal[1]

Then, the main model would require no isinstance check the identify the interface supported by the model selected by the user. But, on the downside, it requires casting in Cython (I am confident we can automate it quickly), and each application model would relate to a single interface only, even if providing enough methods for supporting multiple interfaces.

Maybe the last point is more an advantage than a disadvantage because it could prevent using submodels in cases that work technically but are hydrologically misleading. A GARTO submodel would, for example, also satisfy an interface that does not require calculating percolation. But GARTO calculates percolation in any case. So if the main model does not consider this calculated percolation, it will likely make serious (water balance) errors.

The code from the main model's perspective could then look like this:

SupportedSoilInterfaces: Literal[SoilInterface1, SoilInterface2]

model.soilmodel: SupportedSoilInterfaces

if model.soilmodel is None:
    model.apply_standard_soil_routine()
elif model.soilmodel.mark== 1:
    model.apply_soilsubmodel1()
elif model.soilmodel.mark == 2:
   model.apply_soilsubmodel2()
else:
    assert_never(model.soilmodel.mark)

Hence, we need to declare somewhere for each model which submodel interfaces it supports, which is also helpful for users. The assert_never approach helps to update the main model's code as soon as we manage to type-check our models with Mypy. In Cython, we can eliminate the assert_never line because we should check that a given submodel follows a supported interface already during initialisation. If (always) providing a standard routine as a fallback is a good idea, I am not sure.

Converting the Python implementation of an interface into a Cython implementation (.pxd) should pose no big problem. If we somehow connect a cythonized submodel with such a cythonized interface, we hopefully get type-checking for free (not during programming but at least during compiling). I have to try, but I guess that simple inheritance should work here. Of course, we must ensure the interfaces are cythonized first, even when starting cythonizing a single submodel from the interpreter.

tyralla commented 2 years ago

After discussing the details of the current proposal: Maybe we can avoid the "marks" and use the standard isinstance approach instead. I do not know what type-checkers and linters say, but we could make the interface's methods abstract without making the interface an abstract class and add the class to the submodel's inheritance tree. The abstract methods would be "replaced" by concrete Python- or Cython-methods during model instantiation.

from abc import abstractmethod

class SoilInterface:
    @abstractmethod
    def set_initial_surface_water(self, k: int, v: float) -> None:
        ...

SupportedSoilInterfaces: Literal[SoilInterface1, SoilInterface2]

model.soilmodel: Optional[SupportedSoilInterfaces]

if model.soilmodel is None:
    model.apply_standard_soil_routine()
if isinstance(model.soilmodel, SoilInterface1):
    model.apply_soilsubmodel1()
if isinstance(model.soilmodel, SoilInterface2):
   model.apply_soilsubmodel2()
else:
    assert_never(model.soilmodel)

It looks similar but a little cleaner and would allow further checks (e.g. in apply_soilsubmodel1) to differentiate between potential sub-interfaces (e.g. SoilInterface1A and SoilInterface1B) or so if such a subclassing makes sense hydrologically (we will see).

However, we need to remember not to inherit interface SoilInterface2 from interface SoilInterface1, even if all methods of SoilInterface1 are also defined by SoilInterface2, but both are hydrologically incompatible, as discussed above.

tyralla commented 2 years ago

Base model lland provides a "switch method" Calc_BoWa_V1, its default implementation Calc_BoWa_Default_V1, and (so far) the handler of the interface Calc_BoWa_SoilInterface1_V1. Application model lland_v2 selects all these methods.

Base model ga provides the different interface methods (e.g. Set_InitialSurfaceWater_V1). The submodel (I would say, submodels are always application models) ga_garto_sub1 selects the appropriate (currently all) interface methods.

How much testing is necessary from the hydrological perspective, and where do we do it?

Calc_BoWa_SoilInterface1_V1
- Detailed testing that lland uses the specific interface correctly (by selecting a random submodel following this interface).
- At least, we need to clarify we keep the water balance under all circumstances.
lland_v2
- Short demonstration of how to use a submodel following this interface (by selecting a random submodel following this interface).
ga_garto_sub1
- Semi-detailed testing by repeating the most striking examples of the related stand-alone model ga_garto (by selecting a random main model using its interface).

tyralla commented 2 years ago

After discussing the details of the current proposal: Maybe we can avoid the "marks" and use the standard isinstance approach instead. I do not know what type-checkers and linters say, but we could make the interface's methods abstract without making the interface an abstract class and add the class to the submodel's inheritance tree. The abstract methods would be "replaced" by concrete Python- or Cython-methods during model instantiation.

Unfortunately, I cannot get the prefered isinstance approach running. Cython generally supports it, but not when releasing Python's Global Interpreter Lock (GIL) via the nogil option, which we still want to do for the sake of not risking unnecessary speed bottlenecks and not blocking later (potential) parallelisation efforts.

Hence, the first lines of our Python interface now look like this:

class SoilModel_V1(modeltools.SubmodelInterface):

    typeid: Literal[1] = 1

    @abstractmethod
    def set_initialsurfacewater(self, k: int, v: float) -> None:
        ...

I now use typeid instead of mark, which seems clearer. By convention, the integer value of typeid (1) corresponds to the "version number" of the interface ("V1").

In Python, ga_garto_sub1 inherits from this soil model interface:

class Model(modeltools.AdHocModel, soilinterfaces.SoilModel_V1):
    """The GARTO algorithm (assuming a hydrostatic groundwater table), implemented as
    a submodel meeting the requirements of the |SoilModel_V1| interface."""
    ....

In Cython, we derive the application model from the (cythonized) interface only:

@cython.final
cdef class Model(soilinterfaces.SoilModel_V1):
    cdef public int idx_sim
    ...

The base extension class is now defined in two modules. First, in a pxd module (for declaring the types):

cdef class SoilModel_V1:
    cdef numpy.int32_t typeid
    cdef void add_soilwater(self, numpy.int32_t k) nogil
    ...

Second, in a pyx module (for implementing the implementation):

cdef class SoilModel_V1:
    cdef void add_soilwater(self, numpy.int32_t k) nogil:
        pass
    ...
    cdef double get_infiltration(self, numpy.int32_t k) nogil:
        return 0.0
    ...
    def __init__(self):
        self.typeid = 1

I don't like that we define useless default method implementations in pyx, but I have no idea how to circumvent it. However, it's no real problem, as they are automatically generated and users won't be able to access them (accessible via C only).

tyralla commented 2 years ago

Each submodel inherits from a single interface only and selects the suitable methods via the new class attribute INTERFACE_METHODS:

class Model(modeltools.AdHocModel, soilinterfaces.SoilModel_V1):
    """The GARTO algorithm (assuming a hydrostatic groundwater table), implemented as
    a submodel meeting the requirements of the |SoilModel_V1| interface."""

    INLET_METHODS = ()
    RECEIVER_METHODS = ()
    RUN_METHODS = (ga_model.Perform_GARTO_V1,)
    INTERFACE_METHODS = (
        ga_model.Set_InitialSurfaceWater_V1,
        ga_model.Set_ActualSurfaceWater_V1,
        ga_model.Set_SoilWaterDemand_V1,
        ga_model.Execute_Infiltration_V1,
        ga_model.Remove_SoilWater_V1,
        ga_model.Get_Percolation_V1,
        ga_model.Get_Infiltration_V1,
        ga_model.Get_SoilWaterRemoval_V1,
        ga_model.Get_SoilWaterContent_V1,
    )
    ADD_METHODS = (
        ga_model.Return_RelativeMoisture_V1,
    ...

I make INTERFACE_METHODS a member of modeltools.SubmodelInterface so that models which cannot be submodels do not need to define empty INTERFACE_METHODS tuples.

Main models which support using specific submodel interfaces declare so by selecting them under the new class attribute SUBMODELINTERFACES:

class Model(modeltools.AdHocModel):
    """Base model for HydPy-L-Land."""
    ...
    SUBMODELINTERFACES = (soilinterfaces.SoilModel_V1,)
    ...

Similar to the usual "method selection" mechanism, such an "interface selection" implies there is an instance attribute named submodel (which we generate automatically and set to None during model initialisation). Assume the following situation:

class Model(modeltools.AdHocModel):
    ...
    SUBMODELINTERFACES = (
        soilinterfaces.SoilModel_V1,
        soilinterfaces.SoilModel_V3,
        soilinterfaces.InterceptionModel_V1,
    )

For such a setting, the main model would have an attributed named soilmodel that allows submodels following the interfaces SoilModel_V1 and SoilModel_V3 and an attribute named interceptionmodel that must follow InterceptionModel_V1 (or be set to None).

For documentation and consistency checking purposes, the methods relying on SoilModel_V1 should also select this class under the new (optional) class attribute SUBMODELINTERFACES.

tyralla commented 2 years ago

Each submodel inherits from a single interface only and selects the suitable methods via the new class attribute INTERFACE_METHODS:

I make INTERFACE_METHODS a member of modeltools.SubmodelInterface so that models which cannot be submodels do not need to define empty INTERFACE_METHODS tuples.

Combining these ideas does not work for base models that might support multiple application models following different interfaces. But I think we could derive them from SubmodelInterface directly, implying to users that (some of) the application models derived from the considered base model support at least one submodel interface.

tyralla commented 2 years ago

For documentation and consistency checking purposes, the methods relying on SoilModel_V1 should also select this class under the new (optional) class attribute SUBMODELINTERFACES.

Not a good idea because it would be unclear how the base model class should override the abstract class attribute typeid.

tyralla commented 2 years ago

After discussing the details of the current proposal: Maybe we can avoid the "marks" and use the standard isinstance approach instead. I do not know what type-checkers and linters say, but we could make the interface's methods abstract without making the interface an abstract class and add the class to the submodel's inheritance tree. The abstract methods would be "replaced" by concrete Python- or Cython-methods during model instantiation.

Unfortunately, I cannot get the prefered isinstance approach running. Cython generally supports it, but not when releasing Python's Global Interpreter Lock (GIL) via the nogil option, which we still want to do for the sake of not risking unnecessary speed bottlenecks and not blocking later (potential) parallelisation efforts.

Using typeid as a surrogate for type checking, I see no way to prevent explicit casting in Cython. The relevant Python code of the mentioned switch function of lland now looks like this:

        if model.soilmodel is None:
            model.calc_bowa_default_v1()
        elif model.soilmodel.typeid == 1:
            model.calc_bowa_soilmodel_v1(
                cast(soilinterfaces.SoilModel_V1, model.soilmodel)
            )

cast is the standard casting function from Python's typing library (which does nothing except help static type checkers). The conversion routines translate this to the following Cython code:

        if self.soilmodel is None:
            self.calc_bowa_default_v1()
        elif self.soilmodel.typeid == 1:
            self.calc_bowa_soilmodel_v1(<soilinterfaces.SoilModel_V1>self.soilmodel)

Using casting in a "hydrological" method is not really desired, but at least it clarifies the role of typeid.

Passing the casted instance to the submethod Calc_BoWa_SoilModel_V1 prevents performing multiple casts in submethods. Therefore, I now allow custom types as method arguments and return types in the following form:

    @staticmethod
    def __call__(model: modeltools.Model, submodel: soilinterfaces.SoilModel_V1 ) -> soilinterfaces.SoilModel_V1:

To support casting, I needed to define a common Cython base type for all interfaces (interfaceutils.BaseInterface). The soilmodel attribute of the cythonized model class now follows this base type.

So far, I did not encounter increased risks for segfaults or the like. However, we should have an eye on it.

tyralla commented 2 years ago

A linter issue: Following the current design, interfaces like SoilModel_V1 define abstract methods. However, the concrete submodel classes do not override them, as this happens dynamically during model initialisation (either with methods implemented in Python or Cython, depending on the current configuration). Therefore, pylint emits an abstract-method error. For now, I just silenced it but left a reminder (# pylint: disable=abstract-method # ToDo: is there a way to help pylint here?).

tyralla commented 1 year ago

The standard time series functionalities do not work for submodels. So, I will introduce the following decisions and adjustments.

In Python, the new property submodels provides access to all currently available submodel instances:

class Model:
    ...
    @property
    def submodels(self) -> Tuple[Submodel, ...]:
        ...
    ...

To collect and update time series data during simulations (implementation detail), we can iterate through the returned tuple, e.g.:

class Model:
    ...
    def load_data(self) -> None:
        if self.sequences:
            self.sequences.load_data(self.idx_sim)
        for submodel in self.submodels:
            submodel.sequences.load_data(submodel.idx_sim)
    ...

Reading time series data from or writing time series data to disk can now be triggered either from the relevant Element instance or the Model instance itself. Calling the Element methods triggers the corresponding methods of the main model and its submodels. However, calling the Model methods does not trigger the submodel methods. This might not be intuitively clear but easy to remember and avoids introducing further boolean argument options. I take input series reading as an example:

Element(Device):
    ...
    def load_inputseries(self) -> None:
        self.model.load_inputseries()
        for submodel in self.model.submodels:
            submodel.load_inputseries()
    ...

Model:
    ...
    def load_inputseries(self) -> None:
        self.sequences.inputs.load_series()
    ...

Regarding Cython, I could check each relevant attribute directly to determine whether a submodel instance is available. Or I could implement something like a Cython-level get_submodels method. The first approach seems simpler, so I will take that as a start.

tyralla commented 1 year ago

In the last comment, I did not consider "sub-sub-models", e.g. mainmodel.submodel1.submodel2.

I see two options. First, apply recursion by moving the for loop to the Model methods:

Model:
    ...
    def load_inputseries(self, include_subsubmodels: bool = True) -> None:
        self.sequences.inputs.load_inputseries()
        if include_subsubmodels:
            for submodel in self.model.submodels:
                submodel.load_inputseries()
    ...

Second, turn submodels into a method (get_submodels) and optionally allow for recursion:

class Model:
    ...
    def get_submodels(self, include_subsubmodels: bool = True) -> Tuple[Submodel, ...]:
        def _find_submodels(model: Model) -> None:
            name2submodel_new = {}
            for name in set(cls.name for cls in model.SUBMODELINTERFACES):
                submodel = getattr(model, name, None)
                if submodel is not None:
                    name2submodel_new[name] = submodel
            name2submodel.update(name2submodel_new)
            if include_subsubmodels:
                for submodel in name2submodel_new.values():
                    _find_submodels(submodel)

        name2submodel: Dict[str, Submodel] = {}
        _find_submodels(self)
        return tuple(submodel for (name, submodel) in sorted(name2submodel.items()))
    ...

I prefer the second approach.

tyralla commented 1 year ago

We did not discuss interpolation so far. Let us take interpolation potential evapotranspiration (PET) from weather stations to subbasins as an example.

In HydPy 5.0, users have two options. First, they can calculate PET for each station (e.g. with evap_v001) and pass it to an interpolation routine (e.g. conv_v001), which then gives the interpolated values to the "land" models (e.g. lland_v1). Second, they can interpolate the input factors required for calculating PET first (by letting Node instances read the data and pass it to conv_v001) and then let evap_v001 calculate PET for each subbasin. Both approaches work based on the "input/output node mechanism".

Now, we extracted evap_tw2002 from lland_v1, which works as a stand-alone and a submodel (#95). The next step is to implement evap_io, which reads PET and passes it to its main model upon request. Then, the connection of the new version of lland_v1 and evap_io works precisely like the current version of lland_v2. Hence, lland_v2 becomes obsolete, and we can remove it, which is a very pleasing effect of introducing the submodel concept.

However, after removing lland_v2, no LARSIM-like model receives PET as input anymore. Hence, for the first approach, we could apply evap_fao56 or evap_tw2002 as stand-alone models, perform the interpolation as usual, and then would need to connect the outlet nodes of conv_v001 to evap_io via the "input/output node mechanism". That would be one further element in an already complex model chain, but I see no obvious way for simplification. For the second approach, things change less, but it also depends on implementing a way of connecting node sequences with input sequences of submodels. Here, we need to keep the possibility of sub-submodels in mind. (If two submodels use sub-submodels of the same type, they should receive the same interpolated input, don't they?)

One handy aspect of introducing evap_io is that we can use its instances for reading time series data instead of letting nodes read input data. This makes the data requirements of a project more transparent and makes the additional configuration efforts for forcing nodes to read input data and pass it to the interpolation routine vanish. (The second problem mentioned in the initial comment of #94.)

tyralla commented 1 year ago

class Model(modeltools.AdHocModel):
    ...
    SUBMODELINTERFACES = (
        soilinterfaces.SoilModel_V1,
        soilinterfaces.SoilModel_V3,
        soilinterfaces.InterceptionModel_V1,
    )
For such a setting, the main model would have an attributed named soilmodel that allows submodels following the interfaces SoilModel_V1 and SoilModel_V3 and an attribute named interceptionmodel that must follow InterceptionModel_V1 (or be set to None).

This approach is too inflexible because of the reasons discussed in this comment. It seems we need to be more explicit. To stick to the wland example, I suggest the following ("peinterfaces.PEModel_V2" does not exist, I just make it up to clarify the implications):

class Model(modeltools.ELSModel):
    ...
    SUBMODELINTERFACES = (petinterfaces.PETModel_V1, peinterfaces.PEModel_V2)
    ...
    petmodel_land: petinterfaces.PETModel_V1
    petmodel_water: Union[petinterfaces.PETModel_V1, peinterfaces.PEModel_V2]
    ...

tyralla commented 1 year ago

Even more explicit: we can directly use the SubmodelProperty descriptor:

class Model(modeltools.ELSModel):
    ...
    SUBMODELINTERFACES = (petinterfaces.PETModel_V1, peinterfaces.PEModel_V2)
    ...
    petmodel_land = modeltools.SubmodelProperty([petinterfaces.PETModel_V1])
    petmodel_water = modeltools.SubmodelProperty([petinterfaces.PETModel_V1, peinterfaces.PEModel_V2])
    ...

This removes some __init_subclass__ logic and should be easily extensible to a non-scalar approach (SubmodelProperties or so), which will likely soon be required in the context of #98.

tyralla commented 1 year ago

ToDo: individual docstrings of SubmodelProperty instances.

tyralla commented 1 year ago

After implementing this change, the relevant code sections of wland_v001 look like this:

class Model(modeltools.ELSModel):
    ...
    SUBMODELINTERFACES = (petinterfaces.PETModel_V1,)
    ...
    petmodel = modeltools.SubmodelProperty(petinterfaces.PETModel_V1)
    pemodel = modeltools.SubmodelProperty(petinterfaces.PETModel_V1)
    ...

I also added a flag to indicate whether a submodel is required or optional. By default, submodels are considered as required. So far, the only exception is the (additional) soilmodel for calculating infiltration excess of lland:

class Model(modeltools.ELSModel):
    ...
    SUBMODELINTERFACES = (
        petinterfaces.PETModel_V1,
        soilinterfaces.SoilModel_V1,
    )
    ...
    petmodel = modeltools.SubmodelProperty(petinterfaces.PETModel_V1)
    soilmodel = modeltools.SubmodelProperty(soilinterfaces.SoilModel_V1, optional=True)
    ...

So far, the argument optional only clarifies the respective submodel's documentation. However, a safety mechanism for preventing crashes when working in Cython-mode but forgetting to prepare all required submodels still needs to be included. Maybe we can discuss this later in #55.

tyralla commented 1 year ago

We currently deal with sharing land-use information between models in issue #95. Maybe we should make some progress in sharing data between main models and submodels in general before discussing such a specific case.

It should be possible to define the main model and its submodel(s) within a single control file and its initial conditions in a single condition file. Our first idea is to use separate "with" blocks for individual models, which should result in a readable design and help deal with possible name conflicts for the parameters or sequences of the different models involved.

So, let us build a concrete example. Currently, a (shortened) control file for an already refactored version of hland_v1 could look like this:

from hydpy.models.hland_v1 import *

parameterstep("1d")

area(10.0)
nmbzones(2)
sclass(1)
zonetype(FIELD, FOREST)
zonearea(8.0, 2.0)
psi(1.0)
zonez(2.0, 3.0)
... 

from hydpy import prepare_model
model.petmodel = prepare_model("evap_tw2002")
model.petmodel.parameters.control.nmbhru(2)
model.petmodel.parameters.control.hruarea(8.0, 2.0)
model.petmodel.parameters.control.altitude(200.0, 300.0)
model.petmodel.parameters.control.evapotranspirationfactor(1.2)
...

By implementing a method or function that supports the with statement, we could shorten the code significantly:

from hydpy.models.hland_v1 import *

parameterstep("1d")

area(10.0)
nmbzones(2)
sclass(1)
zonetype(FIELD, FOREST)
zonearea(8.0, 2.0)
... 

with model.add_petmodel_v1("evap_tw2002"):
    nmbhru(2)
    hruarea(8.0, 2.0)
    altitude(200.0, 300.0)
    evapotranspirationfactor(1.2)
    ...

(This would require temporarily removing the hland parameters from globals() and adding the evap parameters to globals(). After the with block, this would need reversing. All of this is relatively easy to implement, but we need to keep its effect on the loading time of HydPy projects small.)

The definitions of the number, subarea, and elevation of the individual zones or hydrological response units are redundant. Note that hland and evap use different units for elevation (100 m and 1 m above sea level, respectively).

We could further improve the situation by adding more interface methods. So far, all interface methods deal with the setting, calculating, and getting of data during simulations. We could extend their scope to setting parameter values (and, later, initial conditions):

class PETModel_V1(modeltools.SubmodelInterface):

    typeid: ClassVar[Literal[1]] = 1

    @abstractmethod
    def prepare_numberofzones(self, nmbzones: int) -> None: ...
    @abstractmethod
    def prepare_subareas(self, subareas: Sequence[float]) -> None: ...
    @abstractmethod
    def prepare_elevations(self, elevations: Sequence[float]) -> None: ...

    @abstractmethod
    def determine_potentialevapotranspiration(self) -> None: ...
    @abstractmethod
    def get_potentialevapotranspiration(self, k: int) -> float: ...
    @abstractmethod
    def get_meanpotentialevapotranspiration(self) -> float: ...

However, these "prepare" methods should be pure Python methods. So, we cannot put them into the INTERFACE_METHODS tuple. Instead, simple subclassing seems sufficient. At the moment, we do:

class Model(modeltools.AdHocMod the followingel, petinterfaces.PETModel_V1):
   ...

But we could define one implementation for the PET interface for all evap application models:

class PETModel_V1(modeltools.AdHocModel, petinterfaces.PETModel_V1):

    def prepare_numberofzones(self, nmbzones: int) -> None:
        self.parameters.control.nmbhru(nmbzones)

    def prepare_subareas(self, subareas: Sequence[float]) -> None:
        self.parameters.control.hruarea(subareas)

    def prepare_elevations(self, elevations: Sequence[float]) -> None:
         self.parameters.control.altitude(elevations)  # possibly with a unit-conversion factor

All evap application models following this interface would then just need to subclass the defined implemention:

class Model(evap_model.PETModel_V1):
   ...

Following multiple interfaces should be fine when following this strategy.

Roughly, an implementation of add_petmodel_v1 could look like this:

class Model(modeltools.AdHocModel):
    @contextlib.contextmanager
    def add_petmodel_v1(self, module: Union[str, types.ModuleType]) -> Generator[None, None, None]:
        petmodel = importtools.prepare_model(module)
        try:
            # ToDo: hide hland parameters and make evap parameters globally available
            petmodel.prepare_numberofzones(self.parameters.control.nmbzones.value)
            petmodel.prepare_subareas(self.parameters.control.zonearea.value)
            petmodel.prepare_elevations(self.parameters.control.zonez.value)
            yield
        finally:
            # ToDo: hide evap parameters and make lland parameters globally available 
    ...

This first draft of add_petmodel_v1 does too many things. We could factor out the complex logic (the framework logic) by supplying a specialised decorator.

The discussed approach would remove three redundant code lines from each control file:

...
with model.add_petmodel_v1("evap_tw2002"):
    evapotranspirationfactor(1.2)
    ...

At least two problems remain.

First, marking the "prepare methods" with abstractmethod is undoubtedly a good choice. However, doing so makes it even more apparent that using the same marker for the other methods is confusing (at least for style and type checkers). Maybe we should define something like a modelmethod marker. We could use this marker within our "check consistency" logic for testing if all abstract methods are actually implemented.

Second, what if hland would not have the zonez parameter? Then, hland could still use the PETModel_V1 interface, but one would need to define evap's altitude parameter "manually" within each control file. How to document this? And how to consider all relevant but not redundant parameters when writing control files automatically? Can we introduce some standardisation by using decorators? First draft:

class PETModel_V1(modeltools.AdHocModel, petinterfaces.PETModel_V1):

    @importtools.affected_parameter(evap_control.NmbHRU)
    def prepare_numberofzones(self, nmbzones: int) -> None:
        self.parameters.control.nmbhru(nmbzones)

   @importtools.affected_parameter(evap_control.HRUArea) 
    def prepare_subareas(self, subareas: Sequence[float]) -> None:
        self.parameters.control.hruarea(subareas)

    @importtools.affected_parameter(evap_control.Altitude)
    def prepare_elevations(self, elevations: Sequence[float]) -> None:
         self.parameters.control.altitude(elevations)  # possibly with a unit-conversion factor

class Model(modeltools.AdHocModel):
    ...
    @importtools.prepare_submodel(
        petinterfaces.PETModel_V1.prepare_numberofzones,
        petinterfaces.PETModel_V1.prepare_subareas,
        petinterfaces.PETModel_V1.prepare_elevations,
    )
    def add_petmodel_v1(self, petmodel: petinterfaces.PETModel_V1) -> None:
        petmodel.prepare_numberofzones(self.parameters.control.nmbzones.value)
        petmodel.prepare_subareas(self.parameters.control.zonearea.value)
        petmodel.prepare_elevations(self.parameters.control.zonez.value)
    ...

Looks like much additional work on the framework side but a clean solution on the model side. And all information could be made available somehow to extend the online documentation automatically.

tyralla commented 1 year ago

with model.add_petmodel_v1("evap_tw2002"):
    nmbhru(2)
    hruarea(8.0, 2.0)

One drawback is that there is no way to express what is happening statically. The "trick" of defining stubs with method write_stubfile would not work for temporarily changed globals. Eventually, one could solve this with a Mypy plugin, but I am not even sure if Mypy provides a suitable hook and allows for the necessary manipulations.

Effectively, this means we might never be able to let tools like Mypy check our control files (and, subsequently, condition files). I think this is no major limitation but eventually a relevant restriction for future improvements regarding the automation of checking HydPy projects.

tyralla commented 1 year ago

Sharing control parameter data now works. So far, only for reading, not for writing control files. Also, we need to handle the reading and writing of condition files similarly as soon as we implement a submodel that handles its own condition sequences.

I applied the new mechanisms on HydPy-L (see Base_PETModel_V1) and HydPy-Evap (see Base_PETModel_V1) so far. (Using the same name twice is a little confusing). When clicking on the "source" buttons, one sees the decorators prepare_submodel and define_targetparameter, which hide the complex framework logic from the model developer and the model user. Note that prepare_submodel, in addition to the above considerations, requires the type of the relevant interface as the first parameter. This helps clarify things for type checkers and allows checking if a given submodel complies with the interface.

tyralla commented 1 year ago

Note that prepare_submodel, in addition to the above considerations, requires the type of the relevant interface as the first parameter.

Unfortunately, Mypy does not like passing around abstract classes (python/mypy#5374). So, we need to insert some "type: ignore" comments or hope that someone introduces a flag to disable this check globally.

tyralla commented 1 year ago

Next steps:

[x] Let all application models of HydPy-W(ALRUS) use the new prepare_submodel decorator for submodels that comply with the PETModel_V1 interface.
[x] Extend the SoilModel_V1 interface to avoid redundant control parameter definitions.
[x] Let all application models of HydPy-L(ARSIM) use the new prepare_submodel decorator for submodels that comply with the extended SoilModel_V1 interface.
[x] Use the new mechanisms for writing control files that include (only) the required parameter settings.
[x] Find a solution for sharing land use information between main models and submodels.

tyralla commented 1 year ago

Extend the SoilModel_V1 interface to avoid redundant control parameter definitions

So far, HydPy-L(ARSIM) only sets the number of soil compartments automatically. Let the models share other information seems reasonable, but we still need to gain experience in coupling L to GARTO, so I decided to postpone this. (This "hydrological" topic should be discussed separately, maybe in #91.)

tyralla commented 1 year ago

We recently had a thorough discussion on the "sharing land use information" topic. The ideal situation from the user's and model developer's perspective is now relatively clear. Thinking about technical implementation is next in line. But first, I try to write our ideas down.

We all agreed to keep the neat framework functionalilty to allow all implemented concept models can hold their own land use types and use different names or the same thing.

We try to separate two usages of land use information.

First, the land use type's names or often only little helpers for users to set different parameter values depending on land use type and for model developers to simplify parameter value validation and averaging. An example dealing with the maximum interception capacity of H-Land that does not need to be defined for glaciers and internal lakes:

    >>> from hydpy.models.hland import *
    >>> parameterstep("1d")
    >>> nmbzones(5)
    >>> zonetype(FIELD, FOREST, GLACIER, ILAKE, SEALED)
    >>> icmax(field=2.0, forest=1.0, glacier=4.0, ilake=5.0, sealed=3.0)
    >>> icmax
    icmax(field=2.0, forest=1.0, sealed=3.0)
    >>> icmax(field=2.0, default=8.0, sealed=3.0)
    >>> icmax
    icmax(field=2.0, forest=8.0, sealed=3.0)
    >>> zonearea.values = 1.0, 2.0, nan, nan, 3.0
    >>> from hydpy import round_
    >>> round_(icmax.average_values())
    4.5

Second, in other cases, the land use type of a zone or hydrological response unit selects or deselects certain process equations L-Land adjusts wind speeds to the current leaf area index only for Forests.

For the first topic, we strive for a very general approach on the framework level. Submodels should somehow "dynamically inherit" the land use types of their main models:

    >>> from hydpy.models.hland import *
    >>> parameterstep("1d")
    >>> nmbzones(5)
    >>> zonetype(FIELD, FOREST, FIELD, FOREST, SEALED)
    >>> with model.add_submodel("evap_tw2002"):
    ....     evapotranspirationfactor(field=0.8, forest=1.2)

For the second topic, it seems we need to give more responsibilities to the model developers. A potential submodel must define its own "hard" land use information for selecting or deselecting certain process equations. And the main model must translate its own land use information (which is "soft" from the submodel's perspective") to the land use types of the submodel. All this must become part of the pure-python side of the corresponding interfaces.

These two issues are not perfectly separable regarding parameter value validation, which must be based on the "hard" submodel land use types.

tyralla commented 1 year ago

I think a good start is to focus on the base class KeywordParameter2D, which we use to define model parameters that supply different values depending on land use type and month. In the case of W-Land, we derive the parameter CPETL from it, which adjusts reference (grass) evapotranspiration to land use-specific potential evapotranspiration. FLn of L-Land offers the same functionality. I would start with W-Land, as it is a little less complicated.

Factoring out this functionality is a big step towards extracting MORECS from L-Land. But we need one other decision. We could move the parameter from W-Land and L-Land to all existing Evap application models. Or we could define a new Evap submodel following a new PETModel interface that is only responsible for converting reference grass evapotranspiration to land use-specific potential evapotranspiration and uses the already existing Evap models as submodels for calculating the first. Regarding flexibility, the latter approach would be superior, of course. But it would result in the possibility of defining three submodel levels as soon as the Meteo models follow the submodel concept (e.g. lland_v1.evap_new.evap_tw2002.meteo_v001).

tyralla commented 1 year ago

I tend to think that decoupling is the better option. Adding a "LanduseMonthFactor" parameter to each Evap application model seems like overkill for typical applications of models like GR4J and maybe even HBV. Additionally, there are cases where we apply different submodels for estimating potential evapotranspiration and evaporation. For example, W-Land now uses a separate Evap submodel to calculate the surface water storage's scalar potential evaporation. So, a pure "MonthFactor" parameter would provide the same functionality. And one would not need to introduce an artificial "surface water land use" type just to meet the requirements of a "LanduseMonthFactor" parameter.

tyralla commented 1 year ago

The technical implementation of "sharing land use information" is now relatively settled. It should be in the master branch tomorrow. The technical details are pretty complex. Maybe we can find some simplifications later. However, here is the current state from the model developer's perspective.

There is a new submodel interface method called "share_configuration". Submodels as those of the Evap family can implement it:

class Sub_PETModel_V1(modeltools.AdHocModel, petinterfaces.PETModel_V1):
    """Base class for HydPy-Evap models that comply with the |PETModel_V1| submodel
    interface."""

    @staticmethod
    @contextlib.contextmanager
    def share_configuration(
        sharable_configuration: SharableConfiguration,
    ) -> Generator[None, None, None]:
        with evap_control.HRUType.modify_constants(
            sharable_configuration["landtype_constants"]
        ), evap_control.LandMonthFactor.modify_rows(
            sharable_configuration["landtype_constants"]
        ), evap_parameters.ZipParameter1D.modify_refindices(
            sharable_configuration["landtype_refindices"]
        ), evap_parameters.ZipParameter1D.modify_refweights(
            sharable_configuration["refweights"]
        ), evap_parameters.ZipParameter1D.modify_refweights(
            sharable_configuration["refweights"]
        ), evap_sequences.FactorSequence1D.modify_refweights(
            sharable_configuration["refweights"]
        ), evap_sequences.FluxSequence1D.modify_refweights(
            sharable_configuration["refweights"]
        ):
            yield

Each submodel is free to use any data the main model offers for sharing. The different "modify" class methods serve to manipulate specific class attributes of the respective (base) classes temporarily. They ensure instantiated parameter or sequence objects persistently reference specific main model data via instance attributes. All these methods are implemented within the core framework tools, so model developers do not need to know much about their functionalities.

When defining a main model, one does not need to define an additional method. Instead, one can pass additional arguments to the prepare_submodel decorator:

class Base_PETModel_V1(modeltools.AdHocModel):
    """Base class for HydPy-L models that support submodels that comply with the
    |PETModel_V1| interface."""

    petmodel: modeltools.SubmodelProperty

    @importtools.prepare_submodel(
        petinterfaces.PETModel_V1,
        petinterfaces.PETModel_V1.prepare_nmbzones,
        petinterfaces.PETModel_V1.prepare_zonetypes,
        petinterfaces.PETModel_V1.prepare_subareas,
        landtype_constants=lland_constants.CONSTANTS,
        landtype_refindices=lland_control.Lnk,
        refweights=lland_control.FHRU,
    )
    def add_petmodel_v1(self, petmodel: petinterfaces.PETModel_V1) -> None:
...

The SubmodelAdder applies the share_configuration method during a complete "with" block, including nested "with" blocks. Hence, a main model can even share data with a sub-submodel (if the submodel in-between does not overwrite the shared data). An example that uses HydPy-W the main model:

>>> from hydpy.models.wland_v001 import *
>>> parameterstep("1d")
>>> nu(3)
>>> al(9.8)
>>> as_(0.2)
>>> nu(3)
>>> lt(FIELD, CONIFER, SEALED)
>>> aur(0.6, 0.3, 0.1)
>>> with model.add_petmodel_v1("evap_mlc"):
...     landmonthfactor.sealed = 0.7 * 0.9
...     landmonthfactor.conifer = 1.3 * 0.9
...     landmonthfactor.field[1:4] = 0.73, 0.77, 0.95
...     with model.add_retmodel_v1("evap_io"):
...         evapotranspirationfactor(sealed=1.0, conifer=1.0, field=0.9)

The following specification seems to cover all currently targeted model functionalities:

class SharableConfiguration(TypedDict):
    """Specification of the configuration data that main models can share with their
    submodels."""

    landtype_constants: Optional[parametertools.Constants]
    """Land cover type-related constants."""
    soiltype_constants: Optional[parametertools.Constants]
    """Soil type-related constants."""
    landtype_refindices: Optional[parametertools.NameParameter]
    """Reference to a land cover type-related index parameter."""
    soiltype_refindices: Optional[parametertools.NameParameter]
    """Reference to a soil type-related index parameter."""
    refweights: Optional[parametertools.Parameter]
    """Reference to a weighting parameter (probably handling the size of some 
    computational subunits like the area of hydrological response units)."""

So, SharableConfiguration clearly defines some "meta-model logic". However, most of the logic existed already implicitly, so we do not lose much freedom when introducing new model concepts.

In the above example, the Evap parameter evapotranspirationfactor is connected to the HydPy-W parameter lt. That is a "soft", optional connection (evapotranspirationfactor knows the relevant land cover types just for user convenience). However, the Evap parameter landmonthfactor must know the types of the individual hydrological response units even during simulation. This information is provided by the new Evap parameter HRUType. As usual, one can set its value manually. But that is not necessary in the above example, as the main and the submodel both make use of the new PETModel_V1 interface method prepare_zonetypes:

    @importtools.define_targetparameter(evap_control.NmbHRU)
    def prepare_zonetypes(self, zonetypes: Sequence[int]) -> None:
        if (hrutype := getattr(self.parameters.control, "hrutype", None)) is not None:
            hrutype(zonetypes)

Many particular functionalities (like averaging) should already be working but still need more thorough testing. For example, the automatic writing of control files is still open.

tyralla commented 1 year ago

We decided on allowing submodels referencing their main models as "sub-submodels", as suggested here. I implemented most of it, and now the submodel evap_hbv96 can query precipitation from its main model if it complies with the new PrecipModel_V1 interface. Alternatively, evap_hbv96 can use a "real" submodel following the PrecipModel_V2 interface. For this, currently, only meteo_precip_io is available.

The model preparation features enable such "feedback" by default. Hence, we do not need to specify in our LahnH control files that the main model hland_v1 is also a submodel for evap_hbv96:

from hydpy.models.hland_v1 import *
from hydpy.models import evap_hbv96

simulationstep("1h")
parameterstep("1d")

area(692.3)
...
tcalt(0.6)
with model.add_petmodel_v1(evap_hbv96):
    airtemperaturefactor(0.1)
    altitudefactor(0.0)
    precipitationfactor(0.02)
    evapotranspirationfactor(1.0)
ered(0.0)
...

This new feedback mechanism comes with a risk of incompatibilities. A submodel could query a property like precipitation before the main model calculated it. We must introduce additional model consistency checks that are either static (meaning they ensure no incompatible model combination is possible) or dynamic (meaning they check the concrete selected model combinations during runtime). The first situation (full compatibility) is favourable but may not be reachable for our growing collections of main models and submodels.

tyralla commented 1 year ago

Use the new mechanisms for writing control files that include (only) the required parameter settings.

It seems to work well now for "normal" and "auxiliary" files. So far, I checked it for hland_v1 in combination with evap_hbv96. Checks that include sub-submodels are still missing.

tyralla commented 1 year ago

One unexpected problem that needed to be solved:

Unfortunately, Cython extension classes do neither allow multiple inheritance nor support any other way to let one type follow multiple interfaces (at least, to my knowledge). So, we came up with the ugly solution of introducing a single MasterInterface in Cython that collects all methods of the original pure-Python interfaces. We hope there is not too much risk this hack will result in later problems because it is quite well hidden from framework users. However, when Cython improves at this point, we should resume our old path.

tyralla commented 1 year ago

And a relatively related one:

I realised that the interfaces' constant typeid properties were insufficient to let main models identify submodels when they could follow multiple interfaces. So I added the SubmodelTypeIDProperty descriptor, which works similarly to the SubmodelIsMainmodelProperty. Maybe we should find a way to combine the submodel, the typeid, and the submodel_is_mainmodel information in one attribute later.

tyralla commented 1 year ago

After adjusting a huge number of tests to the LahnH example project, which now relies on the combination of hland_v1 and evap_hbv96, we can be relatively confident that many submodel features are functional. Possible next steps:

[x] Search actively for "special functionalities" which require further testing and possibly fixing. (The only thing I am currently aware of is the plotting of time series.)
[ ] Extend the model-related consistency checks.
[x] Find ways to document possible main model-submodel combinations automatically.

tyralla commented 1 year ago

An additional requirement that became apparent when working on evap_aet_hbv96: we need consistency checks that consider the correct order of using submethods and interface methods (see the last sentence of this comment).

tyralla commented 1 year ago

Also: I implemented a "default behaviour" for the interface methods prepare_nmbzones, prepare_zonetypes, prepare_subareas, and prepare_elevations. This lets the "prepare something" approach resemble a little bit more the share_configuration approach. Does this indicate a need for later optimisations (that might become clearer when we have more example implementations)?

tyralla commented 1 year ago

We are near to finishing extracting the actual evapotranspiration routines from all application models of hland and lland into submodels (see #109 and #111). However, I encountered a constellation on the last metres, indicating our interface definitions could need further simplification.

evap_aet_hbv96 and evap_minhas currently comply with the following interface:

class AETModel_V1(modeltools.SubmodelInterface):
    """Interface for calculating actual evapotranspiration values in three steps,
    starting with interception evaporation, continuing with soil evapotranspiration,
    and finishing with water evaporation.

    Note that this order matters because, for example, the calculation of the
    evaporation from water areas might depend on properties previously calculated
    during estimating interception evaporation or soil evapotranspiration.
    """

    typeid: ClassVar[Literal[1]] = 1

    @abc.abstractmethod
    def prepare_water(self, water: VectorInputBool) -> None:
        """Set the flags for whether the individual zones are water areas or not."""

    @abc.abstractmethod
    def prepare_interception(self, interception: VectorInputBool) -> None:
        """Set the flags for whether interception evaporation is relevant for the
        individual zones."""

    @abc.abstractmethod
    def prepare_soil(self, soil: VectorInputBool) -> None:
        """Set the flags for whether soil evapotranspiration is relevant for the
        individual zones."""

    @abc.abstractmethod
    def prepare_maxsoilwater(self, maxsoilwater: VectorInputFloat) -> None:
        """Set the maximum soil water content."""

    @modeltools.abstractmodelmethod
    def determine_interceptionevaporation(self) -> None:
        """Determine the actual interception evaporation."""

    @modeltools.abstractmodelmethod
    def determine_soilevapotranspiration(self) -> None:
        """Determine the actual evapotranspiration from the soil."""

    @modeltools.abstractmodelmethod
    def determine_waterevaporation(self) -> None:
        """Determine the actual evapotranspiration from open water areas."""

    @modeltools.abstractmodelmethod
    def get_interceptionevaporation(self, k: int) -> float:
        """Get the selected zone's previously calculated interception evaporation in
        mm/T."""

    @modeltools.abstractmodelmethod
    def get_soilevapotranspiration(self, k: int) -> float:
        """Get the selected zone's previously calculated soil evapotranspiration in
        mm/T."""

    @modeltools.abstractmodelmethod
    def get_waterevaporation(self, k: int) -> float:
        """Get the selected zone's previously calculated water area evaporation in
        mm/T."""

evap_morsim defines some other parameters, which can be wholly determined by lland_v3 and lland_v4 and partly determined by lland_v1 and all hland models. Among them are two (wilting point and field capacity), which need further consideration. However, the meaning of the parameters Tree and Conifer (flags indicating whether a zone's vegetation is tree- or conifer-like) seem settled. So, in our current thinking, we need to extend AETModel_V1 by introducing AETModel_V2 that defines at least two additional methods:

class AETModel_V2(AETModel_V1):
    """Slightly extended version of |AETModel_V1|."""

    typeid: ClassVar[Literal[2]] = 2

    @abc.abstractmethod
    def prepare_tree(self, tree: VectorInputBool) -> None:
        """Set the flags for whether the individual zones contain tree-like
        vegetation."""

    @abc.abstractmethod
    def prepare_conifer(self, conifer: VectorInputBool) -> None:
        """Set the flags for whether the individual zones contain conifer-like
        vegetation."""

This extension addresses only the interface's "preparation part", not its "simulation part". Hence, simulation methods of the main model could use submodels that comply with AETModel_V1 or AETModel_V2 in the same way. But they do not know that AETModel_V2 extends AETModel_V1 due to avoiding instance checks in Cython. So I am afraid introducing AETModel_V2 would result in some stupid code duplications in the main models' equations and likely in other contexts.

Therefore, I consider making all "prepare methods" non-abstract by removing the abstractmethod decorator. On the downside, this would increase the risk of a model developer missing implementing a "prepare method" he could implement, which would cause suboptimal comfort for model users but could easily be fixed as soon as reported. On the upside, we would keep the complexity smaller and avoid some code duplications.

I tend to think this change is worth trying. The amount of necessary framework or documentation changes should be negligible.

tyralla commented 1 year ago

evap_morsim is the first submodel requiring initial conditions in the form of log sequences. Hence I added the context manager decorated method define_conditions, which provides usability very similar to defining control parameters of submodels. Its implementation is much simpler (for example, because it does not allow automatical information sharing between models), but as far as I can tell, it is currently sufficient.

tyralla commented 1 year ago

Search actively for "special functionalities" which require further testing and possibly fixing. (The only thing I am currently aware of is the plotting of time series.)

I did this when implementing 1-dimensional submodel vectors in #98. As Simone reminded us on the open issue of plotting the time series in #115, we can mark this as sufficiently finished here.

tyralla commented 1 year ago

We recently added class SubmodelsProperty and many related functionalities for handling 1-dimensional vectors of submodels following the same or similar interfaces.

Thereby, we improved many of the already existing submodel functionalities. SubmodelProperty has also been improved in some regards (it is now generic), but SubmodelsProperty has the advantage to bundle related information in one place. It would be favourable to adapt SubmodelProperty relatively soon.

Besides this potential refactoring step, the submodel concept and its implementation have sufficiently evolved. So, we could soon focus on extending the model consistency checks and, most importantly, making all possible model/submodel relations transparent in the online documentation before closing this issue.

tyralla commented 1 year ago

Also: I implemented a "default behaviour" for the interface methods prepare_nmbzones, prepare_zonetypes, prepare_subareas, and prepare_elevations. This lets the "prepare something" approach resemble a little bit more the share_configuration approach. Does this indicate a need for later optimisations (that might become clearer when we have more example implementations)?

The distinction between "general" (with default behaviour) and "normal" (without default behaviour) interface methods was not helpful.

Now, prepare_nmbzones, etc., are not special anymore and must be defined for each interface separately (if required). The relevant data is generally passed from all main models to their sub-submodels for all such interface methods and is only applied if the sub-submodel selects the respective target parameter.

The only drawback I currently see is that the interface of an "intermediate" submodel (e.g. AETModel_V1) must eventually implement a method (prepare_plant) just for the sake of passing that data to a sub-submodel following another interface (e.g. PETModel_V2).

tyralla commented 8 months ago

While working on https://github.com/hydpy-dev/hydpy/issues/111#issuecomment-1954216876, I realised that one cannot write control files with, for example, measuring heights of wind speed that differ for the main model and the submodel. So far, method save_controls always only writes the line(s) where it sets the corresponding main model's parameter value. This should be okay for the wind speed measuring height, but maybe not for other parameters.

I suggest the following: The respective TargetParameterUpdater stores the "input values" (passed to the wrapped "prepare method") and the "result values" (the final target parameter's values). save_controls lets the main model recalculate the "input values" and then compares the "old input values" with the new ones (to see if the relevant aspects of the main model did not change) and also compares the old "result values" with the current target parameter values (to see if the relevant aspects of the submodel did not change). save_control only omits the submodels' parameter value definition line if everything is equal.

I cannot imagine a realistic use case where this heuristic would erroneously skip lines. More likely, it could result in redundant lines if one changes the values of the related parameters identically but individually. If this becomes annoying, we could improve the heuristic by, for example, adding a "force ignore" option.

tyralla commented 8 months ago

While fixing the "skipped line" issue mentioned above, I realised one should call submodel.preparemethod2arguments.clear() manually under rare circumstances. Currently, HydPy does this automatically after adding submodels and before and after writing control files. It is not urgent, but we should refactor this and eventually also add a safety mechanism later. I left a to-do comment in method save_controls of class Model, the most obvious example of the need for later improvements.

tyralla commented 8 months ago

I realised that the tools implemented in calibtools are currently incapable of calibrating the submodel parameters. This should be a straight fix. The only conceptual problem I see is when a main model instance (indirectly) uses multiple submodel instances of the same type. I think we can assume one is usually okay with calibrating the values of the related parameters simultaneously (so that they get the same values). If real-world cases require more flexibility, we can later extend the interfaces to allow for additional specifications.

tyralla commented 6 months ago

We encountered a related problem when implementing GR (#134) with two interfaces for unit hydrograph submodels (#130). I will try to explain from the XML perspective:

    <exchange>
        <setitems>
            <selections>my_selection</selections>
            <gland_gr4>
              ...
            </gland_gr4>
            <rconc_uh>
                <logs>
                    <quh>
                        <name>quh</name>
                        <level>subunit</level>
                        <init>1.0 2.0 3.0</init>
                    </quh>
                </logs>
            </rconc_uh>
        </setitems>
    </exchange>

GR allows selecting two rconc_uh submodels to delay different internal fluxes. Typically, one ("uh1") is shorter than the other unit hydrograph ("uh2"). Due to the flat structure within the setitems block, it is, for example, unclear if the given initial values are thought for "uh1" or "uh2". The selection mechanism generally considers complete elements, including all submodels, not specific ones.

I think we should support specifying complete submodel paths (I do not know if this is the best word, but at least it is not already taken - other suggestions are welcome), like the ones yielded by find_submodels. We could use them like this:

    <exchange>
        <setitems>
            <selections>my_selection</selections>
            <gland_gr4>
                <submodelpath>model</submodelpath>
                ...
            </gland_gr4>
            <rconc_uh>
                <submodelpath>model.rconcmodel1</submodelpath>
                <logs>
                    <quh>
                        <name>uh1</name>
                        <level>subunit</level>
                        <init>1.0 2.0 3.0</init>
                    </quh>
                </logs>
            </rconc_uh>
        </setitems>
    </exchange>

submodelpath could be optional. Leaving it out would start a search among all submodels (the current behaviour). However, we should raise an error when leaving the submodel path specification out in ambiguous situations.

I hope this decision will not affect the HydPy server functionalities much because clients work with exchange item names that can be defined individually. (If we do not already, we should at least check for duplicate custom and default exchange item names.)

However, our internal exchange item logic will be affected. Like the current XML limitation, ExchangeSpecification does not define something like a submodel path. I would suggest adding it as another optional attribute and using it whenever necessary.

submodelpath is a little long. Maybe (at least for the ExchangeSpecification attribute) just path?

tyralla commented 2 months ago

Find ways to document possible main model-submodel combinations automatically.

We implemented this via the class SubmodelGraph and a related sphinx extension introducing the .. submodel_graph:: directive. So far, we use it only at the beginning of the new Model overview section, but we likely will also add model-specific submodel graphs to all user-relevant main models.