Implementing GARTO - Githubissues

tyralla commented 2 years ago

We currently aim to support LARSIM-like simulations with HydPy-L that better take infiltration excess into account via a Green & Ampt (GA) infiltration approach. However, HydPy-L is already very complex, and a GA approach suitable for continuous simulations is not too simple either. So, combining the equations of both models is something we should avoid. Instead, we strive for an independently usable GA model that allows for coupling with HydPy-L (and, at best, other "complete" hydrological models like HydPy-H). Unfortunately, this is beyond HydPy's current modularisation functionalities. Extending these functionalities will be non-trivial, but we think it will pay off very soon due to ways to keep our application models smaller and increase the users' flexibility in combining different model modules.

This necessary work will touch on many topics and require many decisions. Hence, we split the discussion into three issues that separately deal with the GA methodology itself (#89), the general approach to couple "complete" application models like HydPy-L with "submodels" like GA (#90), and the specific coupling interface for HydPy-L and GA (#91).

tyralla commented 2 years ago

The application range of the original Green & Ampt infiltration method is somewhat limited. One assumption is that heavy rainfall occurs in a single period without interruptions, which is invalid for continuous hydrological simulations. Hence, we need to implement one of the available GA extensions. After some pretests, we decided on (first) implementing (parts of) the GARTO method. Like other Green & Ampt with Redistribution (GAR) approaches, GARTO supports the simulation of multiple wetting fronts (of current and past rainfall events) and the redistribution of already existing fronts in the absence of rainfall. Thankfully, two of its authors, Fred L. Ogden and Cary A. Talbot, provided their source code (written in C), which helped a lot in developing the first Python-based prototype.

GARTO allows including interactions with a shallow groundwater table. I think, at least for now, there is no need for us to implement this functionality as well.

The last decision brings up the topic of finding a good name. First, I thought about naming the new model family HydPy-Inf or similar. However, the name inf of the corresponding base model would collide with NumPy's constant for infinity. More importantly, I think it might prove a bad idea to group Green & Ampt-like methods and other infiltration methods due to the propagation of a variable number of wetting fronts requiring a very different implementation of "normal" state-space-based approaches (relying on the water content of a fixed number of buckets). So, I propose the more precise model family name HydPy-GA.

Possible names for the new application model (both from the documentation and source code perspective):

HydPy-GA (RTO) / ga_rto
HydPy-GA (TO) / ga_to
HydPy-GA (GARTO) / ga_garto

I would favour HydPy-GA (GARTO). The name is a little redundant but still nicely short. ga_garto also reads well. However, the name is an exaggeration, for we do not implement the shallow dynamic water table (for now). My first idea for clarifying the current HydPy implementation only models surface water infiltration (and neglects groundwater rise) is HydPy-GA (GARTO-S) / ga_gartos.

The analysis performed by the authors lets us expect that GARTO is computationally much faster than numerical approximations of Richards' original equation. However, we can be sure it is computationally much more demanding than simple scalar bucket approaches. For now, I will strictly follow the provided C source code and implement a fixed numerical step size (explicit Euler method) without any error control. With promising results but too long computation times for our projects, we could revisit this decision later and implement at least simple adaptive error control (e.g. halved stepsize after tolerance violations) or possibly use a more efficient solver (which, I am afraid, would require special handling of the numerous discontinuities, resulting in a massive amount of work; maybe discontinuity locking?).

tyralla commented 2 years ago

The variables' names should be self-explanatory. My suggestions:

control parameters
- NmbSoils (For defining multiple soil compartments in which independent calculations occur. From the GARTO perspective, this option seems unnecessary and is not supported by the C source code. However, it should simplify coupling to models like HydPy-L, which structure subbasins into hydrological response areas.)
- NmbBins (In the C code, it is numbins, but in HydPy, we nearly always use the abbreviation nmb.)
- SoilArea (For calculating the average surface runoff and groundwater recharge from all soil compartments.)
- DT (Like in the C code. However, I use the parameter step size instead of a fixed single second for its unit. This is a little complicated but more consistent with the definitions of the other time-dependent variables.)
- SoilDepth (The C code defines bottom_top and bottom_depth. As long as we do not implement layers (not done in the C implementation either), a single variable should be enough.)
- ResidualMoisture (theta_r or residual_saturation in the C code. ResidualWaterContent is a possible alternative but is longer, harder to read in lower case letters, and (in my eyes) would fit better to a volume than a fraction.)
- SaturationMoisture (theta_s or *porosity. See comments above.)
- SaturatedConductivity (Like in the C code.)
- PoreSizeDistribution (The C code uses bc_lambda, with "bc" standing for Brooks-Corey.)
- AirEntryPotential (The C code uses bc_psib, with "bc" standing for Brooks-Corey.)
derived parameters
- NmbSubteps (Inverse of DT, for convenience.)
- RelativeSoilArea (For convenience.)
- EffectiveCapillarySuction (Like in the C code.)
inputs sequences
- Rainfall (I think when applying the model "stand-alone", we will, in fact, nearly always apply it to rainfall series.)
- Evapotranspiration (The C code differentiates between evaporation and transpiration. I did not read the source code for the transpiration part in detail, but it seems to be somehow restricted to a separately defined root zone. However, as we are interested in coupling to HydPy-L, which only considers the rooted part of the soil, it seems okay to me to combine them.)
state sequences
- Moisture (Fits to ResidualMoisture and SaturationMoisture. The C code chooses surface_front_theta. However, as long as we do not model "groundwater fronts", we can neglect the prefix surface. Also, I think "front" does no fit to my terminology, because "filled bins" contain no front but possess a moisture value; see below.)
- FrontDepth (Instead of surface_front_depth, see above.)
flux sequences
- Infiltration (The main functions of the C code do not calculate infiltration.)
- GroundwaterRecharge (Like in the C code.)
- SurfaceRunoff (Not calculated in the C code.)
- TotalGroundwaterRecharge (The average groundwater recharge from all soil compartments.)
- TotalSurfaceRunoff (The average surface runoff from all soil compartments.)
aide sequences
- ActualSurfaceWater (C code: surfacewater_depth)
- InitialSurfaceWater (C code: surfacewater_depth_old)
log sequences
- MoistureChange (Called d_theta the C code. I double-checked: it is actually the moisture change over time, not the velocity of the front depth.)

Note that following this suggestion, ActualSurfaceWater and InitialSurfaceWater are two aide sequences instead of one state sequence called SurfaceWater. The idea is not to mix interception evaporation losses with infiltration losses. At the start of each numerical substep, rainfall becomes surface water. Next, infiltration removes as much water as possible from this reservoir. In case of infiltration excess, the remaining water becomes surface runoff. Hence, ActualSurfaceWater becomes zero at the end of each numerical substep. If we want to allow a certain ponding level, we must do so with the help of another model. For the desired coupling with HydPy-L, this means that GARTO is responsible for calculating the infiltration and the (potential) surface runoff only and that HydPy-L would model the ponding (or use another submodel for this purpose).

Despite being an input factor, I try to handle Evapotranspiration as actual evapotranspiration instead of potential evapotranspiration. This way, the user or the calling model can decide on the method for adjusting evapotranspiration to the actual soil moisture.

(What about other soil water losses, e.g. due to lateral outflow (interflow)? Just handle them as additional evapotranspiration? Or find a better, more general name for Evapotranspiration? To be discussed in issue #91.

Evapotranspiration is not discussed in the GARTO paper but has been added to the C code later, which allows for taking evaporation before infiltration or from surface water by uncommenting some code lines. I think we should decide on the option that is computationally most efficient and/or allows for the easiest coupling with HydPy-L. Also to be discussed in #91.

tyralla commented 2 years ago

I will leave parameter DT as a primary control parameter for now. Later, if we decide on an adaptive integration algorithm, we can move it to the more hidden solver parameters or even remove it from users' access.

For now, its trim method enforces that a multiple of DT fits into the simulation time step without remainder.

tyralla commented 2 years ago

I will name the soil and bin index variables s and b in all methods (local variables).

tyralla commented 2 years ago

IMPORTANT: I am afraid GARTO might recalculate Conductivity multiple times for the same bin and numeric time step. Hence, a single call of Calc_Conductivity_V1 might not be enough. A added the arguments s and b to avoid unnecessary recalculations. However, additional calls to Calc_Conductivity_V1 increase complexity. I postpone checking if recalculations actually occur (and if, in which situations) until all integration tests are ready.

tyralla commented 2 years ago

One point puzzling me: I calculate the "dry depth" using

$$ DryDepth = \frac{\tau + \sqrt{\tau^2 + 4 \cdot \tau \cdot EffectiveCapillarySuction}}{2} $$

with (local variable)

$$ \tau = DT \cdot \frac{SaturatedConductivity}{SaturationMoisture - Moisture} $$

and (derived parameter)

$$ EffectiveCapillarySuction = AirEntryPotential \cdot \frac{3 \cdot PoreSizeDistribution + 2}{3 \cdot PoreSizeDistribution + 1}. $$

I think this agrees with the C code. However, the paper uses $G(\theta_i, \theta_s)$ in place of $EffectiveCapillarySuction$ (equation 7) and defines it as follows (the $\theta \geq \theta_s$ branch of equation 13):

$$ G(\theta_i, \theta_s) = \psi_b \cdot \frac{3 \cdot \lambda+ 2 - S_e^{3 + 1 / \lambda}(\theta_i)}{3 \cdot \lambda+ 1} $$

$$ S_e(\theta_i) = \frac{\theta - \theta_r}{\theta_s - \theta_r} $$

Does this mean that the C code always calculates the dry depth for the residual moisture ( $G(\theta_r, \theta_s)$ ) instead of (current?) the initial moisture ( $G(\theta_i, \theta_s)$ )? Or do other differences between the code and paper offset this difference?

Maybe this relates to the question of what to do when the moisture of the most-left bin changes (for example, due to evaporation). Still need to think this through...

Does this mean that the C code always calculates the dry depth for the residual moisture ( $G(\theta_r, \theta_s)$ ) instead of (current?) the initial moisture ( $G(\theta_i, \theta_s)$ )?

No, not really: we still use the moisture of the leftmost bin for calculating $\tau$.

tyralla commented 2 years ago

All vectors have the length $NmbBins + 1$ (or $numbins + 1$). The first entry handles the initial condition (the initial soil moisture, which is the same over the complete soil depth). The other entries represent the "bins". Each bin is either empty (having the same moisture value as the first entry) or can contain a wetting front (having a moisture value higher and a front depth value smaller than its "left" neighbour). The moisture signals if a bin is "active" (contains a wetting front) or not. Hence, I assume we must update all (inactive) bins when removing water from the first entry. Maybe, the same holds when adding water to the first entry (after merging fronts).

Based on these considerations:

Why not put nan-values in all inactive bins? I think this would clarify each bin's current state and relevance.

I am not sure about the terms I should use in the documentation. The paper is a little more abstract and does not use words like "bin". And the C code is not entirely consistent, as it names the first entry "saturated bin" (also "completely saturated domain"). "Saturated" also seems a little misleading, as we might be speaking of the driest part of the moisture domain. I will try it with the following naming conventions:

A bin can either be active or inactive.
Inactive bins are irrelevant for the current calculations.
Active bins can either be filled (uniform initial soil moisture over the complete soil depth or directly after a wetting front has reached the soil bottom) or contain a wetting front that is moving downwards.
I will speak of filled bins, wetting front bins, and inactive bins for brevity.

tyralla commented 2 years ago

While fixing a few bugs in my code, I encountered the following difference to the C implementation (I hope, I use it correctly):

grafik

The hourly rainfall values: 4.0, 0.0, 5.0, 0.0, 0.0, 3.0, 0.0 [cm/h] The constant evaporation value 0.2 cm/h (unrealistically high, just for testing) No transpiration

Everything else is pretty standard, except that I set the soil depth to only 0.1 m.

In Python, rain infiltrates completely (and partly becomes evaporation and groundwater recharge).

For the heaviest event, C runs into t_o_advance_front (due to saturation) while Python continues using gar_infiltrate_redistribute (as fas as I debugged it).

This brings up the questions, why the states of both implementations evolve differently (maybe $\tau = (dt \cdot K_s) / (\theta_s - \theta_i)$ plays a role due to numerical inaccuracies, see also the discussion above), and if the Python implementation might also simulate such sharp transitions at times (at least, when applied on shallow soils)? What is the desired behaviour here?

A related question I did not think of so far: If I am correct, shallow soils should always show higher infiltration rates than deep soils with otherwise equal characteristics, because their infiltration fronts cannot become that large, don't they? Do we have to think about restricting the bottom's permeability later for coupling with HydPy-L?

tyralla commented 2 years ago

A related question I did not think of so far: If I am correct, shallow soils should always show higher infiltration rates than deep soils with otherwise equal characteristics, because their infiltration fronts cannot become that large, don't they?

Not necessarily, as GARTO does neglect capillary drive for the filled bin (I think due to the soil water front touching the ground water). We need to decide on a proper lower boundary assumption for coupling with HydPy-l later.

tyralla commented 2 years ago

While fixing a few bugs in my code, I encountered the following difference to the C implementation (I hope, I use it correctly):

After lots of debugging, I found the reason for the different behaviour. It relates to removing water from the filled bin due to evaporation. In the C-implementation, the moisture of the bins stays as is (possibly saturated). In the current Python implementation, I set their moisture values to the new first bin's value. Both operations do not introduce any water volume errors due to dealing with the relative moisture and zero front depth. However, in the next numerical substep, the C-implementation selects the method for pure front shifts (_t_o_advancefront) due to saturation, while the Python implementation selects the redistribution method (_gar_infiltrateredistribute). Only _t_o_advancefront contains a restriction that prevents overshooting. I suppose _t_o_advancefront tends to slightly underestimate infiltration (due to neglecting percolation during the remaining subperiod where the discussed bin is filled), and _gar_infiltrateredistribute tends to (vastly) overestimate infiltration (due to applying capillary forcing during the whole numerical substep). So, I prefer the approach of the C implementation. However, later refactorings which make such hidden dependencies more explicit would be helpful.

Two questions are still open:

A new one: What if the first bin's moisture value exceeds the second bin's moisture value? Is this even possible? (Only if the wetting front overshoots the soil depth without saturation.) In case it is possible, is it already handled somewhere?
The old one: What is the proper lower boundary condition for coupling with HydPy-L?

tyralla commented 2 years ago

I get the following discontinuous infiltration rates when I let the water evaporate from the soil only (both in the C and the Python implementation):

grafik

If I allow taking water from the soil's surface first (if available), the rates look much better:

grafik

So, I go for the latter approach without investigating the reason for the discontinuous rates in the first case.

In both cases, evaporation takes place after infiltration and thus takes the remaining surface water only.

tyralla commented 2 years ago

Sometimes, I encounter "spikes" like this one (both in the C and the Python implementation):

grafik

For now, I just document this behaviour and leave it like that. In case it becomes relevant later (I cannot certainly exclude such spikes may result in fragile states sometimes).

tyralla commented 2 years ago

An example for what might happen if one does not prepare enough bins:

3 bins: grafik

4 bins:

grafik

The most convenient approach seems to automatically extend the number of available bins when necessary during simulation. So far, reallocating arrays is something we prevent entirely during simulations for computational efficiency. It seems a good idea for the given problem, as reallocations should rarely happen. But it might require significant implementation efforts (Python vs C, multiple sequences, related time series objects).

tyralla commented 2 years ago

One thing I do not fully understand is how gar_limit_infiltration_capacity works. Until being taught better, I prefer keeping a simplified version in HydPy. Also, I have the feeling that gar_create_right_bin should call gar_limit_infiltration_capacity with bin+1 (the new active bin) instead of bin (the currently last active bin).

tyralla commented 2 years ago

Method Add_ActiveBin_V1 is (in my opinion) a simpler to understand version of gar_create_right_bin. However, I still find some of its behaviours confusing. Nevertheless, as its results agree with gar_create_right_bin and it does not seem to cause relevant trouble, I leave it like this for now and thoroughly discuss its peculiarities in its documentation.

tyralla commented 2 years ago

The variables' names should be self-explanatory. My suggestions:

I replaced GroundwaterRecharge with Percolation (a shorter and more neutral term).
I added TotalInfiltration

tyralla commented 2 years ago

Overshooting the soil's bottom results in percolation (or groundwater recharge), which is more due to numerical inaccuracy than GARTO's concept. In some cases, the related errors seem to be negligible. However, during front redistribution, I found very significant overestimations of percolation when considering evaporation. I hope I fixed it well, but I added the following remark in case not:

    ToDo: In the original GARTO code, the remaining soil water and infiltration
          become percolation.  However, when evaporation interferes with the
          original equations, this simplification can result in percolation rates
          larger than saturated conductivity.  We observed this in the
          :ref:`ga_garto_24h_1000mm_evap_continuous` example, where percolation was
          20 mm/h (the rainfall rate) in the 21st hour despite a saturated
          conductivity of 13.2 mm/h.  For our assumption, the percolation rate is
          12.8 mm/h.  Nevertheless, we should keep this deviation from the original
          implementation in mind for a while;  in case it brings unexpected side
          effects in future applications.

Generally, I left more To-Do remarks than usual in a (potential) master branch model. But I expect we need to fine-tune the model later anyhow (when coupling it to other models).

tyralla commented 2 years ago

I added a method named Water_AllBins_V1 (analogue to Withdraw_AllBins_V1) for adding soil water supply directly to the soil's body. Withdraw_AllBins_V1 primarily serves to extract water due to evaporation. So it takes water from the surface front of the wettest bin. On the contrary, Water_AllBins_V1 primarily serves to add water due to capillary rise. So it adds moisture to the first bin, which's water always reaches the soil's bottom (where presumably the groundwater table lies).

We require this method for coupling GARTO to LARSIM's capillary rise option. Nevertheless, I think adding it to the stand-alone GARTO model is beneficial - at least for testing, which is more straightforward for the stand-alone model than the submodel.

Conceptually, everything seems clear. But technically, I am not 100 % certain everything will always run smoothly. As for Withdraw_AllBins_V1, Water_AllBins_V1 leaves a confusing intermediate state that needs to be tidied up by the following methods. For example, there is no updating of the moisture changes. The first test runs were successful, but we still need to keep an eye on it.

tyralla commented 2 years ago

A reminder: in HydPy-L, soil evaporation can be negative. Hence, the coupling of HydPy-L to the GARTO submodel now deals with such cases. However, the GARTO submodel always assumes positive soil evaporation inputs. Maybe, we should change this later?

hydpy-dev / hydpy

Implementing GARTO #89