Closed: aornugent closed this 2 years ago
I have added a script to benchmark the test script in the OP under R/benchmark.R. The implementation on this branch gave the following results:
> run_resource_consumption_benchmarks()
Running resource consumption benchmarks
Running with:
soil_layers
1 1
2 10
3 50
4 100
# A tibble: 4 × 14
expression soil_layers min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm 1 716.67ms 718.28ms 1.39 17.8MB 2.09 4 6 2.87s <NULL> <Rprofmem [890 × 3]> <bench_tm [10]> <tibble [10 × 3]>
2 scm 10 754.07ms 762.25ms 1.31 17.1MB 1.97 4 6 3.05s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
3 scm 50 945.79ms 948.84ms 1.05 17.1MB 1.58 4 6 3.8s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
4 scm 100 1.19s 1.21s 0.830 17.1MB 1.25 4 6 4.82s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
whilst the implementation on #328 yields:
> run_resource_consumption_benchmarks()
Running resource consumption benchmarks
Running with:
soil_layers
1 1
2 10
3 50
4 100
# A tibble: 4 × 14
expression soil_layers min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm 1 1.55s 1.55s 0.639 17.1MB 0.959 4 6 6.26s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
2 scm 10 1.98s 1.98s 0.504 17.1MB 2.02 2 8 3.97s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
3 scm 50 3.18s 3.2s 0.312 17.1MB 0.468 4 6 12.82s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
4 scm 100 4.49s 4.56s 0.220 17.1MB 0.514 3 7 13.62s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
Clearly the implementation on this branch (#329) is faster.
The profile generated by the script in scripts/test-profiler.R: [profiler output attached as an image]
Thanks Mitch! Can you please add layers = 0 for reference? I went to compare against the Github Actions benchmark for develop but it turns out your machine is much quicker ;)
Great work here @devmitch @aornugent. This new implementation is 2-3 times faster than the other, nice!
The output of the profiler is fantastic, well done getting this to work. That will be very helpful in a number of places.
The environment construction function in R doesn't allow this:
https://github.com/traitecoevo/plant/blob/930b9c9679859727b8b2ac36995f360a9dc0838b/R/ff16w.R#L77-L78
How about 1 layer then? That's close to 0.
While looking into a segfault bug in the test-cohort.R tests, I noticed that while consumption_rates were being correctly resized for patches with resize_consumption_rates(), this was not being done when creating a single individual, unlike the other vectors, which were being resized correctly in the Individual constructor:
Individual(strategy_type_ptr s) : strategy(s) {
if (strategy->aux_index.size() != s->aux_size()) {
strategy->refresh_indices();
}
vars.resize(strategy_type::state_size(), s->aux_size()); // = Internals(strategy_type::state_size());
set_state("height", strategy->height_0);
vars.resize_consumption_rates(vars.state_size); // NEW - now correctly sets the size of consumption_rates
}
The tests now pass for test-cohort.R, but it got me wondering - what is the point of the resource_size member variable in Internals if it is always going to be the same as state_size, another member variable in the same class?
For example, in Patch, we resize the consumption rates using environment.ode_size():
https://github.com/traitecoevo/plant/blob/abdea4cc4aafc09c87d1d988a381d0430b6102e9/inst/include/plant/patch.h#L133
which is simply vars.state_size():
https://github.com/traitecoevo/plant/blob/abdea4cc4aafc09c87d1d988a381d0430b6102e9/inst/include/plant/environment.h#L30
Thus, resize_consumption_rates only ever sets resource_size to state_size:
https://github.com/traitecoevo/plant/blob/abdea4cc4aafc09c87d1d988a381d0430b6102e9/inst/include/plant/internals.h#L58-L60
The only time I can see them being different is when a resize() is done after resize_consumption_rates - maybe I'm getting confused, but I would like some clarity on this.
@aornugent I'll let you address this query
Hey @devmitch - thanks for the investigation. There are a few things to unpack here.
Individuals and environments each have their own vars objects. For FF16w: individual.vars.state_size() = 5 (+2 if it's a cohort) and environment.vars.ode_size() = n_soil_layers. resize_consumption_rates is handled separately because individuals do not have access to the environment at construction. So when creating a patch, the individuals are instantiated using a strategy object, and then consumption rates are resized from the patch constructor, which does have access to information about the environment.
It's a shame that the tests fail without initialising an individual's consumption rates. This is a shot in the dark, but do they pass if the default consumption rates size is set to zero? Keep the questions coming, it's a bit curly.
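To make the two Internals objects concrete, here is a hypothetical minimal sketch (the struct names echo the plant classes but none of this is the actual implementation): the individual sizes its state vector from the strategy alone, while consumption_rates stays empty until a patch, which knows the environment's ODE size, resizes it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch only - these names mimic, but are not, the plant API.
struct Internals {
  std::vector<double> states;
  std::vector<double> consumption_rates;  // empty until a patch resizes it
  void resize(std::size_t state_size) { states.resize(state_size); }
  void resize_consumption_rates(std::size_t n) { consumption_rates.resize(n); }
};

struct Individual {
  Internals vars;
  // at construction only the strategy's state size is known
  explicit Individual(std::size_t state_size) { vars.resize(state_size); }
};

struct Patch {
  std::size_t n_soil_layers;
  std::vector<Individual> individuals;
  void add(Individual ind) {
    // only the patch has access to the environment, so only here can
    // consumption_rates be sized to n_soil_layers
    ind.vars.resize_consumption_rates(n_soil_layers);
    individuals.push_back(ind);
  }
};
```

The separation mirrors the constraint described above: the individual can never size consumption_rates correctly on its own, because n_soil_layers lives with the environment.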
> This is a shot in the dark, but do they pass if the default consumption rates size is set to zero?
They do not - this is what happened previously: since the size was never explicitly set by the Internals constructor, and the Individual constructor (previously) did nothing, it was 0 by default.
Now that you have cleared up my confusion with the separate Internals objects, I think I can more clearly explain the problem in the call stack (I might still be off, though). The R test in test-cohort.R that is failing is roughly:
# ...
# FF16w, FF16
plant <- Individual(x, e)(s)
# ...
env <- test_environment(x, 2 * plant$state("height"),
light_env=function(x) rep(1, length(x)))
# ...
plant$compute_rates(env)
Upon construction of the Individual object, the consumption_rates vector inside the Individual's vars has size 0 by default because, as you said, there is no information about the environment (i.e. about the number of soil layers) and there is no Patch to set the size using the environment.
After the test environment is created, we call plant$compute_rates(env), which passes the environment and the Individual's vars object to FF16w_Strategy::compute_rates:
https://github.com/traitecoevo/plant/blob/04b9b502ffc0e210c44af7d9d52a5f6f68278ea0/inst/include/plant/individual.h#L69-L72
Note that the Individual's vars object has still not had its consumption_rates vector resized, because again it hasn't been touched by a Patch. In other words, it still has size 0.
Finally, in FF16w_Strategy::compute_rates we attempt to access n_soil_layers objects (because we now have access to the environment, i.e. environment.ode_size()) in the vars object passed from the Individual, which as mentioned still has 0 objects. So a segfault occurs as we attempt to access the first object:
https://github.com/traitecoevo/plant/blob/04b9b502ffc0e210c44af7d9d52a5f6f68278ea0/src/ff16w_strategy.cpp#L32-L34
My previous "solution" of resizing the consumption_rates vector in the Individual constructor only happened to work in this case because individual.vars.state_size() > environment.vars.ode_size() = n_soil_layers; in other words, there were luckily more objects in the vector than what was being looped over in FF16w_Strategy::compute_rates.
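The masking effect can be shown with a hypothetical sketch (illustration only, not the plant code): sizing the vector to the individual's state size (5 for FF16w) avoids out-of-range access only while that size happens to exceed n_soil_layers. Here .at() is used so the failure throws instead of segfaulting, purely for demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: mirrors the loop over environment.ode_size()
// in compute_rates, reading one consumption rate per soil layer.
double sum_consumption(const std::vector<double>& consumption_rates,
                       std::size_t n_soil_layers) {
  double total = 0.0;
  for (std::size_t i = 0; i < n_soil_layers; ++i) {
    total += consumption_rates.at(i);  // .at() throws rather than segfaulting
  }
  return total;
}
```

With a vector of size 5 and 3 soil layers the read succeeds by luck; with the never-resized size-0 vector it fails on the first element, matching the segfault described above.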
The only correct solution I can think of at the moment would be to resize the consumption_rates vector the first time we get access to the environment in this scenario - in other words, in Individual::compute_rates():
void compute_rates(const environment_type& environment,
bool reuse_intervals = false) {
vars.resize_consumption_rates(environment.ode_size());
strategy->compute_rates(environment, reuse_intervals, vars);
}
but I'm pretty sure this sucks because, working with a Patch, we would be calling resize on the vector multiple times. @aornugent
Ah ha! That makes complete sense. How about we make it conditional and add a check in Individual::compute_rates() to resize consumption rates if given an environment with a different sized Internals object?
The size of the environment's Internals object shouldn't change within a Patch, and this means that the modular Individual would adapt to its environment.
compute_rates gets called a lot, but it's usually a large or complex function, so hopefully repeatedly checking the size of the two Internals objects has minimal overhead. A more speculative solution might be to move the consumption rates into the environment's Internals object, but I'm not quite sure how that would work, so we'd probably only consider it if there were a significant performance hit with your proposed approach.
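A hypothetical sketch of that conditional check (the names approximate, but are not, the plant API): resize consumption_rates only when the environment's ODE size differs from the vector's current size, so repeated calls within a Patch pay only a size comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch only. resize_count instruments the sketch to show
// the resize happens at most once per environment size.
struct Internals {
  std::vector<double> consumption_rates;
  std::size_t resize_count = 0;
  void resize_consumption_rates(std::size_t n) {
    consumption_rates.resize(n);
    ++resize_count;
  }
};

void compute_rates_sketch(Internals& vars, std::size_t environment_ode_size) {
  // conditional resize: a no-op size comparison on every call after the first
  if (vars.consumption_rates.size() != environment_ode_size) {
    vars.resize_consumption_rates(environment_ode_size);
  }
  // ... strategy->compute_rates(...) would run here ...
}
```

Because the environment's ODE size is fixed within a Patch, the branch is taken exactly once per simulation rather than once per call.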
Looks like there is basically no overhead:
NEW:
Running resource consumption benchmarks
Running with:
soil_layers
1 1
2 10
3 50
4 100
# A tibble: 4 × 14
expression soil_layers min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm 1 706.15ms 706.84ms 1.41 17.8MB 1.98 5 7 3.54s <NULL> <Rprofmem [881 × 3]> <bench_tm [10]> <tibble [10 × 3]>
2 scm 10 749.14ms 751.2ms 1.33 17.1MB 1.33 5 5 3.75s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
3 scm 50 937.15ms 939.2ms 1.07 17.1MB 1.60 4 6 3.75s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
4 scm 100 1.15s 1.15s 0.865 17.1MB 1.30 4 6 4.63s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
OLD:
> run_resource_consumption_benchmarks()
Running resource consumption benchmarks
Running with:
soil_layers
1 1
2 10
3 50
4 100
# A tibble: 4 × 14
expression soil_layers min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm 1 706.06ms 707.52ms 1.41 17.1MB 5.65 2 8 1.42s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
2 scm 10 750.62ms 751.96ms 1.33 17.1MB 3.09 3 7 2.26s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
3 scm 50 930.8ms 935.32ms 1.05 17.1MB 1.57 4 6 3.82s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
4 scm 100 1.17s 1.17s 0.854 17.1MB 3.41 2 8 2.34s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
Interesting, this passes on Ubuntu but:
Windows
== Failed tests ================================================================
-- Failure (test-strategy-ff16w.R:97:3): Rainfall spline basic run -------------
out$offspring_production not equal to 16.88961056463.
1/1 mismatches
[1] 16.9 - 16.9 == -0.00164
macOS
══ Failed tests ════════════════════════════════════════════════════════════════
── Failure (test-strategy-ff16w.R:97:3): Rainfall spline basic run ─────────────
out$offspring_production not equal to 16.88961056463.
1/1 mismatches
[1] 16.9 - 16.9 == -0.00142
I wonder if there's small differences in the interpolation of the extrinsic_drivers splines?
It may be the different compilers (or compiler flags) relating to double precision with the splines.
@dfalster - we're a bit stuck on what to do next. I don't know how to conclusively test whether the numerical instability across operating systems comes from spline interpolation or another source. Do we have a 'phone-a-friend' option we can try to resolve this?
Alternatively, what do you think the required precision for offspring production should be? I'm not super-thrilled at relaxing tolerances if we don't have to, but agreement to 1/1000th of an individual might be enough.
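For concreteness, here is a sketch of the relaxed check being weighed up, assuming a relative comparison similar in spirit to passing a tolerance to testthat's expect_equal (the function name and exact semantics are illustrative, not testthat's): a relative tolerance of 1e-3 would accept both observed cross-platform differences (-0.00164 and -0.00142 against ~16.89) while still flagging anything much larger.

```cpp
#include <cassert>
#include <cmath>

// Illustrative relative-tolerance comparison, not the testthat algorithm.
bool close_enough(double observed, double expected, double rel_tol = 1e-3) {
  return std::fabs(observed - expected) <= rel_tol * std::fabs(expected);
}
```

At rel_tol = 1e-3 the acceptance band around 16.89 is about +/-0.017, comfortably covering the platform differences of ~0.0015.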
:exclamation: No coverage uploaded for pull request base (develop@425004d). The diff coverage is n/a.
@@ Coverage Diff @@
## develop #329 +/- ##
==========================================
Coverage ? 79.45%
==========================================
Files ? 96
Lines ? 8785
Branches ? 0
==========================================
Hits ? 6980
Misses ? 1805
Partials ? 0
Last update 425004d...3d6cd9e.
The issue of reproducibility across operating systems was resolved by generating a stationary rainfall timeseries:
- integrand <- function(x) {x^2}
+ integrand <- function(x) {1000 + sin(x)}
We suspect that the previous test scenario generated extreme values that raised small numerical differences in total patch offspring_production.
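A small sketch of why the stationary integrand is better conditioned: over a long integration window the old integrand x^2 grows without bound, while 1000 + sin(x) stays inside [999, 1001], so tiny platform-specific rounding differences are no longer amplified by extreme rainfall values.

```cpp
#include <cassert>
#include <cmath>

// The two rainfall integrands from the diff above.
double old_integrand(double x) { return x * x; }          // unbounded growth
double new_integrand(double x) { return 1000.0 + std::sin(x); }  // stationary
```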
Looks good!
This is a different implementation approach to #328, but hopefully achieves the same goal. @devmitch - could you please benchmark them to see which is more performant? Please feel free to make any optimisations/code fixes you see fit.
A simple test script, with a handful of cohorts and soil layers
I've left a bunch of print statements in ff16_environment.compute_rates() which produce the following output, but they should be removed. Infil. rate for soil layer 0 comes from the rainfall spline (reaching 10.00 at time 10.00), and a crude drainage multiplier of 10% is applied to approximate water entering sub-layers (e.g. soil layer 1 receiving 10% of the soil layer 0 balance; rounded to 2 decimal places and truncated at zero).
Extraction rate is integrated over cohorts for each species in each soil layer. Currently this is set to 2 * area_leaf and is therefore constant across soil layers. Net flux is infil. rate less extraction and drainage rates, representing the change in soil water levels, which is integrated in the environment state variables shown as water balance.
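The water balance described above can be sketched as follows (illustration only, not the plant implementation; the 10%-of-inflow drainage rule is my reading of the "10% of the soil layer 0 balance" description): layer 0 is fed by rainfall, each deeper layer receives 10% of the inflow to the layer above as drainage, extraction is the constant 2 * area_leaf in every layer, and net flux = inflow - extraction - drainage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the cascading soil water balance.
std::vector<double> net_fluxes(double rainfall, double area_leaf,
                               std::size_t n_layers) {
  const double drain_frac = 0.1;             // crude drainage multiplier
  const double extraction = 2.0 * area_leaf; // constant across layers
  std::vector<double> flux(n_layers);
  double inflow = rainfall;                  // layer 0: infiltration from rainfall
  for (std::size_t i = 0; i < n_layers; ++i) {
    const double drainage = drain_frac * inflow;  // passed to the layer below
    flux[i] = inflow - extraction - drainage;
    inflow = drainage;
  }
  return flux;
}
```

Integrating these fluxes over time would give the per-layer water balances shown in the printed output.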