Closed: aornugent closed this 2 years ago
I have added a script to benchmark the test script in the OP under R/benchmark.R. The implementation on this branch gave the following results:
> run_resource_consumption_benchmarks()
Running resource consumption benchmarks
Running with:
soil_layers
1 1
2 10
3 50
4 100
# A tibble: 4 × 14
expression soil_layers min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm 1 716.67ms 718.28ms 1.39 17.8MB 2.09 4 6 2.87s <NULL> <Rprofmem [890 × 3]> <bench_tm [10]> <tibble [10 × 3]>
2 scm 10 754.07ms 762.25ms 1.31 17.1MB 1.97 4 6 3.05s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
3 scm 50 945.79ms 948.84ms 1.05 17.1MB 1.58 4 6 3.8s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
4 scm 100 1.19s 1.21s 0.830 17.1MB 1.25 4 6 4.82s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
whilst the implementation on #328 yields:
> run_resource_consumption_benchmarks()
Running resource consumption benchmarks
Running with:
soil_layers
1 1
2 10
3 50
4 100
# A tibble: 4 × 14
expression soil_layers min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm 1 1.55s 1.55s 0.639 17.1MB 0.959 4 6 6.26s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
2 scm 10 1.98s 1.98s 0.504 17.1MB 2.02 2 8 3.97s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
3 scm 50 3.18s 3.2s 0.312 17.1MB 0.468 4 6 12.82s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
4 scm 100 4.49s 4.56s 0.220 17.1MB 0.514 3 7 13.62s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
Clearly the implementation on this branch (#329) is faster.
The profile generated by the script in scripts/test-profiler.R: [profiler output attached as an image]
Thanks Mitch! Can you please add layers = 0 for reference? I went to compare against the Github Actions benchmark for develop but it turns out your machine is much quicker ;)
Great work here @devmitch @aornugent. This new implementation is 2-3 times faster than the other, nice!
The output of the profiler is fantastic, well done getting this to work. That will be very helpful in a number of places.
The environment construction function in R doesn't allow this:
https://github.com/traitecoevo/plant/blob/930b9c9679859727b8b2ac36995f360a9dc0838b/R/ff16w.R#L77-L78
How about 1 layer then? That's close to 0.
While looking into a segfault bug in the test-cohort.R tests, I noticed that while consumption_rates were being correctly resized for patches with resize_consumption_rates(), this was not being done when creating a single individual, unlike the other vectors, which were being resized correctly in the Individual constructor:
Individual(strategy_type_ptr s) : strategy(s) {
if (strategy->aux_index.size() != s->aux_size()) {
strategy->refresh_indices();
}
vars.resize(strategy_type::state_size(), s->aux_size()); // = Internals(strategy_type::state_size());
set_state("height", strategy->height_0);
vars.resize_consumption_rates(vars.state_size); // NEW - now correctly sets the size of consumption_rates
}
The tests now pass for test-cohort.R, but it got me wondering - what is the point of the resource_size member variable in Internals if it is always going to be the same as state_size, another member variable in the same class?
For example, in Patch, we resize the consumption rates using environment.ode_size():
https://github.com/traitecoevo/plant/blob/abdea4cc4aafc09c87d1d988a381d0430b6102e9/inst/include/plant/patch.h#L133
which is simply vars.state_size():
https://github.com/traitecoevo/plant/blob/abdea4cc4aafc09c87d1d988a381d0430b6102e9/inst/include/plant/environment.h#L30
Thus, resize_consumption_rates only ever sets resource_size to state_size:
https://github.com/traitecoevo/plant/blob/abdea4cc4aafc09c87d1d988a381d0430b6102e9/inst/include/plant/internals.h#L58-L60
The only time I can see them being different is when a resize() is done after resize_consumption_rates - maybe I'm getting confused, but I would like some clarity on this.
@aornugent I'll let you address this query
Hey @devmitch - thanks for the investigation. There are a few things to unpack here.
Individuals and environments each have their own vars objects. For FF16w: individual.vars.state_size() = 5 (+2 if it's a cohort) and environment.vars.ode_size() = n_soil_layers. resize_consumption_rates is handled separately because individuals do not have access to the environment at construction. So when creating a patch, the individuals are instantiated using a strategy object, and then consumption rates are resized from the patch constructor, which does have access to information about the environment.
It's a shame that the tests fail without initialising an individual's consumption rates. This is a shot in the dark, but do they pass if the default consumption rates size is set to zero? Keep the questions coming, it's a bit curly.
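To make the two Internals objects concrete, here is a hypothetical minimal sketch (the struct names echo the plant classes but none of this is the actual implementation): the individual sizes its state vector from the strategy alone, while consumption_rates stays empty until a patch, which knows the environment's ODE size, resizes it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch only - these names mimic, but are not, the plant API.
struct Internals {
  std::vector<double> states;
  std::vector<double> consumption_rates;  // empty until a patch resizes it
  void resize(std::size_t state_size) { states.resize(state_size); }
  void resize_consumption_rates(std::size_t n) { consumption_rates.resize(n); }
};

struct Individual {
  Internals vars;
  // at construction only the strategy's state size is known
  explicit Individual(std::size_t state_size) { vars.resize(state_size); }
};

struct Patch {
  std::size_t n_soil_layers;
  std::vector<Individual> individuals;
  void add(Individual ind) {
    // only the patch has access to the environment, so only here can
    // consumption_rates be sized to n_soil_layers
    ind.vars.resize_consumption_rates(n_soil_layers);
    individuals.push_back(ind);
  }
};
```

The separation mirrors the constraint described above: the individual can never size consumption_rates correctly on its own, because n_soil_layers lives with the environment.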
> This is a shot in the dark, but do they pass if the default consumption rates size is set to zero?
They do not - this is what happened previously: since the size was never explicitly set by the Internals constructor, and the Individual constructor (previously) did nothing, it was 0 by default.
Now that you have cleared up my confusion with the separate Internals objects, I think I can more clearly explain the problem in the call stack (I might still be off, though). The R test in test-cohort.R that is failing is roughly:
# ...
# FF16w, FF16
plant <- Individual(x, e)(s)
# ...
env <- test_environment(x, 2 * plant$state("height"),
light_env=function(x) rep(1, length(x)))
# ...
plant$compute_rates(env)
Upon construction of the Individual object, the consumption_rates vector inside the Individual's vars has size 0 by default because, as you said, there is no information about the environment (i.e. about the number of soil layers) and there is no Patch to set the size using the environment.
After the test environment is created, we call plant$compute_rates(env), which passes the environment and the Individual's vars object to FF16w_Strategy::compute_rates:
https://github.com/traitecoevo/plant/blob/04b9b502ffc0e210c44af7d9d52a5f6f68278ea0/inst/include/plant/individual.h#L69-L72
Note that the Individual's vars object has still not had its consumption_rates vector resized, because again it hasn't been touched by a Patch. In other words, it still has size 0.
Finally, in FF16w_Strategy::compute_rates we attempt to access n_soil_layers objects (because we now have access to the environment, i.e. environment.ode_size()) in the vars object passed from the Individual, which as mentioned still has 0 objects. So a segfault occurs as we attempt to access the first object:
https://github.com/traitecoevo/plant/blob/04b9b502ffc0e210c44af7d9d52a5f6f68278ea0/src/ff16w_strategy.cpp#L32-L34
My previous "solution" of resizing the consumption_rates vector in the Individual constructor only happened to work in this case because individual.vars.state_size() > environment.vars.ode_size() = n_soil_layers; in other words, there were luckily more objects in the vector than what was being looped over in FF16w_Strategy::compute_rates.
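The masking effect can be shown with a hypothetical sketch (illustration only, not the plant code): sizing the vector to the individual's state size (5 for FF16w) avoids out-of-range access only while that size happens to exceed n_soil_layers. Here .at() is used so the failure throws instead of segfaulting, purely for demonstration.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical sketch: mirrors the loop over environment.ode_size()
// in compute_rates, reading one consumption rate per soil layer.
double sum_consumption(const std::vector<double>& consumption_rates,
                       std::size_t n_soil_layers) {
  double total = 0.0;
  for (std::size_t i = 0; i < n_soil_layers; ++i) {
    total += consumption_rates.at(i);  // .at() throws rather than segfaulting
  }
  return total;
}
```

With a vector of size 5 and 3 soil layers the read succeeds by luck; with the never-resized size-0 vector it fails on the first element, matching the segfault described above.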
The only correct solution I can think of at the moment would be to resize the consumption_rates vector the first time we get access to the environment in this scenario - in other words, in Individual::compute_rates():
void compute_rates(const environment_type& environment,
bool reuse_intervals = false) {
vars.resize_consumption_rates(environment.ode_size());
strategy->compute_rates(environment, reuse_intervals, vars);
}
but I'm pretty sure this sucks because, working with a Patch, we would be calling resize on the vector multiple times. @aornugent
Ah ha! That makes complete sense. How about we make it conditional and add a check in Individual::compute_rates() to resize consumption rates if given an environment with a different sized Internals object?
The size of the environment's Internals object shouldn't change within a Patch, and this means that the modular Individual would adapt to its environment.
compute_rates gets called a lot, but it's usually a large or complex function, so hopefully repeatedly checking the size of the two Internals objects has minimal overhead. A more speculative solution might be to move the consumption rates into the environment's Internals object, but I'm not quite sure how that would work, so we'd probably only consider it if there were a significant performance hit with your proposed approach.
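A hypothetical sketch of that conditional check (the names approximate, but are not, the plant API): resize consumption_rates only when the environment's ODE size differs from the vector's current size, so repeated calls within a Patch pay only a size comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch only. resize_count instruments the sketch to show
// the resize happens at most once per environment size.
struct Internals {
  std::vector<double> consumption_rates;
  std::size_t resize_count = 0;
  void resize_consumption_rates(std::size_t n) {
    consumption_rates.resize(n);
    ++resize_count;
  }
};

void compute_rates_sketch(Internals& vars, std::size_t environment_ode_size) {
  // conditional resize: a no-op size comparison on every call after the first
  if (vars.consumption_rates.size() != environment_ode_size) {
    vars.resize_consumption_rates(environment_ode_size);
  }
  // ... strategy->compute_rates(...) would run here ...
}
```

Because the environment's ODE size is fixed within a Patch, the branch is taken exactly once per simulation rather than once per call.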
Looks like there is basically no overhead:
NEW:
Running resource consumption benchmarks
Running with:
soil_layers
1 1
2 10
3 50
4 100
# A tibble: 4 × 14
expression soil_layers min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm 1 706.15ms 706.84ms 1.41 17.8MB 1.98 5 7 3.54s <NULL> <Rprofmem [881 × 3]> <bench_tm [10]> <tibble [10 × 3]>
2 scm 10 749.14ms 751.2ms 1.33 17.1MB 1.33 5 5 3.75s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
3 scm 50 937.15ms 939.2ms 1.07 17.1MB 1.60 4 6 3.75s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
4 scm 100 1.15s 1.15s 0.865 17.1MB 1.30 4 6 4.63s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
OLD:
> run_resource_consumption_benchmarks()
Running resource consumption benchmarks
Running with:
soil_layers
1 1
2 10
3 50
4 100
# A tibble: 4 × 14
expression soil_layers min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc total_time result memory time gc
<bch:expr> <dbl> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl> <bch:tm> <list> <list> <list> <list>
1 scm 1 706.06ms 707.52ms 1.41 17.1MB 5.65 2 8 1.42s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
2 scm 10 750.62ms 751.96ms 1.33 17.1MB 3.09 3 7 2.26s <NULL> <Rprofmem [670 × 3]> <bench_tm [10]> <tibble [10 × 3]>
3 scm 50 930.8ms 935.32ms 1.05 17.1MB 1.57 4 6 3.82s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
4 scm 100 1.17s 1.17s 0.854 17.1MB 3.41 2 8 2.34s <NULL> <Rprofmem [671 × 3]> <bench_tm [10]> <tibble [10 × 3]>
Interesting, this passes on Ubuntu but:
Windows
== Failed tests ================================================================
-- Failure (test-strategy-ff16w.R:97:3): Rainfall spline basic run -------------
out$offspring_production not equal to 16.88961056463.
1/1 mismatches
[1] 16.9 - 16.9 == -0.00164
macOS
══ Failed tests ════════════════════════════════════════════════════════════════
── Failure (test-strategy-ff16w.R:97:3): Rainfall spline basic run ─────────────
out$offspring_production not equal to 16.88961056463.
1/1 mismatches
[1] 16.9 - 16.9 == -0.00142
I wonder if there's small differences in the interpolation of the extrinsic_drivers splines?
It may be the different compilers (or compiler flags) relating to double precision with the splines.
@dfalster - we're a bit stuck on what to do next. I don't know how to conclusively test whether the numerical instability across operating systems comes from spline interpolation or another source. Do we have a 'phone-a-friend' option we can try to resolve this?
Alternatively, what do you think the required precision for offspring production should be? I'm not super-thrilled at relaxing tolerances if we don't have to, but agreement to 1/1000th of an individual might be enough.
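For concreteness, here is a sketch of the relaxed check being weighed up, assuming a relative comparison similar in spirit to passing a tolerance to testthat's expect_equal (the function name and exact semantics are illustrative, not testthat's): a relative tolerance of 1e-3 would accept both observed cross-platform differences (-0.00164 and -0.00142 against ~16.89) while still flagging anything much larger.

```cpp
#include <cassert>
#include <cmath>

// Illustrative relative-tolerance comparison, not the testthat algorithm.
bool close_enough(double observed, double expected, double rel_tol = 1e-3) {
  return std::fabs(observed - expected) <= rel_tol * std::fabs(expected);
}
```

At rel_tol = 1e-3 the acceptance band around 16.89 is about +/-0.017, comfortably covering the platform differences of ~0.0015.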
:exclamation: No coverage uploaded for pull request base (develop@425004d). The diff coverage is n/a.
@@ Coverage Diff @@
## develop #329 +/- ##
==========================================
Coverage ? 79.45%
==========================================
Files ? 96
Lines ? 8785
Branches ? 0
==========================================
Hits ? 6980
Misses ? 1805
Partials ? 0
Last update 425004d...3d6cd9e.
The issue of reproducibility across operating systems was resolved by generating a stationary rainfall timeseries:
- integrand <- function(x) {x^2}
+ integrand <- function(x) {1000 + sin(x)}
We suspect that the previous test scenario generated extreme values that raised small numerical differences in total patch offspring_production.
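A small sketch of why the stationary integrand is better conditioned: over a long integration window the old integrand x^2 grows without bound, while 1000 + sin(x) stays inside [999, 1001], so tiny platform-specific rounding differences are no longer amplified by extreme rainfall values.

```cpp
#include <cassert>
#include <cmath>

// The two rainfall integrands from the diff above.
double old_integrand(double x) { return x * x; }          // unbounded growth
double new_integrand(double x) { return 1000.0 + std::sin(x); }  // stationary
```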
Looks good!
This is a different implementation approach to #328, but hopefully achieves the same goal. @devmitch - could you please benchmark them to see which is more performant? Please feel free to make any optimisations/code fixes you see fit.
A simple test script, with a handful of cohorts and soil layers
I've left a bunch of print statements in ff16_environment.compute_rates() which produce the following output, but they should be removed. Infil. rate for soil layer 0 comes from the rainfall spline (reaching 10.00 at time 10.00), and a crude drainage multiplier of 10% is applied to approximate water entering sub-layers (e.g. soil layer 1 receiving 10% of the soil layer 0 balance; rounded to 2 decimal places and truncated at zero).
Extraction rate is integrated over cohorts for each species in each soil layer. Currently this is set to 2 * area_leaf and is therefore constant across soil layers. Net flux is infil. rate less extraction and drainage rates, representing the change in soil water levels, which is integrated in the environment state variables shown as water balance.
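The water balance described above can be sketched as follows (illustration only, not the plant implementation; the 10%-of-inflow drainage rule is my reading of the "10% of the soil layer 0 balance" description): layer 0 is fed by rainfall, each deeper layer receives 10% of the inflow to the layer above as drainage, extraction is the constant 2 * area_leaf in every layer, and net flux = inflow - extraction - drainage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the cascading soil water balance.
std::vector<double> net_fluxes(double rainfall, double area_leaf,
                               std::size_t n_layers) {
  const double drain_frac = 0.1;             // crude drainage multiplier
  const double extraction = 2.0 * area_leaf; // constant across layers
  std::vector<double> flux(n_layers);
  double inflow = rainfall;                  // layer 0: infiltration from rainfall
  for (std::size_t i = 0; i < n_layers; ++i) {
    const double drainage = drain_frac * inflow;  // passed to the layer below
    flux[i] = inflow - extraction - drainage;
    inflow = drainage;
  }
  return flux;
}
```

Integrating these fluxes over time would give the per-layer water balances shown in the printed output.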