traitecoevo / plant

Trait-Driven Models of Ecology and Evolution :evergreen_tree:
https://traitecoevo.github.io/plant
53 stars 20 forks source link

New mutant method #362

Closed aornugent closed 11 months ago

aornugent commented 1 year ago

This adds the capability to store environments at each step of the plant ODE solver, with the intention of estimating mutant fitness more efficiently by re-using the 'resource shadow' of that is calculated when solving the patch dynamics of residents.

I've added an 'environment_cache' to Patches and inserted caching calls into the solver that store the environment state at each sub-step of the Runga-Kutta ODE solver. I noticed in Rich's code the enable_if template that allows us to add methods to generic classes (like the ODE solver) based on the input type. When solving other objects (i.e. non-patches) this cache method does nothing and the solver continues as normal.

If the save_history flag is set in the control object, the SCM will store the Patch environment cache at each step of the system. The result is a vector of vectors, each containing six (6) environment objects. Example code below.

The next step is to add a MutantRunner class that re-uses the already solved environment. Exact design TBC.

# basic setup
p0 <- scm_base_parameters("FF16", "FF16_Env")

traits <- trait_matrix(c(0.07, 0.85), c("lma"))

p1 <- expand_parameters(traits, p0, mutant = FALSE, birth_rate_list = c(1, 1))

env <- make_environment("FF16")
ctrl <- scm_base_control()

ctrl$save_history = TRUE  # flag to enable cache

# create an SCM
types <- extract_RcppR6_template_types(p1, "Parameters")
scm <- do.call('SCM', types)(p1, env, ctrl)

# solve system
scm$run()

# env. history cached as n_patch_times x 6 RK steps
str(scm$environment_history, max.level = 1, list.len = 3) 
> List of 141
 $ :List of 6
 $ :List of 6
 $ :List of 6

class(scm$environment_history[[1]][[1]])
> [1] "FF16_Environment" "R6"        

# demonstrate difference in timing
scm$patch_step_history[1]
> 0

scm$environment_history[[1]][[1]]$time
> 6e-06
aornugent commented 1 year ago

https://github.com/traitecoevo/plant/pull/362/commits/7d18779d0bd094e9051820d06fbcc524d85f0a1a adds the first half of the mutant runner.

SCM objects now have an additional method called run_mutant that iterates over the environment history, loading the vector of environments into the patch-level cache at each timestep.

In ode_interface.h we overload derivs with a specific template has_cache that allows the ODE solver to pass in the RK step index when solving a Patch*. If the patch.use_cached_environment flag is set, the system is passed to a distinct version of set_ode_state which loads the appropriate environment from the cache.

The system runs, but I've not yet verified the results are correct. The next step is to develop the means to alter the list of strategies to pass in a mutant (i.e. a species with zero density). An example of the intended workflow is

# enable RK cache
ctrl <- scm_base_control()
ctrl$save_RK45_cache = T

# create SCM
types <- extract_RcppR6_template_types(p1, "Parameters")
scm <- do.call('SCM', types)(p1, e, ctrl)

# run the residents
scm$run()

# start the system schedule again
scm$reset()

## TODO:
scm$patch$strategies <- FF16_Strategy(mutant = T)

# solve using cached environment reflecting the residents' 'resource shadow' 
scm$run_mutant()

*Note: overloading derivs and maintaining version doesn't take an RK index is mostly only to support the functions in Ode_R.

aornugent commented 1 year ago

Ok, this now runs the one species case and converts the resident to a mutant.

# enable RK cache
ctrl <- scm_base_control()
ctrl$save_RK45_cache = T

# create SCM
types <- extract_RcppR6_template_types(p1, "Parameters")
scm <- do.call('SCM', types)(p1, e, ctrl)

# run the residents
scm$run()

# solve using cached environment reflecting the residents' resource shadow
scm$run_mutant()

Unfortunately we get a difference in net reproduction ratios (residents: 47.7 vs. mutant: 4.1).

I notice in both cases that sometimes the Runga-Kutta solver gets run more than once per node_schedule event. This appears to happen more frequently when solving the mutant.

dfalster commented 1 year ago

SUPER exciting that it runs, well done!

Next step is get the right numbers. I have ideas on how to pursue this. More later.

aornugent commented 1 year ago

Picking up this thread again. Plots of exp(height) and density for residents (black lines) and the final state of mutants (red dots) show small differences in patch state, where we'd expect them to be identical.

resident <- collect(scm)
scm$net_reproduction_ratios

mutant <- scm$run_mutant()
scm$net_reproduction_ratios

# uh-oh
# [1] 47.66136
# [1] 4.049069

# tidy species
# ..

ggplot(resident, aes(time, exp(log_density), group = node)) +
  geom_line() +
  geom_point(data = mutant, color = "red")

resident(resident, aes(time, exp(height), group = node)) +
  geom_line() +
  geom_point(data = mutant, color = "red")

image image

NB: only showing endpoint because I don't have a collect function for mutants yet.

aornugent commented 1 year ago

Interestingly, the performance of the mutant seems to respond to the density of the resident.

birth_rates <- c(100, 50, 10, 5, 1, 0.1, 0.01, 0.001)
grid_search <- purrr::map(birth_rates, run_one_patch)

fitnesses <- purrr::map_df(grid_search,
                           ~ purrr::pluck(., "fitnesses") %>% 
                             tibble::tibble(., rownames = c("resident_rr", "mutant_rr"),
                                            .name_repair = ~ c("net_reproduction_ratio", "invasion_type")),
                           .id = "grid_step")

tidyr::spread(fitnesses, invasion_type, net_reproduction_ratio) %>%
  dplyr::mutate(birth_rate = birth_rates,
                unit = "seeds_per_m2_yr") %>%
  dplyr::group_by(grid_step, birth_rate, unit) %>%
  dplyr::mutate(relative_mutant_fitness = scales::percent(mutant_rr / resident_rr))

#  grid_step   mutant_rr resident_rr birth_rate unit            relative_mutant_fitness 
# 1 1             0.00131       0.379    100     seeds_per_m2_yr 0%   
# 2 2             0.00202       0.740     50     seeds_per_m2_yr 0%   
# 3 3             0.479         4.07      10     seeds_per_m2_yr 12%  
# 4 4             0.132         8.45       5     seeds_per_m2_yr 2%   
# 5 5             4.05         47.7        1     seeds_per_m2_yr 8%   
# 6 6           145.          584.         0.1   seeds_per_m2_yr 25%  
# 7 7          4901.         6041.         0.01  seeds_per_m2_yr 81%  
# 8 8         26291.        28712.         0.001 seeds_per_m2_yr 92%  

# should probably use patchwork for plotting grids
purrr::pluck(grid_search, 8) %>%
  {
    ggplot(.$resident_state, aes(time, exp(log_density) * 10, group = node)) +
    geom_line() +
    geom_point(data = .$mutant_endpoint, color = "red")

  ggplot(.$resident_state, aes(time, exp(height), group = node)) +
    geom_line() +
    geom_point(data = .$mutant_endpoint, color = "red")
  }

The difference between 1 $seeds.m^{-2}.yr{-1}$ (shown in previous comment) and 0.001 $seeds.m^{-2}.yr{-1}$ is most obvious in mutant exp(height) which is now much closer to the largest, oldest residents. I'm not sure how I feel about the surge in resident density from nodes that were previously near extinct.

image image

aornugent commented 1 year ago

I've added two print statement for comparison between resident and mutant runs.

1. in ode_interface.h -> derivs

derivs are called from ode_step.h and dispatch to one of two set_ode_state signatures based on the use_cached_environment flag.

https://github.com/traitecoevo/plant/blob/032b0d4625e3305239c803a4ee57a97f69055986/inst/include/plant/ode_interface.h#L154-L166

2. in patch.h -> set_ode_state

Patches have two signatures for the set_ode_state method>

The first one takes in the step time and records that timestep in the environment

https://github.com/traitecoevo/plant/blob/032b0d4625e3305239c803a4ee57a97f69055986/inst/include/plant/patch.h#L308-L324

the other takes in the index and loads the environment from the cache (remembering that only objects that have the used_cached_environment flag are routed to this version of the method).

https://github.com/traitecoevo/plant/blob/032b0d4625e3305239c803a4ee57a97f69055986/inst/include/plant/patch.h#L326-L340

Results

I copied the console output for residents and mutants into logfiles. The console length threshold was 1,000 lines. The last thousand lines of the mutant console output covers fewer timesteps than the resident console output, suggesting that there's extra steps being taken somewhere.

Inspecting output of the final step, we see that the resident's step time and environment time are lagged, but otherwise match:

resident: step time 104.264; resident: etime 104
resident: step time 104.396; resident: etime 104.264
resident: step time 104.792; resident: etime 104.396
resident: step time 105.32; resident: etime 104.792
resident: step time 105.155; resident: etime 105.32
resident: step time 105.32; resident: etime 105.155

But the mutant's step times and environment times differ.

mutant: step time 105.015; mutant: etime 104; index 0
mutant: step time 105.053; mutant: etime 104.264; index 1
mutant: step time 105.168; mutant: etime 104.396; index 2
mutant: step time 105.32; mutant: etime 104.792; index 3
mutant: step time 105.272; mutant: etime 105.32; index 4
mutant: step time 105.32; mutant: etime 105.155; index 5

Also note that the mutant environment times match the resident environment times

aornugent commented 1 year ago

I think this now works! After chasing my tail for a few weeks, I discovered that most of what I needed was already implemented (which often seems the case).

This diagram roughly summarises the changes made to enable saving resident ODE state during each RK45 step (green) and the new features added to enable a mutant run that follows the same ODE history (blue).

image

dfalster commented 1 year ago

Great progress @aornugent ! Diagram is very helpful.

Testing on a wider parameter set, I sometimes get an error about step history not being sorted, I think because sometimes the ODE solver goes backwards when it takes a step that's too big? This suggests I might need to tweak exactly where caching occurs.

regarding step history being sorted, I'd guess the same as you, occasionally the solver decides to shorten the step and reruns. Suggested steps from here

  1. confirm this is indeed the cause
  2. consider how to fix.

You could possibly change where the casing occurs, OR alternatively,

a. when you rerun a step, clear the last cached items (ie 5-6 from a single RK step) so only the successful step is kept?

b. Similar to a, but add all the items for a single RK step (so 5-6 env calls and times) into. temporary bucket while the step is in progress, then only add this to the permanent cache when a step is successful?

aornugent commented 1 year ago

That was an easy fix - I was indeed caching before ode_solver->step() had checked for success.

This is now the output of scripts/mutant-method.r

  grid_step mutant_rr resident_rr birth_rate unit            rr_prop 
  <chr>         <dbl>       <dbl>      <dbl> <chr>           <chr>
1             0.315       0.316    100     seeds_per_m2_yr 100% 
2             0.629       0.629     50     seeds_per_m2_yr 100% 
3             3.48        3.48      10     seeds_per_m2_yr 100% 
4             7.26        7.26       5     seeds_per_m2_yr 100% 
5            41.4        41.4        1     seeds_per_m2_yr 100% 
6           516.        516.         0.1   seeds_per_m2_yr 100% 
7          5353.       5353.         0.01  seeds_per_m2_yr 100% 
8         24649.      24649.         0.001 seeds_per_m2_yr 100% 
dfalster commented 1 year ago

Hoo boy, that's great!

Next step?

aornugent commented 1 year ago

Latest commit is a bit rough and ready. The script mutant-timing.r aims to evaluate whether running multiple mutants at once is faster than iterating through a pre-loaded competitive environment with one mutant at a time.

First, note the time taken to create a resident. Caching the environment at each ODE step appears to have only a minor overhead:

# residents no-cache (sec) cache-env (sec)
1 1.21 1.23

Now compare the effect of re-using the cached environment. This commit provides two approaches to node scheduling: mutants can either re-use the same schedule as the resident or introduce a node at every ODE step (hereafter: a mutant schedule). Re-using the resident schedule is not yet enabled for the many-mutants case so we can only compare these two approaches for 1x mutant.

There are generally more ODE steps than node introductions, meaning that the mutant schedule is larger than the resident schedule (e.g. a resident schedule with 140 nodes takes about 300 ODE steps). This difference is shown in the runtimes:

# mutants resident-schedule (sec) mutant-schedule (sec)
1 0.41 0.69

The increased resolution of the mutant schedule also leads to minor differences in fitness when running identical strategies:

# mutants resident-schedule (reproduction ratio) mutant-schedule (reproduction ratio)
1 1.045965 1.063594

Using the mutant schedule, initial testing shows that timing does not change between iterating 1 mutant at a time or running many mutants at once:

# mutants one-at-a-time (sec) many-at-once (sec)
1 0.67 0.66
10 6.85 6.65
100 66.90 69.33

note: times are one iteration only so small differences should be ignored

Note that the changes to is_resident are experimental and may yet be reverted. The basic expectation of identical mutants and residents attaining equivalent fitness is met for the 1x mutant case (see above) but breaks when running multiple identical mutants - only the first mutant attains comparable fitness, and subsequent mutants have fitness that change between runs (even if they have identical traits).

All this is to say that the many-mutant case should only be used to evaluate timing and inform design - further work is required if we decide that running many mutants at once is advantageous or necessary.

dfalster commented 1 year ago

Hi Andrew, Great work here. I think we're close. I've reviewed the code and have the following thoughts and insights

  1. I think I know why the run many mutants simultaneously is behaving weirdly (only first one returns reasonable values), which is that we never set the birth_rates for the extra mutants! See comments above. It's reasonable to always sets birth rates for mutants to 1. Just need to make an appropriate change in patch birth rates when you add mutants.

  2. I was quite surprised how slow the run_mutant method is. But I suspect we're wasting a lot of compute time copying around environment objects. If we making the main environment object in patch a pointer, we'll save all this, but it will require minor changes everywhere environment is called. We could attack this in a different PR if the whole interface is working ok.

  3. Does caching or having the option to cache in the code slow the resident run much? Can you compare speed running resident on the develop branch, to this branch?

dfalster commented 1 year ago

Hi Andrew,

New idea on issues running multiple mutants! See my commit above, this adds more examples.

it suggests that many mutants

I also sometimes get completely crashes due to memory allocation failures.

All of this suggests that it's an issue with memory allocation. When we run the sim for the first time, various memory is allocated. When we add mutants, we're not managing to adjust the size of relevant containers correctly. Having running a larger number of residents means the memory is already allocated at the beginning, so many mutant works. Make sense?

aornugent commented 11 months ago

Oh boy - I fixed one issue. In net_reproduction_ratio_by_node_weighted we look up species' dispersal survival probability using the SCM's Parameters object.

The solution was to re-write run_mutant to accept a parameters object instead and overwrite it in the SCM. This is destructive, meaning that we lose the ability to switch back to the resident, but it works!

I've added tests based on your last changes to mutant_timing.r.

dfalster commented 11 months ago

Well done, this looks workable. Ready to merge

dfalster commented 11 months ago

All tests now passing locally. Let's check GH Actions. Assuming they pass, we're ready to start introducing pointers.

dfalster commented 11 months ago

I've had some smallish success working his that locally (not yet pushed), but seems promising

codecov-commenter commented 11 months ago

Codecov Report

:exclamation: No coverage uploaded for pull request base (develop@a8fc78c). Click here to learn what that means. Patch has no changes to coverable lines.

:exclamation: Current head a2e0fac differs from pull request most recent head 0bbb2bc. Consider uploading reports for the commit 0bbb2bc to get more accurate results

:exclamation: Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## develop #362 +/- ## ========================================== Coverage ? 82.81% ========================================== Files ? 99 Lines ? 9036 Branches ? 0 ========================================== Hits ? 7483 Misses ? 1553 Partials ? 0 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

dfalster commented 11 months ago

Results of speed tests comparing three implementations of rescinders and mutant runs with

  1. original code off develop, where mutants are run simultenously with existing residents
  2. the new mutant method implemented in this PR, where envy is cached and reused, so we can run mutants on their own
  3. as above, but using pointers to save copying the env objects from the history
  4. New method but without caching

For each method, I ran in combinations with

image

What we can see from these runs is that

Running residents

Running mutants

(the benefit of pointers is not as big as I'd expected. In retrospect it seems about right. I thought it might be bigger because our previous speed comparison showed mutant and resindet being not that different. But I was misled, as we only ran a single resident for 1 year. Running multiple residents for 100 years makes the difference between mutant and resident more dramatic. But also, even for a short run, the new method will be 50% reduction of old, as don't need to run resident alongside. )

Here's code run on various branches

Code to run on dev branch (old mutant method)

library(plant)
devtools::load_all()

p <- scm_base_parameters("FF16")
p$max_patch_lifetime <- 100

e <- make_environment("FF16")
ctrl <- scm_base_control()

lma <- c(0.05, 0.1, 0.2)
birth_rate <- 1

# 1 resident strategies

p1 <- expand_parameters(trait_matrix(lma[2], "lma"), p,
  mutant = F,
  birth_rate_list = rep(birth_rate, 1)
)

p1m1 <- expand_parameters(trait_matrix(lma[2], "lma"), p1,
  mutant = T,
  birth_rate_list = rep(birth_rate, 1)
)

p1m3 <- expand_parameters(trait_matrix(lma, "lma"), p1,
  mutant = T,
  birth_rate_list = rep(birth_rate, 3)
)

p1m21 <- expand_parameters(trait_matrix(rep(lma, 7), "lma"), p1,
  mutant = T,
  birth_rate_list = rep(birth_rate, 21)
)

types <- extract_RcppR6_template_types(p1, "Parameters")
scm <- do.call("SCM", types)(p1, e, ctrl)
cat("Residents 1: ", system.time(scm$run()), "\n")
scm <- do.call("SCM", types)(p1m1, e, ctrl)
cat("mutants 1: ", system.time(scm$run()), "\n")
scm <- do.call("SCM", types)(p1m3, e, ctrl)
cat("mutants 3: ", system.time(scm$run()), "\n")
scm <- do.call("SCM", types)(p1m21, e, ctrl)
cat("mutants 21: ", system.time(scm$run()), "\n")

# 3 resident strategies

p3 <- expand_parameters(trait_matrix(lma, "lma"), p,
  mutant = F,
  birth_rate_list = rep(birth_rate, 3)
)

p3m1 <- expand_parameters(trait_matrix(lma[2], "lma"), p3,
  mutant = T,
  birth_rate_list = rep(birth_rate, 1)
)

p3m3 <- expand_parameters(trait_matrix(lma, "lma"), p3,
  mutant = T,
  birth_rate_list = rep(birth_rate, 3)
)

p3m21 <- expand_parameters(trait_matrix(rep(lma, 7), "lma"), p3,
  mutant = T,
  birth_rate_list = rep(birth_rate, 21)
)

scm <- do.call("SCM", types)(p3, e, ctrl)
cat("Residents 1: ", system.time(scm$run()), "\n")
scm <- do.call("SCM", types)(p3m1, e, ctrl)
cat("mutants 1: ", system.time(scm$run()), "\n")
scm <- do.call("SCM", types)(p3m3, e, ctrl)
cat("mutants 3: ", system.time(scm$run()), "\n")
scm <- do.call("SCM", types)(p3m21, e, ctrl)
cat("mutants 21: ", system.time(scm$run()), "\n")

# 21 resident strategies
p21 <- expand_parameters(trait_matrix(rep(lma, 7), "lma"), p,
  mutant = F,
  birth_rate_list = rep(birth_rate, 21)
)

p21m1 <- expand_parameters(trait_matrix(lma[2], "lma"), p21,
  mutant = T,
  birth_rate_list = rep(birth_rate, 1)
)

p21m3 <- expand_parameters(trait_matrix(lma, "lma"), p21,
  mutant = T,
  birth_rate_list = rep(birth_rate, 3)
)

p21m21 <- expand_parameters(trait_matrix(rep(lma, 7), "lma"), p21,
  mutant = T,
  birth_rate_list = rep(birth_rate, 21)
)

scm <- do.call("SCM", types)(p21, e, ctrl)
cat("Residents 1: ", system.time(scm$run()), "\n")
scm <- do.call("SCM", types)(p21m1, e, ctrl)
cat("mutants 1: ", system.time(scm$run()), "\n")
scm <- do.call("SCM", types)(p21m3, e, ctrl)
cat("mutants 3: ", system.time(scm$run()), "\n")
scm <- do.call("SCM", types)(p21m21, e, ctrl)
cat("mutants 21: ", system.time(scm$run()), "\n")

Code to run on new mutant methods (with and without pointer implementation)

p <- scm_base_parameters("FF16")
p$max_patch_lifetime <- 100

e <- make_environment("FF16")
ctrl <- scm_base_control()
ctrl$save_RK45_cache <- TRUE

lma <- c(0.05, 0.1, 0.2)
birth_rate <- 1

p1 <- expand_parameters(trait_matrix(lma[2], "lma"), p,
  mutant = F,
  birth_rate_list = rep(birth_rate, 1)
)

p3 <- expand_parameters(trait_matrix(lma, "lma"), p,
  mutant = F,
  birth_rate_list = rep(birth_rate, 3)
)

p21 <- expand_parameters(trait_matrix(rep(lma, 7), "lma"), p,
  mutant = F,
  birth_rate_list = rep(birth_rate, 21)
)

types <- extract_RcppR6_template_types(p1, "Parameters")
scm <- do.call("SCM", types)(p1, e, ctrl)
cat("Residents 1: ", system.time(scm$run()), "\n")
resident_rr1 <- scm$net_reproduction_ratios

cat("mutants 1: ", system.time(scm$run_mutant(p1)), "\n")
cat("mutants 3: ", system.time(scm$run_mutant(p3)), "\n")
cat("mutants 21: ", system.time(scm$run_mutant(p21)), "\n")

types <- extract_RcppR6_template_types(p3, "Parameters")
scm <- do.call("SCM", types)(p3, e, ctrl)
cat("Residents 3: ", system.time(scm$run()), "\n")
resident_rr3 <- scm$net_reproduction_ratios

cat("mutants 1: ", system.time(scm$run_mutant(p1)), "\n")
cat("mutants 3: ", system.time(scm$run_mutant(p3)), "\n")
cat("mutants 21: ", system.time(scm$run_mutant(p21)), "\n")

types <- extract_RcppR6_template_types(p21, "Parameters")
scm <- do.call("SCM", types)(p21, e, ctrl)
cat("Residents 21: ", system.time(scm$run()), "\n")
resident_rr21 <- scm$net_reproduction_ratios

# one mutants - same environment
cat("mutants 1: ", system.time(scm$run_mutant(p1)), "\n")
cat("mutants 3: ", system.time(scm$run_mutant(p3)), "\n")
cat("mutants 21: ", system.time(scm$run_mutant(p21)), "\n")

With become much more obvious when the residents

dfalster commented 11 months ago

I'm yet to push the implementation with pointers. Just want to fine tune comments and code.

dfalster commented 11 months ago

Jobs to do before merging

  1. remove mutant =, is_resident from R side [Daniel started this]
  2. clean up comments and remove superfluous [Daniel started, Andrew continue after Daniel pushes]
  3. Solve case of running mutant with no residents (viable fitness). Possibly by running. resident with no seed rain
  4. updated vignettes for mutant fitness
  5. More testing of variable fitness / mutant runs on develop, so that we can check numeric outputs [Daniel started]

Daniel to push existing work on this branch then focus on 5, Andrew complete 1-4

dfalster commented 11 months ago

Amazing work here!