rsetienne / DAISIE

DAISIE
GNU General Public License v3.0
9 stars 12 forks source link

Review code #90

Open richelbilderbeek opened 4 years ago

richelbilderbeek commented 4 years ago

To quote @joshwlambert:

If you are happy with this as an ongoing plan please let us know. The code to be reviewed for our work is: DAISIE_sim_constant_rate DAISIE_sim_constant_rate_shift DAISIE_sim_time_dependent DAISIE_sim_core_constant_rate DAISIE_sim_core_constant_rate_shift DAISIE_sim_core_time_dependent DAISIE_rates DAISIE_max_rates DAISIE_format_CS_full_stt DAISIE_sample_event_constant_rate DAISIE_sample_event_time_dependent DAISIE_update_state_constant_rate DAISIE_update_state_time_dependent

richelbilderbeek commented 4 years ago

To complete: at the geomdynamics branch.

richelbilderbeek commented 4 years ago

I'll have a look :eyes:

richelbilderbeek commented 4 years ago

Hi DAISIE team,

I've reviewed the first function only, as most of this feedback will apply to other functions as well. My feedback is extensive, but feel free to ignore some points. I enjoy seeing the DAISIE code improve: well done DAISIE team!

I will review the other functions if the feedback given here is applied to those other functions as well. Just give me the thumbs up!

Keep up the good work!

Cheers, Richel

#' @title Simulate islands with given parameters.
#' @description This function simulates islands with given cladogenesis,
#' extinction, Kprime, immigration and anagenesis parameters. If a single
#' parameter set is provided (5 parameters) it simulates islands where all
#' species have the same macro-evolutionary process. If two paramater sets
#' (10 parameters) are provided, it simulates islands where two different
#' macro-evolutionary processes operate, one applying to type 1 species and
#' other to type 2 species. If two parameter sets and a time shift (11
#' parameters) are provided, it simulates islands where at the given time
#' a shift between the parameter sets will occur.

I enjoy the extensive amount of documentation. It is recommended however, to do the following:

  #' [thing that will the title]
  #' [an empty line]
  #' [thing that will the description]
  #' [thing that will the description]
  #' [thing that will the description]
#' Returns R list object that contains the simulated islands

Superior option: use @return. Which is done below. So, this line can be removed :+1:

#' @inheritParams default_params_doc

Brilliant, could not have done any better :grin:

#' @return Each simulated dataset is an element of the list, which can be
#' called using [[x]]. For example if the object is called island_replicates,
#' the last replicates is a list in itself. The first (e.g.
#' island_replicates[[x]][[1]]) element of that list has the following
#' components: \cr \code{$island_age} - the island age \cr Then, depending on
#' whether a distinction between types is made, we have:\cr \code{$not_present}
#' - the number of mainland lineages that are not present on the island \cr
#' or:\cr \code{$not_present_type1} - the number of mainland lineages of type 1
#' that are not present on the island \cr \code{$not_present_type2} - the
#' number of mainland lineages of type 2 that are not present on the island \cr
#' \code{$stt_all} - STT table for all species on the island (nI - number of
#' non-endemic species; nA - number of anagenetic species, nC - number of
#' cladogenetic species, present - number of independent colonisations present
#' )\cr \code{$stt_stt_type1} - STT table for type 1 species on the island -
#' only if 2 types of species were simulated (nI - number of non-endemic
#' species; nA - number of anagenetic species, nC - number of cladogenetic
#' species, present - number of independent colonisations present )\cr
#' \code{$stt_stt_type2} - STT table for type 2 species on the island - only if
#' 2 types of species were simulated (nI - number of non-endemic species; nA -
#' number of anagenetic species, nC - number of cladogenetic species, present -
#' number of independent colonisations present )\cr \code{$brts_table} - Only
#' for simulations under 'IW'. Table containing information on order of events
#' in the data, for use in maximum likelihood optimization.)\cr
#'
#' The subsequent elements of the list each contain information on a single
#' colonist lineage on the island and has 4 components:\cr
#' \code{$branching_times} - island age and stem age of the population/species
#' in the case of Non-endemic, Non-endemic_MaxAge and Endemic anagenetic
#' species. For cladogenetic species these should be island age and branching
#' times of the radiation including the stem age of the radiation.\cr
#' \code{$stac} - the status of the colonist \cr * Non_endemic_MaxAge: 1 \cr *
#' ndemic: 2 \cr * Endemic&Non_Endemic: 3 \cr * Non_endemic: 4 \cr
#' \code{$missing_species} - number of island species that were not sampled for
#' particular clade (only applicable for endemic clades) \cr \code{$type_1or2}
#' - whether the colonist belongs to type 1 or type 2 \cr

This is a terrible return type to read. The return type is a list, so I'd prefer the elements of the list to be itemized (using itemize).

#' @author Luis Valente and Albert Phillimore

Taking a look at the blame/praise I see that Pedro and Josh worked on this. I suggest to add these to the list of names.

#' @seealso \code{\link{DAISIE_format_CS}} \code{\link{DAISIE_plot_sims}}
#' @references Valente, L.M., A.B. Phillimore and R.S. Etienne (2015).
#' Equilibrium and non-equilibrium dynamics simultaneously operate in the
#' Galapagos islands. Ecology Letters 18: 844-852.
#' Hauffe, T., D. Delicado, R.S. Etienne and L. Valente (submitted).
#' Lake expansion increases equilibrium diversity via the target effect of
#' island biogeography.
#' @keywords models

:+1:

#' @examples
#'
#' cat("
#' ## Simulate 40 islands for 4 million years, where all species have equal
#' ## rates, and plot the species-through-time plot. Pool size 1000.
#'
#' pars_equal = c(2.550687345,2.683454548,Inf,0.00933207,1.010073119)
#' island_replicates_equal = DAISIE_sim_constant_rate(
#'    time = 4,
#'    M = 1000,
#'    pars = pars_equal,
#'    replicates = 40
#'    )
#'
#' ## Simulate 15 islands for 4 million years with two types of species (type1
#' ## and type 2), and plot the species-through-time plot. Pool size 1000.
#' ## Fraction of type 2 species in source pool is 0.163. Function will
#' ## simulate until number of islands where type 2 species has colonised is
#' ## equal to number specified in replicates.
#'
#' pars_type1 = c(0.195442017,0.087959583,Inf,0.002247364,0.873605049)
#' pars_type2 = c(3755.202241,8.909285094,14.99999923,0.002247364,0.873605049)
#' island_replicates_2types = DAISIE_sim_constant_rate(
#'    time = 4,
#'    M = 1000,
#'    pars = c(pars_type1,pars_type2),
#'    replicates = 15,
#'    prop_type2_pool = 0.163
#'    )
#' ## Simulate a non-oceanic island for 4 million years, and plot the
#' ## species-through-time plot. Pool size 1000. Island area as a proportion
#' ## of mainland is 0.1, proportion of native species is 0.9.
#'  pars = c(2.550687345,2.683454548,Inf,0.00933207,1.010073119)
#'  island_replicates = DAISIE_sim_constant_rate(
#'    time = 4,
#'    M = 1000,
#'    pars = pars,
#'    replicates = 40,
#'    nonoceanic_pars = c(0.1, 0.9)
#'  )
#' ## Simulate 15 islands for 4 million years with a shift in immigration rate
#' ## at 0.195 Ma, and plot the species-through-time plot. Pool size 296.
#'
#' pars_before_shift = c(0.079, 0.973, Inf, 0.136, 0.413)
#' pars_after_shift = c(0.079, 0.973, Inf, 0.652, 0.413)
#' tshift = 0.195
#' island_shift_replicates = DAISIE_sim_constant_rate_shift(
#'    time = 4,
#'    M = 296,
#'    pars = c(pars_before_shift, pars_after_shift),
#'    replicates = 15,
#'    shift_times = tshift
#'  )
#' ")

These examples can definitely be improved.

Example code should be runnable, simple and clean. Instead, it is complex and does not even run! As a user, I want example code that is guaranteed to be correct, that is, checked by R CMD check. Please make the code do something simple.

Also, make the example simple. I am unsure what those numbers in pars_equal are. Add extra variables like immigration_rate <- 3.14 and put those in pars_equal.

The style is non-Tidyverse (an equal sign for assignment is so 1970s). This gives a bad impression.

Final point: these simulations in the examples are, as far as I remember, published simulations. Just make functions for those, like run_sim_mee_2012 (and user superior naming). Refer to those function here, in e.g. @seealso. Within this example, do something simple instead.

#'
#' @export
DAISIE_sim_constant_rate <- function(
  time,
  M,

I wish I knew what M was before reading the doc. Actually, I learned that M is -I guess- the number of mainland species, thanks to the more descriptive interface of DAISIE_sim_core_constant_rate: kudo's to its author(s). The name n_mainland_species would be more appropriate.

  pars,

pars is kind of general. I think these are the DAISIE simulation parameters (unlike the DAISIE ML estimation parameters). I'd prefer a more descriptive name like sim_pars instead.

  replicates,

I predict this is the number of replicates, instead of some other DAISIE sim output. I suggest to call this n_replicates instead: it's clearer and n_something is a common idiom.

  divdepmodel = "CS",

OK enough.

  nonoceanic_pars = c(0, 0),

I see why this value is chosen: I predict there are two nonoceanic_pars, that have no effect for values of zero. I would prefer an NA here, because it does look needlessly complex

  num_guilds = NULL,

Would an num_guilds of 1 be equivalent and thus a more natural default value? If not, NULL is fine.

  prop_type2_pool = NA,
  replicates_apply_type2 = TRUE,
  sample_freq = 25,

No idea why a sample_freq of 25 should be the default. Wouldn't 1 be more natural?

  plot_sims = TRUE,

This argument should be removed: it is needlessly 'convenient', and complicates the code needlessly instead. In the doc, use DAISIE_plot_sims in the example code and redirect the user using @seealso to DAISIE_plot_sims

  hyper_pars = NULL,
  area_pars = NULL,
  dist_pars = NULL,

OK

  verbose = TRUE,

Verbose hould be false by default: Duck for 'Unix Golden Rule of Silence'. Also, you may have already noticed, you need to silence the output of such a function every test. Just silence it by default.

  ...

The use of ... is almost always evil. As is in this case: it is not even used. Please remove that troublesome operator.

) {
  testit::assert(
    "length(pars) is not five and/or shift_times is not null, set
    five parameters with no shift_times or ten parameters with
    non-null shift_times",
    length(pars) == 5 || (length(pars) == 10 && !is.na(prop_type2_pool))
  )
  testit::assert(
    "2 type islands cannot have species on the island initially",
    is.na(prop_type2_pool) || !is.na(prop_type2_pool) && nonoceanic_pars[1] == 0
  )
  testit::assert(
    "prop_type2_pool should either be NA for no type 2 species or value between
    0 and 1",
    is.na(prop_type2_pool) || (prop_type2_pool >= 0 && prop_type2_pool <= 1)
  )

I enjoy this error checking!

But first, this error checking is already quite complex. Use a function for each check instead: this will be easier to debug, easier to read, easier to write.

Second, it is getting clear that the function is actually doing two things: check for 5 or 10 parameters. I suggest that this function quickly redirects to functions specialized on 5 or 10 parameters. Of course, there should be little code duplication, so things done for both 5 and 10 parameters should be put into little functions as well.

I do see the function is mostly a redirection function already, depending on the diversity dependence model. You'll see that below I will request that this function does even more so. Do add example code for each possible flow through this function, which are ?four models (CS 5 pars, CS 10 pars, IW and GW).

Not all input parameters are checked. For example, for the diversity dependence model: there is no check if a valid model is used (e.g. testthat::expect_true(divdepmodel %in% c("IW", "CW", "GW"))). If the XXX (yes, XXX indeed) diversity dependence model is used, there is no helpful error message. Add some functions like assert_div_dep_model/check_div_dep_model to check a divdepmodel for validity or use get_all_div_dep_models to get all valid models.

I see it is hard to check all input parameters: the function has tons of arguments. This can be reduced by using more structured data, but that is quite some work.

  totaltime <- time
  island_replicates <- list()
  if (divdepmodel == "IW") {
    for (rep in 1:replicates) {
      island_replicates[[rep]] <- DAISIE_sim_core_constant_rate(
        time = totaltime,
        mainland_n = M,
        pars = pars,
        nonoceanic_pars = nonoceanic_pars,
        hyper_pars = hyper_pars,
        area_pars = area_pars,
        dist_pars = dist_pars
      )
      if (verbose == TRUE) {
        print(paste("Island replicate ", rep, sep = ""))

Use if (verbose) ... instead of checking for TRUE explicitly. Do not use print, use message instead, it's the superior Tidyverse way. Use paste0 instead of paste(..., sep = ""): it's shorter.

      }
    }
    island_replicates <- DAISIE_format_IW(island_replicates = island_replicates,
                                          time = totaltime,
                                          M = M,
                                          sample_freq = sample_freq,
                                          verbose = verbose)

Would the 'convenient' plotting be removed, one can write return(DAISIE_format_IW(...)) here.

  }
  if (divdepmodel == "CS") {
    if (length(pars) == 5) {
      for (rep in 1:replicates) {
        island_replicates[[rep]] <- list()
        full_list <- list()
        for (m_spec in 1:M) {
          full_list[[m_spec]] <- DAISIE_sim_core_constant_rate(
            time = totaltime,
            mainland_n = 1,
            pars = pars,
            nonoceanic_pars = nonoceanic_pars,
            hyper_pars = hyper_pars,
            area_pars = area_pars,
            dist_pars = dist_pars
          )
        }
        island_replicates[[rep]] <- full_list
        if (verbose == TRUE) {
          print(paste("Island replicate ", rep, sep = ""))
        }
      }
    } else if (length(pars) == 10) {
      if (replicates_apply_type2 == TRUE) {
        island_replicates <- DAISIE_sim_min_type2(
          time = totaltime,
          M = M,
          pars = pars,
          replicates = replicates,
          prop_type2_pool = prop_type2_pool,
          verbose = verbose)
      } else {
        for (rep in 1:replicates) {
          pool2 <- DDD::roundn(M * prop_type2_pool)
          pool1 <- M - pool2
          lac_1 <- pars[1]
          mu_1 <- pars[2]
          K_1 <- pars[3]
          gam_1 <- pars[4]
          laa_1 <- pars[5]
          lac_2 <- pars[6]
          mu_2 <- pars[7]
          K_2 <- pars[8]
          gam_2 <- pars[9]
          laa_2 <- pars[10]
          full_list <- list()
          #### species of pool1
          for (m_spec in 1:pool1) {
            full_list[[m_spec]] <- DAISIE_sim_core_constant_rate(
              time = totaltime,
              mainland_n = 1,
              pars = c(lac_1,
                       mu_1,
                       K_1,
                       gam_1,
                       laa_1)
            )
            full_list[[m_spec]]$type1or2  <- 1
          }
          #### species of pool2
          for (m_spec in (pool1 + 1):(pool1 + pool2)) {
            full_list[[m_spec]] <- DAISIE_sim_core_constant_rate(
              time = totaltime,
              mainland_n = 1,
              pars = c(lac_2,
                       mu_2,
                       K_2,
                       gam_2,
                       laa_2)
            )
            full_list[[m_spec]]$type1or2 <- 2
          }
          island_replicates[[rep]] <- full_list
          if (verbose == TRUE) {
            print(paste("Island replicate ", rep, sep = ""))
          }
        }

This part is definitly not a function read, with lac_1 <- pars[1], mu_1 <- pars[2] and so on. Put this in a seperate function instead.

      }
    }
    island_replicates <- DAISIE_format_CS(
      island_replicates = island_replicates,
      time = totaltime,
      M = M,
      sample_freq = sample_freq,
      verbose = verbose
    )

Would the 'convenient' plotting be removed, one can write return(DAISIE_format_CS(...)) here.

  }

  if (divdepmodel == "GW") {
    if (!is.numeric(num_guilds)) {
      stop("num_guilds must be numeric")
    }
    guild_size <- M / num_guilds
    testit::assert(num_guilds < M)
    testit::assert(M %% num_guilds == 0)
    for (rep in 1:replicates) {
      island_replicates[[rep]] <- list()
      full_list <- list()
      for (m_spec in 1:num_guilds) {
        full_list[[m_spec]]  <- DAISIE_sim_core_constant_rate(
          time = totaltime,
          mainland_n = guild_size,
          pars = pars,
          nonoceanic_pars = nonoceanic_pars,
          hyper_pars = hyper_pars,
          area_pars = area_pars,
          dist_pars = dist_pars
        )
      }
      island_replicates[[rep]] <- full_list
      if (verbose == TRUE) {
        print(paste("Island replicate ", rep, sep = ""))
      }
    }
    island_replicates <- DAISIE_format_GW(island_replicates = island_replicates,
                                          time = totaltime,
                                          M = M,
                                          sample_freq = sample_freq,
                                          num_guilds = num_guilds,
                                          verbose = verbose)

Would the 'convenient' plotting be removed, one can write return(DAISIE_format_GW(...)) here.

  }
  if (plot_sims == TRUE) {
    DAISIE_plot_sims(
      island_replicates = island_replicates,
      sample_freq = sample_freq
    )
  }

Remove this 'convenient' plotting. It has nothing to do with simulating.

  return(island_replicates)
}
richelbilderbeek commented 4 years ago

I will unassign me now, up until I am given the green light for another round of feedback :innocent:

Neves-P commented 4 years ago

@richelbilderbeek, thank you so much for such great comments on the code! I will go over each topic with the rest of the DAISIE team to discuss/implement changes, we'll probably break this down into individual issues to go step by. Thanks again, we'll keep everyone updated! 👍

richelbilderbeek commented 4 years ago

Thanks! I am happy to hear that the feedback is appreciated! Good luck :+1: