traitecoevo / plant

Trait-Driven Models of Ecology and Evolution :evergreen_tree:
https://traitecoevo.github.io/plant

Tidy patch outputs #308

Closed dfalster closed 1 year ago

dfalster commented 2 years ago

The outputs of run_scm_collect are not particularly user-friendly.

I suggest converting them to tidy data format, organised by cohort and species.

Here is a prototype:


library(dplyr)  # provides %>%, tibble(), mutate(), select(), pull(), left_join(), starts_with()

tidy_species <- function(data) {

  # get dimensions of data = number of steps * number of cohorts
  dimensions <- dim(data[1,,] )

  # establish data structure for results    
  data_species <- 
    tidyr::expand_grid(
      step = seq_len(dimensions[1]), 
      cohort = seq_len(dimensions[2])
    )

  # retrieve names of all tracked variables
  vars <- data[,1,1] %>% names()

  # bind each onto main data frame
  for(v in vars) {
    data_species[[v]] <- 
      data[v, , ] %>% 
      as.data.frame %>% tidyr::as_tibble() %>%
      tidyr::pivot_longer(cols=starts_with("V"), names_to = "cohort") %>% 
      pull(value)
  }

  data_species %>% mutate(density = exp(log_density)) %>% select(-log_density)
}

tidy_env <- function(env) {
  tibble(step = seq_len(length(env))) %>%
    left_join(by = "step", 
    env %>% purrr::map_df(as_tibble, .id= "step") %>% mutate(step = as.integer(step))
    )
}

tidy_patch <- function(results) {

  out <- results

  data <- tibble(
    step = seq_len(length(results$time)),
    time = results$time, 
    patch_density = results$patch_density
    )

  out[["species"]] <- 
    left_join(by = "step", data,
      purrr::map_df(results$species, tidy_species, .id="species")
    )

  out[["env"]] <- 
    left_join(by = "step", data,
      tidy_env(results$env)
    )

  out[["n_spp"]] <- length(results$species)

  out[["patch_density"]] <- NULL

  out
}

run_scm_collect() %>% tidy_patch()

This could even become the default output of scm, or an option in scm.
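
To make the target format concrete, here is a minimal base-R sketch of the same reshaping on a toy array. It assumes each element of results$species is a 3-D array indexed as [variable, step, cohort], as the prototype implies; the toy values and the names toy and tidy are made up for illustration.

```r
# Toy stand-in for one element of results$species, assumed to be a
# 3-D array indexed as [variable, step, cohort].
toy <- array(
  seq_len(2 * 3 * 2),
  dim = c(2, 3, 2),
  dimnames = list(c("height", "log_density"), NULL, NULL)
)

n_steps   <- dim(toy)[2]
n_cohorts <- dim(toy)[3]

# One row per (step, cohort) pair, with cohort varying fastest.
tidy <- expand.grid(cohort = seq_len(n_cohorts), step = seq_len(n_steps))
tidy <- tidy[, c("step", "cohort")]

# t() turns the steps x cohorts slice into cohorts x steps, so that
# as.vector() reads the values out in the same order as the rows above.
for (v in dimnames(toy)[[1]]) {
  tidy[[v]] <- as.vector(t(toy[v, , ]))
}

# Store density on the natural scale, as in the prototype.
tidy$density <- exp(tidy$log_density)
tidy$log_density <- NULL
```

Each row of tidy is then one cohort at one step, which is the shape the prototype's tidy_species() aims for.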

dfalster commented 2 years ago

Tidy outputs should make it easier to calculate aggregate values for output.

In general, we want to calculate 4 types of output:

  1. each cohort at each height at each time (default raw output)
  2. total or average of some variable for each species at each patch age --> obtained by integrating over n at a given patch age
  3. some variable at a common size at each patch age --> obtained by interpolating along n at a given patch age
  4. averages of any of the above over all patch ages (i.e. #89)

This should be easier with tidy outputs. E.g. for item 2 above:


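# `data` is the tidy species table, i.e. tidy_patch(results)$species
patch_species_total <- function(data) {
  data  %>%
    # drop the cohort index and any rows with missing values
    select(-cohort) %>% na.omit() %>% 
    # skip the first step
    filter(step > 1) %>% 
    group_by(step, time, patch_density, species) %>% 
    summarise(
      # integrate density over height to get the number of individuals
      individuals = -plant:::trapezium(height, density),
      # same integral, weighted by each area* and mass* variable
      across(c(starts_with("area"), starts_with("mass")), ~ -plant:::trapezium(height, density*.x)),
      .groups="drop"
    )
}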
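
For anyone reading along, the integration step can be illustrated in base R. This is only a sketch: trapezium_sketch() is a hypothetical stand-in written for this example, assuming plant:::trapezium() applies the trapezoidal rule, and it assumes cohorts are ordered from tallest to shortest, which would explain the leading minus sign above.

```r
# Hypothetical stand-in for plant:::trapezium(): trapezoidal rule for the
# integral of y over x. This is not the package's implementation.
trapezium_sketch <- function(x, y) {
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

# Heights listed from tallest to shortest (assumed cohort order), so the
# raw integral is negative and is negated to give a positive total.
height  <- c(10, 6, 2, 0)
density <- c(0, 1, 3, 4)

individuals <- -trapezium_sketch(height, density)
```

Reversing both vectors so that height increases gives the same total without the minus sign.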

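Then, since tidy_patch() returns a list, pull out the species table before summarising:

run_scm_collect() %>% tidy_patch() %>% purrr::pluck("species") %>% patch_species_total()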
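
Output type 3 (a variable at a common size) could be handled in a similar way. Below is a rough base-R sketch, not a tested implementation against real scm output: value_at_height() is a hypothetical helper, the toy data frame is made up, and area_leaf is used only as an example column name. It interpolates linearly along height within each step using stats::approx().

```r
# Hypothetical helper: value of `var` at a common height `h` for each
# step, by linear interpolation along height within that step.
value_at_height <- function(data, var, h) {
  steps <- sort(unique(data$step))
  vals <- vapply(steps, function(s) {
    d <- data[data$step == s, ]
    d <- d[!is.na(d$height) & !is.na(d[[var]]), ]
    # approx() needs at least two points to interpolate
    if (nrow(d) < 2) return(NA_real_)
    # approx() returns NA when h lies outside the range of heights
    approx(d$height, d[[var]], xout = h)$y
  }, numeric(1))
  data.frame(step = steps, height = h, value = vals)
}

# Toy tidy species data (made up): two steps, three cohorts each
toy <- data.frame(
  step      = rep(1:2, each = 3),
  cohort    = rep(1:3, times = 2),
  height    = c(4, 2, 0, 8, 4, 0),
  area_leaf = c(2, 1, 0, 4, 2, 0)
)

at_3m <- value_at_height(toy, "area_leaf", h = 3)
```

With real output this would be applied per species, e.g. after splitting the species table by its species column.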