traitecoevo / mortality_bci

Tropical tree mortality model using Barro Colorado Island as a case study
Other
7 stars 3 forks source link

remove redundancy from workflow #101

Closed jscamac closed 6 years ago

jscamac commented 8 years ago

In the old workflow I had included code that automatically copied models that were were previously run in an earlier model selection procedure so that they can be used in others. However, I seem to have forgotten to add this requirement back in under the new workflow. Its not a huge burden (we are running in total 60 more chains then we need) but here are the redundancies:

1) We are modelling the gamma model twice (once for each growth measure). What we really should be doing breaking the tasks_2_run function into model section specific functions. For the functional/growth comparison this function would involve two expand_grids.. One for models that actually use growth and the other for base hazard model.

Something like this I guess:

function_growth_tasks <- function(iter=4000,path='.') {
  n_kfolds = 10
  n_chains = 3
    comparison <- "function_growth_comparison"
    growth_measure <- c("true_dbh_dt",'true_basal_area_dt')
    rho_combo <- "none"
    model  <- c("growth_hazard","base_growth_hazard")

  ret1 <- expand.grid(comparison=comparison,
                     model = model,
                     iter=iter,
                     chain=seq_len(n_chains),
                     growth_measure=growth_measure,
                     rho_combo=rho_combo,
                     kfold=seq_len(n_kfolds),
                     stringsAsFactors=FALSE) %>%
        arrange(model, growth_measure)

  ret2 <- expand.grid(comparison=comparison,
                     model = "base_hazard",
                     iter=iter,
                     chain=seq_len(n_chains),
                     growth_measure="none",
                     rho_combo=rho_combo,
                     kfold=seq_len(n_kfolds),
                     stringsAsFactors=FALSE)
  ret <- rbind(ret1,ret2)

  ret$modelid <- rep(1:nrow(unique(ret[,c('comparison','model','growth_measure','rho_combo','kfold')])),each = n_chains)
  ret <- ret %>%
    mutate(jobid = seq_len(n()),
           filename = sprintf("%s/results/chain_fits/%s/%d.rds", path, comparison, jobid),
           fold_data = sprintf("%s/precompile/kfold_data/bci_data_%s.rds", path, kfold))
  return(ret)
}

Separating the model comparison tasks into seperate functions might also solve our next problem.

2) When comparing rho combinations we've technically ran the model without any rho effects in the functional/growth comparison. Ideally we'd want to copy these files into the rho combination directory if they exist.

Possibly along these lines (note possibly not in fully functional form below):

rho_combo_tasks <- function(iter=4000,path='.') {
  n_kfolds = 10
  n_chains = 3
  comparison <- "rho_comparisons"
  growth_measure <- c("true_dbh_dt")
  rho_combo <- expand.grid(a=c('','a'), b=c('','b'), c=c('','c'), stringsAsFactors = FALSE)
  rho_combo <- sapply(split(rho_combo, seq_len(nrow(rho_combo))), function(x) paste0(x, collapse=''))
  rho_combo[rho_combo==''] <- "none"
  model  <- c("base_growth_hazard")

  ret <- expand.grid(comparison= comparison,
                     model = model,
                     iter=iter,
                     chain=seq_len(n_chains),
                     growth_measure=growth_measure,
                     rho_combo=rho_combo,
                     kfold=seq_len(n_kfolds),
                     stringsAsFactors=FALSE)

  ret$modelid <- rep(1:nrow(unique(ret[,c('comparison','model','growth_measure','rho_combo','kfold')])),each = n_chains)
  ret <- ret %>%
    mutate(jobid = seq_len(n()),
           filename = sprintf("%s/results/chain_fits/%s/%d.rds", path, comparison, jobid),
           fold_data = sprintf("%s/precompile/kfold_data/bci_data_%s.rds", path, kfold))

    tasks <- function_growth_tasks()
    i <- match(do.call(paste, ret[, c('model','chain', 'growth_measure', 'rho_combo', 'kfold')]),
               do.call(paste, tasks[, c('model', 'chain', 'growth_measure', 'rho_combo', 'kfold')]))
    ff <- file.copy(tasks$filename[na.omit(i)], ret$filename[which(!is.na(i))])
    if(!all(ff)) {
      warning(sprintf('Some previously run funct/growth comparison models outputs failed to copy:\n%s',
                      paste(tasks$filename[na.omit(i)][!ff], collapse='\n')))
    }
    return(ret[is.na(i), ])
}

For now I've decided not to mess with the workflow until the latest models are finished. But this definitely needs to be fixed. @dfalster You know of any neater approaches to tackling this issue?