To what extent can phylogenetic or functional relationships among species be leveraged to inform estimates of nonlinear changes in abundance?
This study aims to use long-term, multi-species monitoring data to address this question.
Contributors (in no particular order)
Nicholas Clark
Adam Smith
Shubhi Sharma
Casey Youngflesh
Caleb Robbins
Hammed Akande
Guillermo Fandos
Thomas Johnson
Proposed methodology
- Gather multi-species abundance (or relative abundance) measures from long-term monitoring studies
- Construct phylogenetic and functional trees to represent relationships among species
- Gather any other information needed to account for spatial confounding (e.g. coordinates, polygon structures, etc.). The script `BBS_trends_data.R` in this repo has annotated code that walks through such a data gathering and cleaning scheme
- Build Generalized Additive Models (GAMs) in `mgcv` using tensor product decompositions (see the help page on tensor products for more information) to ask how species' relationships inform estimates of nonlinear trends. Make use of the highly flexible `mrf` basis in `mgcv` to incorporate phylogenetic and functional information (see this Cross Validated post from Gavin Simpson and this blog post of mine for more context on how these models work). The script `BBS_trends_models.R` in this repo has annotated example code showing how these models can be fit with `bam()` while also attempting to account for unmodelled temporal autocorrelation
- Design a model evaluation scheme to compare fits from phylogenetic, functional and "null" models (which use "species" only as a random-effect grouping factor, ignoring relationships among species) in a variety of ways: cross-validation that leaves out particular species or groups, scored with appropriate proper scoring rules; calculating trait contributions to the squared second derivatives of trends; and comparisons against models that assume trends are linear
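The `mrf` basis smooths over a set of discrete levels (here, species) using a penalty matrix that encodes which levels are neighbours. The sketch below shows one way phylogenetic or functional distances could be turned into such a penalty. It is written in Python purely for illustration; the repository's scripts are in R. The sketch links each species to its k nearest relatives and builds the graph Laplacian P = D - A, where A is the adjacency matrix and D holds each species' number of neighbours. The k-nearest-neighbour rule, the function name `knn_mrf_penalty` and the toy distances are all hypothetical assumptions, not the project's chosen method.

```python
from typing import List, Sequence


def knn_mrf_penalty(dist: Sequence[Sequence[float]], k: int = 1) -> List[List[float]]:
    """Hypothetical helper: graph-Laplacian penalty P = D - A from a symmetric
    species-by-species distance matrix (e.g. cophenetic phylogenetic distances).

    Each species is linked to its k nearest other species, and links are made
    symmetric. The diagonal holds each species' number of neighbours and each
    neighbouring pair gets -1, so every row sums to zero.
    """
    n = len(dist)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Rank the other species by distance to species i (ties broken by index)
        others = sorted((dist[i][j], j) for j in range(n) if j != i)
        for _, j in others[:k]:
            adj[i][j] = adj[j][i] = 1  # symmetric neighbourhood
    return [
        [float(sum(adj[i])) if i == j else float(-adj[i][j]) for j in range(n)]
        for i in range(n)
    ]


# Toy distances among four hypothetical species (A, B, C, D); not real data.
species = ["A", "B", "C", "D"]
dist = [
    [0.0, 1.0, 4.0, 5.0],
    [1.0, 0.0, 4.0, 5.0],
    [4.0, 4.0, 0.0, 2.0],
    [5.0, 5.0, 2.0, 0.0],
]
P = knn_mrf_penalty(dist, k=1)
for name, row in zip(species, P):
    print(name, row)
```

With k = 1 the toy graph splits into two clusters (A-B and C-D). The penalty therefore shrinks differences within each pair but leaves each pair's mean unpenalized, because a Laplacian has one null-space dimension per connected component. A larger k, or a distance threshold, would link more species. In `mgcv`, a matrix like this would be supplied to the `mrf` smooth through its `xt` argument; check the `mrf` help page for how rows and columns must be labelled.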
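The evaluation scheme above mentions squared second derivatives of trends. One simple way to summarise how curved a fitted trend is: evaluate it on an evenly spaced grid (for example, predicted log abundance per year), then approximate the integral of its squared second derivative using central finite differences. The Python sketch below is illustrative only, and the helper names are hypothetical. It returns zero for a linear trend and grows with curvature. It does not attempt the split into trait contributions, which depends on how the model is parameterised.

```python
from typing import List, Sequence


def second_derivatives(f: Sequence[float], h: float) -> List[float]:
    """Central finite-difference estimates of f'' at the interior grid points.

    f holds trend values on an evenly spaced grid with spacing h.
    """
    if len(f) < 3:
        raise ValueError("need at least three grid points")
    if h <= 0:
        raise ValueError("grid spacing must be positive")
    return [(f[i + 1] - 2.0 * f[i] + f[i - 1]) / h ** 2 for i in range(1, len(f) - 1)]


def integrated_sq_second_derivative(f: Sequence[float], h: float) -> float:
    """Riemann-sum approximation of the integral of f''(t)^2 over the interior points."""
    return sum(d * d for d in second_derivatives(f, h)) * h


linear_trend = [1.0, 2.0, 3.0, 4.0, 5.0]     # straight line: no curvature
curved_trend = [0.0, 1.0, 4.0, 9.0, 16.0]    # t^2 on t = 0..4: constant f'' = 2
print(integrated_sq_second_derivative(linear_trend, 1.0))
print(integrated_sq_second_derivative(curved_trend, 1.0))
```

Because this is a coarse approximation, it is most useful as a relative measure for comparing trends evaluated on the same grid, rather than as an exact integral.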
Tasks
Design and justification
- [ ] Review the literature to understand approaches that have been used to leverage phylogenetic or functional relationships to inform population estimates
- [ ] Gather information on the types of models and analyses commonly used with large multi-species datasets to inform decisions or to calculate indices (for example, how do the US and Canadian governments use North American Breeding Bird Survey (BBS) data, and could the proposed models have any impact on these pipelines?)
- [ ] Also gather information on Gaussian Markov Random Fields (GMRFs) and their potential applications in estimating complex nonlinear effects (see, for example, this work by Rue and Held and this post)
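A Gaussian Markov random field is a multivariate normal distribution whose precision matrix is sparse: off-diagonal entries are nonzero only between neighbouring nodes. Rue and Held discuss the first-order random walk (RW1) as a basic example. Its precision is tau * Q, with Q = D^T D, where D is the first-difference matrix. The quadratic form x^T Q x is then the sum of squared differences between neighbours, which is the kind of quadratic penalty a smoothing basis such as `mrf` applies to its coefficients. The Python sketch below builds Q and checks that identity numerically; it is for illustration only, and the function names are hypothetical.

```python
from typing import List, Sequence


def rw1_structure(n: int) -> List[List[float]]:
    """Structure matrix Q = D^T D of an RW1 model on n ordered nodes.

    D is the (n-1) x n first-difference matrix, so Q is tridiagonal: 1 at the
    two end diagonal entries, 2 at interior diagonal entries, -1 next to the
    diagonal. Q has rank n - 1, so the RW1 prior is improper (constant shifts
    are unpenalized).
    """
    q = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        # Each difference x[i+1] - x[i] contributes a 2x2 block to Q
        q[i][i] += 1.0
        q[i + 1][i + 1] += 1.0
        q[i][i + 1] -= 1.0
        q[i + 1][i] -= 1.0
    return q


def quadratic_form(q: Sequence[Sequence[float]], x: Sequence[float]) -> float:
    """Compute x^T Q x."""
    n = len(x)
    return sum(x[i] * q[i][j] * x[j] for i in range(n) for j in range(n))


x = [0.0, 1.0, 3.0, 2.0]
Q = rw1_structure(len(x))
sum_sq_diffs = sum((x[i + 1] - x[i]) ** 2 for i in range(len(x) - 1))
print(quadratic_form(Q, x), sum_sq_diffs)  # both equal 1 + 4 + 1
```

The same construction generalises from a chain to an arbitrary neighbourhood graph, such as one derived from a phylogeny, by summing one such block per neighbouring pair.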
Methodology
- [ ] Identify appropriate multi-species datasets. This preprint and its accompanying GitHub repo provide considerable information (with example code)
- [ ] For candidate datasets, determine appropriate steps for cleaning and preparing the data. We don't want to take too many shortcuts here (e.g. aggregating blindly without justification); it is better to think through the data-generating process for each dataset
- [ ] Determine appropriate cross-validation schemes for evaluating candidate models, considering blocking over space, time and the phylogeny or functional dendrogram
- [ ] Prepare scoring scripts and justify which scoring rules to prioritize; consider the continuous ranked probability score (CRPS) and the energy and variogram scores (see this lecture on univariate forecast evaluation and this lecture on multivariate forecast evaluation for context)
- [ ] Brainstorm the kinds of outputs that we will need, and make sure we have well-annotated functions that can be applied to any of the models for calculating important metrics (look through the in-development functions in the `Functions/utilities.R` script in this repo for examples, and see the `BBS_trends_analysis.R` script for examples of how these might be used)
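Blocked cross-validation can be implemented by assigning whole groups to folds. Every observation from a held-out group is then predicted by a model that never saw that group. A group could be a clade obtained by cutting the phylogeny or functional dendrogram at some height, a spatial block, or a block of years. The Python sketch below is for illustration only; the function name and group labels are hypothetical. It produces one leave-one-group-out fold per group.

```python
from typing import Hashable, List, Sequence, Tuple


def leave_group_out_folds(groups: Sequence[Hashable]) -> List[Tuple[List[int], List[int]]]:
    """Return (train_indices, test_indices) pairs, holding out one group at a time.

    groups[i] is the block label of observation i (e.g. a clade, spatial block
    or year block). Groups are visited in sorted order of their string form.
    """
    folds = []
    for g in sorted(set(groups), key=str):
        test = [i for i, gi in enumerate(groups) if gi == g]
        train = [i for i, gi in enumerate(groups) if gi != g]
        folds.append((train, test))
    return folds


# Hypothetical clade labels for five observations
clades = ["A", "A", "B", "C", "B"]
for train, test in leave_group_out_folds(clades):
    print("train:", train, "test:", test)
```

Blocking over several dimensions at once (for example, clade by time period) could be handled by building composite labels such as `(clade, period)`. The resulting folds would then need checking to ensure each is large enough to score meaningfully.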
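When predictions are represented by posterior or predictive draws, the CRPS for a single observation y can be estimated from samples. The estimate uses the identity CRPS(F, y) = E|X - y| - 0.5 E|X - X'|, where X and X' are independent draws from the predictive distribution F; lower values are better. The Python sketch below is an illustration with a hypothetical function name. In the R pipeline, an existing implementation such as `crps_sample()` from the `scoringRules` package may be preferable.

```python
from typing import Sequence


def crps_sample(samples: Sequence[float], y: float) -> float:
    """Sample-based CRPS estimate: mean|X - y| - 0.5 * mean|X - X'|.

    The second term averages over all ordered pairs of draws (including i == j).
    """
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one sample")
    accuracy = sum(abs(x - y) for x in samples) / m
    spread = sum(abs(a - b) for a in samples for b in samples) / (2 * m * m)
    return accuracy - spread


print(crps_sample([1.0, 2.0, 3.0], 2.0))  # 2/3 - 4/9
print(crps_sample([0.0], 3.0))            # point forecast: reduces to |x - y|
```

For a single draw, the spread term vanishes and CRPS reduces to absolute error. This is one reason CRPS is convenient for comparing probabilistic and point forecasts on the same scale.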
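For multivariate predictions (for example, joint draws across several species or years), the energy score and variogram score can also be estimated from samples. The energy score replaces the absolute values in the sample CRPS with Euclidean norms. The variogram score of order p (Scheuerer and Hamill) compares observed pairwise differences |y_i - y_j|^p with their expected values under the forecast. The Python sketch below is for illustration only; function names are hypothetical, and all pair weights are fixed at 1.

```python
import math
from typing import Sequence


def _euclid(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def energy_score(samples: Sequence[Sequence[float]], y: Sequence[float]) -> float:
    """Sample-based energy score: mean||X - y|| - 0.5 * mean||X - X'||."""
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one sample")
    accuracy = sum(_euclid(x, y) for x in samples) / m
    spread = sum(_euclid(a, b) for a in samples for b in samples) / (2 * m * m)
    return accuracy - spread


def variogram_score(samples: Sequence[Sequence[float]], y: Sequence[float], p: float = 0.5) -> float:
    """Variogram score of order p with unit weights, summed over all pairs (i, j).

    The expected |X_i - X_j|^p is estimated by averaging over the samples.
    """
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one sample")
    d = len(y)
    score = 0.0
    for i in range(d):
        for j in range(d):
            expected = sum(abs(x[i] - x[j]) ** p for x in samples) / m
            score += (abs(y[i] - y[j]) ** p - expected) ** 2
    return score


draws = [[1.0, 2.0], [2.0, 2.5], [3.0, 4.0]]  # hypothetical joint draws for 2 series
obs = [2.0, 3.0]
print(energy_score(draws, obs), variogram_score(draws, obs))
```

In one dimension, the energy score coincides with the CRPS. The variogram score, by contrast, focuses on pairwise differences between components, which makes it more sensitive to misspecified correlation (for instance, between species that a phylogenetic model treats as related).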