Updates - May 24

ha0ye commented 5 years ago

First, recap of progress on Ward et al. replication:

I have all of the methods implemented (I think)
- I made some changes for consistency:
  - enabling confidence intervals for predictions in some of the methods where they weren't initially enabled
  - enforcing consistency in how the forecasts are made (some of the methods propagate 1-step ahead forecasts to produce 5-step projections at the end of the observed time series, while other methods appear to fit separate 1-step ahead, 2-step ahead, etc. models to make forecasts for those same points)
@pennekampster is looking into the LPI data, the MATSS team is working on BioTime and Popler dataset integration, BBS data will be implemented in weecology/MATSS#127
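
To make the difference between the two forecasting schemes concrete, here is a minimal Python sketch. It is not the replication code: the linear AR(1) model, the function names, and the toy series are all assumptions chosen for illustration.

```python
import random

def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def iterated_forecast(series, horizon):
    """Fit one 1-step-ahead model and feed its own predictions back in."""
    a, b = fit_line(series[:-1], series[1:])
    preds, last = [], series[-1]
    for _ in range(horizon):
        last = a + b * last
        preds.append(last)
    return preds

def direct_forecast(series, horizon):
    """Fit a separate h-step-ahead model for each h = 1..horizon."""
    preds = []
    for h in range(1, horizon + 1):
        a, b = fit_line(series[:-h], series[h:])
        preds.append(a + b * series[-1])
    return preds

# Toy AR(1) series with made-up parameters (illustration only).
rng = random.Random(1)
series = [0.0]
for _ in range(199):
    series.append(0.7 * series[-1] + rng.gauss(0, 1))

iterated = iterated_forecast(series, 5)
direct = direct_forecast(series, 5)
```

Both schemes give the same 1-step forecast by construction; they can diverge from step 2 onward, which is why mixing them across methods would make the 5-step comparisons inconsistent.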

Technical-bits

focus on 1-step ahead forecasts?
- clarify what "1 timestep" means for each time series
what portion of the data is training vs. testing?
- fixed sections, n-fold CV
models?
- Ward et al. mostly investigate statistical models, and many of these are not so different from each other conceptually
  - there are linear autoregressive models, nonlinear time-delay embedding models, random walk models, and models that fit some kind of function to time directly (linear, GAM, quadratic, etc.)
  - many of the models are variants that use different approaches to estimate specific components of the above model classes
- Since we're dealing with population time series, I think it makes sense to include generalized and specific population models
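
As a rough illustration of the training/testing options above, the sketch below builds index splits for a fixed section, for contiguous n-fold CV, and for an expanding-window (rolling-origin) scheme. The rolling-origin variant is not mentioned above; it is added here as an assumption because it pairs naturally with 1-step ahead forecasts. All function names are hypothetical.

```python
def fixed_split(n, train_frac=0.8):
    """One split: the first part for training, the remainder for testing."""
    cut = int(n * train_frac)
    return [(list(range(cut)), list(range(cut, n)))]

def blocked_folds(n, k):
    """k contiguous folds, each held out in turn (ignores time order)."""
    folds = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        test = list(range(lo, hi))
        train = list(range(0, lo)) + list(range(hi, n))
        folds.append((train, test))
    return folds

def rolling_origin_splits(n, min_train, horizon=1):
    """Expanding window: train on [0, t), test on the next `horizon` points."""
    return [(list(range(t)), list(range(t, t + horizon)))
            for t in range(min_train, n - horizon + 1)]

# Usage: absolute 1-step errors of a random-walk forecast (last observed value)
# on a made-up series, evaluated over rolling origins.
series = [3.0, 4.0, 6.0, 5.0, 7.0, 8.0]
errors = [abs(series[test[0]] - series[train[-1]])
          for train, test in rolling_origin_splits(len(series), min_train=3)]
```

Note that contiguous n-fold CV trains on data from after the held-out block, which may or may not be acceptable depending on what question the forecasts are meant to answer.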

Scientific questions / Paper planning

My preference is to start with something in the union/intersection of 1. and 2. (from #10) - what properties affect forecast skill, how can this be used to identify best models going forward.

(I'm very interested in the model transfer question, but that feels like it may require more fiddling with the metadata than I'd like to get into right now.)

Specific qs (that could be used to guide the figures)

forecast skill (and permutation entropy) of different models
predictors of forecast skill
- taxonomy (pick some obvious groupings?)
- traits (maybe some obvious life history ones?)
- location
- temperature / precipitation (covariance with location, but more mechanistically-focused?)
general guidelines on forecast model selection

stevemunch commented 5 years ago

I like the idea of trying to parse prediction skill by lifespan or temperature. Location is likely to contribute to variation in predictability, but not be particularly illuminating (confounded with environment, number of interacting species, etc). I sort of feel the same way about 'taxonomy' (likely important but only as a proxy for something else). Do we have easily accessible data on species richness?

Another thing that might be good to know is how much better we can do with multiple observations of the same system, e.g. series for several species in one place, series from several nearby locations, etc.

ha0ye commented 5 years ago

> Do we have easily accessible data on species richness?

Do you mean richness per location? The parent project, MATSS, is compiling community datasets, but many of the time series from Ward et al. are single populations devoid of that context.

> Another thing that might be good to know is how much better we can do with multiple observations of the same system, e.g. series for several species in one place, series from several nearby locations, etc.

Agreed, though this also requires more detailed sifting of datasets.

pennekampster commented 5 years ago

> forecast skill (and permutation entropy) of different models

I would expand on this to include other time series covariates such as length, autocorrelation, time scale, etc. This would allow us to characterize what a typical ecological time series looks like and whether different forecasting methods work best in different areas of this feature space. I think this is something Ward et al. did not cover, and hence it may be easier to sell than features like life span, which Ward et al. investigated already.
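
A minimal Python sketch of what such a feature vector could look like for a single series. The choice of features, the embedding order of 3 with delay 1, and the function names are assumptions for illustration, not something settled in this thread.

```python
import math
from collections import Counter

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    return cov / var

def permutation_entropy(x, order=3):
    """Shannon entropy of ordinal patterns of length `order` (delay 1),
    normalized by log(order!) so values fall between 0 and 1.
    Ties are broken by position in the window."""
    patterns = Counter(
        tuple(sorted(range(order), key=lambda k: x[i + k]))
        for i in range(len(x) - order + 1)
    )
    total = sum(patterns.values())
    h = -sum((c / total) * math.log(c / total) for c in patterns.values())
    return h / math.log(math.factorial(order))

def series_features(x, order=3):
    """Collect simple descriptors of one time series."""
    return {
        "length": len(x),
        "lag1_acf": lag1_autocorrelation(x),
        "perm_entropy": permutation_entropy(x, order),
    }

# Usage on a made-up series.
features = series_features([2.0, 3.0, 5.0, 4.0, 6.0, 8.0, 7.0, 9.0])
```

Computing these per series would give each one a position in feature space, against which forecast skill of each method could then be compared.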

> general guidelines on forecast model selection

These would result from previous insights, and maybe some of the other predictors (taxonomy, location, traits)?

> Since we're dealing with population time series, I think it makes sense to include generalized and specific population models

I like the idea but feel this could open a can of worms because there are just so many different ways one could parameterize such models for every single time series. Maybe we could turn this around and rather augment our dataset with simulated population dynamic time series and see how well the statistical approaches capture how we conceptualize population dynamics and their drivers. We would feature some well-established drivers such as density-dependence and interactions with other (unmeasured) species. What do you think?
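
As one possible starting point for such simulations, here is a minimal sketch of a stochastic Ricker model with density dependence, process noise, and an optional competitor whose abundance is simulated but never returned. The model choice, parameter values, and names are all assumptions for illustration.

```python
import math
import random

def simulate_ricker(steps, r=0.5, k=100.0, n0=10.0, sigma=0.1,
                    alpha=0.0, m0=50.0, seed=0):
    """Simulate a focal population under Ricker dynamics.

    r, k   : growth rate and carrying capacity (density dependence)
    sigma  : standard deviation of lognormal process noise
    alpha  : competition coefficient with an unmeasured species
             (alpha=0 gives a single-species model)
    Returns only the focal species' abundances, one per time step.
    """
    rng = random.Random(seed)
    n, m = n0, m0
    observed = []
    for _ in range(steps):
        observed.append(n)
        n_next = n * math.exp(r * (1 - (n + alpha * m) / k) + rng.gauss(0, sigma))
        m_next = m * math.exp(r * (1 - (m + alpha * n) / k) + rng.gauss(0, sigma))
        n, m = n_next, m_next
    return observed

# Usage: one series without and one with a hidden competitor.
alone = simulate_ricker(50)
with_competitor = simulate_ricker(50, alpha=0.5)
```

Because the generating process is known, forecasts on series like these could be checked against the drivers that were actually switched on.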

ha0ye commented 5 years ago

> Since we're dealing with population time series, I think it makes sense to include generalized and specific population models

> I like the idea but feel this could open a can of worms because there are just so many different ways one could parameterize such models for every single time series. Maybe we could turn this around and rather augment our dataset with simulated population dynamic time series and see how well the statistical approaches capture how we conceptualize population dynamics and their drivers. We would feature some well-established drivers such as density-dependence and interactions with other (unmeasured) species. What do you think?

Great idea! That's definitely more feasible. I was thinking of doing a bit of a process-based vs. phenomenological comparison, but that definitely has its own endless forking paths of modeling choices, and could easily be split off into its own project.

weecology / MATSS-forecasting

Updates - May 24 #11
