weecology / MATSS-forecasting

Forecasting Analysis Comparison for Ecological Time Series
https://weecology.github.io/MATSS-forecasting/

Updates - May 24 #11

Closed ha0ye closed 5 years ago

ha0ye commented 5 years ago

First, recap of progress on Ward et al. replication:

Technical-bits

  1. focus on 1-step ahead forecasts?
    • clarify what "1 timestep" means for each time series
  2. what portion of the data is training vs. testing?
    • fixed sections, n-fold CV
  3. models?
    • Ward et al. mostly investigate statistical models, and many of these are not so different from each other conceptually
      • there are linear autoregressive models, nonlinear time-delay embedding models, random walk models, and models that fit some kind of function to time directly (linear, GAM, quadratic, etc.)
      • many of the models are variants that use different approaches to estimate specific components of the above model classes
    • Since we're dealing with population time series, I think it makes sense to include generalized and specific population models
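
To make points 1 and 2 concrete, here is a minimal sketch of a fixed training section followed by rolling-origin 1-step-ahead forecasts. It is only an illustration under assumed names: `one_step_forecasts`, `random_walk`, `ar1`, and the toy series are hypothetical and are not code from this repo or from Ward et al. "1 timestep" here simply means the next observation in the series, whatever its sampling interval.

```python
import numpy as np

def one_step_forecasts(series, n_train, forecaster):
    """Rolling-origin evaluation: fit on series[:t], predict series[t]."""
    preds = [forecaster(series[:t]) for t in range(n_train, len(series))]
    return np.array(preds)

def random_walk(history):
    # Forecast is the last observed value.
    return history[-1]

def ar1(history):
    # Least-squares fit of x[t] = a + b * x[t-1], then predict one step ahead.
    b, a = np.polyfit(history[:-1], history[1:], 1)
    return a + b * history[-1]

def rmse(obs, pred):
    return float(np.sqrt(np.mean((obs - pred) ** 2)))

# Toy series, made up for illustration: a noisy AR(1) process.
rng = np.random.default_rng(0)
x = np.zeros(60)
for t in range(1, 60):
    x[t] = 0.7 * x[t - 1] + rng.normal(scale=0.5)

n_train = 40          # first 40 points are training only
obs = x[n_train:]     # remaining 20 points are forecast one at a time
scores = {
    name: rmse(obs, one_step_forecasts(x, n_train, f))
    for name, f in [("random_walk", random_walk), ("ar1", ar1)]
}
```

An n-fold CV variant would change only how the training and testing indices are chosen; the forecaster interface could stay the same.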

Scientific questions / Paper planning

My preference is to start with something in the union/intersection of 1. and 2. (from #10) - what properties affect forecast skill, how can this be used to identify best models going forward.

Specific qs (that could be used to guide the figures)

stevemunch commented 5 years ago

I like the idea of trying to parse prediction skill by lifespan or temperature. Location is likely to contribute to variation in predictability, but not be particularly illuminating (confounded with environment, number of interacting species, etc). I sort of feel the same way about 'taxonomy' (likely important but only as a proxy for something else). Do we have easily accessible data on species richness?

Another thing that might be good to know is how much better we can do with multiple observations of the same system, e.g. series for several species in one place, series from several nearby locations, etc.

ha0ye commented 5 years ago

> Do we have easily accessible data on species richness?

Do you mean richness per location? The parent project, MATSS, is compiling community datasets, but many of the time series from Ward et al. are single populations devoid of that context.

> Another thing that might be good to know is how much better we can do with multiple observations of the same system, e.g. series for several species in one place, series from several nearby locations, etc.

Agreed, though this also requires more detailed sifting of datasets.

pennekampster commented 5 years ago

> forecast skill (and permutation entropy) of different models

I would expand on this to include other time series covariates such as length, autocorrelation, time scale, etc. This allows us to characterize what a typical ecological time series looks like and whether different forecasting methods work best in different areas of this feature space. I think this is something Ward et al. did not cover, and hence it may be easier to sell than features like life span, which Ward et al. already investigated.
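
A rough sketch of how such a feature space could be computed per series, assuming length, lag-1 autocorrelation, and normalized permutation entropy as the features. The function names and the choice of embedding order 3 are assumptions for illustration, not something settled in this thread.

```python
import math
from collections import Counter

import numpy as np

def lag1_autocorrelation(x):
    """Sample autocorrelation at lag 1."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))

def permutation_entropy(x, order=3):
    """Normalized permutation entropy in [0, 1].

    0 means a single ordinal pattern (fully regular);
    1 means all order! patterns are equally frequent.
    Ties are broken by position (argsort default), which is an assumption.
    """
    x = np.asarray(x, dtype=float)
    patterns = Counter(
        tuple(np.argsort(x[i:i + order])) for i in range(len(x) - order + 1)
    )
    total = sum(patterns.values())
    probs = np.array([count / total for count in patterns.values()])
    entropy = -np.sum(probs * np.log(probs))
    return float(entropy / math.log(math.factorial(order)))

def features(x):
    """One row of the feature table for a single time series."""
    return {
        "length": len(x),
        "acf1": lag1_autocorrelation(x),
        "perm_entropy": permutation_entropy(x),
    }
```

Forecast skill for each model could then be plotted or regressed against these columns to see where in the feature space each method does best.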

> general guidelines on forecast model selection

These would result from previous insights, and maybe some of the other predictors (taxonomy, location, traits)?

> Since we're dealing with population time series, I think it makes sense to include generalized and specific population models

I like the idea but feel this could open a can of worms because there are just so many different ways one could parameterize such models for every single time series. Maybe we could turn this around and rather augment our dataset with simulated population dynamic time series and see how well the statistical approaches capture how we conceptualize population dynamics and their drivers. We would feature some well-established drivers such as density-dependence and interactions with other (unmeasured) species. What do you think?
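
To illustrate what I have in mind, here is a minimal sketch of one possible simulator, assuming a stochastic Ricker model with density dependence and an optional unmeasured competitor. The function name, parameter values, and the symmetric two-species setup are all hypothetical choices for illustration.

```python
import numpy as np

def simulate_ricker(n_steps, r=1.5, K=100.0, alpha=0.0, sigma=0.1, seed=0):
    """Simulate a focal population with density dependence.

    N_i[t+1] = N_i[t] * exp(r * (1 - (N_i[t] + alpha * N_j[t]) / K) + noise)

    alpha > 0 adds competition with a second, unmeasured species that
    follows the same dynamics; sigma is the process-noise standard deviation.
    Only the focal species is returned, mimicking a single observed series.
    """
    rng = np.random.default_rng(seed)
    n = np.empty((n_steps, 2))
    n[0] = K / 2
    for t in range(n_steps - 1):
        competitor = n[t, ::-1]  # abundance of the other species
        growth = r * (1 - (n[t] + alpha * competitor) / K)
        n[t + 1] = n[t] * np.exp(growth + rng.normal(scale=sigma, size=2))
    return n[:, 0]

# Example: same parameters with and without the hidden competitor.
alone = simulate_ricker(100, alpha=0.0)
with_competitor = simulate_ricker(100, alpha=0.5)
```

Series like these could be added to the dataset with known drivers, so we can check whether the statistical approaches recover forecast skill differences we would expect from the generating process.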

ha0ye commented 5 years ago

> > Since we're dealing with population time series, I think it makes sense to include generalized and specific population models

> I like the idea but feel this could open a can of worms because there are just so many different ways one could parameterize such models for every single time series. Maybe we could turn this around and rather augment our dataset with simulated population dynamic time series and see how well the statistical approaches capture how we conceptualize population dynamics and their drivers. We would feature some well-established drivers such as density-dependence and interactions with other (unmeasured) species. What do you think?

Great idea! That's definitely more feasible. I was thinking of doing a bit of a process-based vs. phenomenological model comparison, but that has its own endless forking paths of modeling choices, and could easily be split off into its own project.