weecology / portalPredictions

Using various models to forecast abundances at Portal
MIT License

seasonality in the time series and models #181

Closed juniperlsimonis closed 6 years ago

juniperlsimonis commented 6 years ago

Currently, the data are input to the forecasting functions as a vector, which gets converted into a time series, by default with a frequency of 1. Having a frequency of 1 means that there's no seasonality to the time series.

The Naive1 and Naive2 models can both fit seasonal components, but they don't here because the input time series has a frequency of 1.

Do we want to include the seasonal components in these two models? If so, we need to figure out how best to handle a non-integer frequency (12.35 new moons per calendar year).
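To make the non-integer frequency problem concrete, here is a minimal sketch of the arithmetic, assuming the 12.35 new moons per year quoted above. The function names are hypothetical and are not part of portalPredictions. It shows that an integer seasonal lag of 12 falls short of a full year, so the seasonal phase drifts with every cycle.

```python
# Assumption: 12.35 new moons per calendar year, the figure quoted in this thread.
MOONS_PER_YEAR = 12.35


def seasonal_phase(newmoon_index, moons_per_year=MOONS_PER_YEAR):
    """Fraction of the annual cycle (0 to 1) elapsed at a given new moon,
    counting from an arbitrary new moon 0 placed at phase 0."""
    return (newmoon_index / moons_per_year) % 1.0


def drift_per_cycle(integer_lag, moons_per_year=MOONS_PER_YEAR):
    """Fraction of a year by which an integer seasonal lag misses a full year.
    Positive values mean the lag is shorter than one year."""
    return 1.0 - integer_lag / moons_per_year


# Phase at each of the first 14 new moons: it never returns exactly to 0.
phases = [round(seasonal_phase(n), 3) for n in range(14)]

# Misalignment from using a lag of 12, in days per seasonal cycle.
drift_days = drift_per_cycle(12) * 365.25
```

With these assumptions a lag of 12 misses by about 10 days per cycle, and the error accumulates over a multi-year series.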

juniperlsimonis commented 6 years ago

It looks like we'd have to hack the data to get them to work with the existing functions for the ets and auto.arima (Naive1 and Naive2) models, which isn't really desirable, and not something we have to do for the other models (where we can, and will be able to, include day of year as a covariate).

The underlying math, however, looks amenable to having sampling and seasonal periodicities differ, and it seems doable to write our own version of the functions that would allow for that.

juniperlsimonis commented 6 years ago

the combination of missing data and the seasonality and sampling on different cycles means it'd almost certainly be better to re-write these models from scratch than to hack the existing models. like I said, the math will work fine, we just need to code it up to work for what we want.

that will take longer than the timeline of the ms, though, and we shouldn't hold the ms up for the models, esp. given that the models aren't in it. but we want the models to be as good as can be when the ms goes public, so in the meantime we need to decide how best to proceed.

the overwhelming seasonal signal and the fact that we can't include seasonality at all in the naive1 and naive2 models make me think they're just not going to be appropriate at this stage.

the nb and env poisson models (fit via tsglm) can handle covariates and so while they don't have automatic seasonal modules, it's pretty simple for us to code up seasonality (either as a categorical variable or a continuous one, just let me know which makes more sense).
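as a rough illustration of the two options, here is a minimal sketch of how a census date could be turned into either a continuous or a categorical seasonal covariate. it is written in Python for self-containment rather than in the R used by the models, and the function names and census dates are hypothetical.

```python
import math
from datetime import date


def continuous_season(d, harmonics=1):
    """Sine/cosine pairs of the fraction of the year elapsed at date d."""
    year_start = date(d.year, 1, 1)
    year_length = (date(d.year + 1, 1, 1) - year_start).days
    frac = (d - year_start).days / year_length
    terms = []
    for k in range(1, harmonics + 1):
        angle = 2.0 * math.pi * k * frac
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


def categorical_season(d):
    """Month indicator columns with January as the reference level (11 columns)."""
    return [1 if d.month == m else 0 for m in range(2, 13)]


# Hypothetical census dates, for illustration only.
census_dates = [date(2017, 1, 1), date(2017, 7, 2), date(2017, 12, 18)]

# Two alternative covariate matrices; a model would use one or the other.
continuous_design = [continuous_season(d) for d in census_dates]
categorical_design = [categorical_season(d) for d in census_dates]
```

the continuous version costs two parameters per harmonic and works directly from the date, so it doesn't care that sampling follows the lunar cycle. the categorical version costs eleven parameters and puts a step at every month boundary.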

the downsides to using tsglm are that it doesn't handle missing data (the auto.arima of naive2 was the only one that did), so we still have to deal with that, and that tsglm isn't a state space model, so the error specification isn't as flexible as we might want in the long run.
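one common stopgap for a model that needs a complete series is to fill the gaps before fitting. the sketch below shows linear interpolation over missing new moon numbers. it is an assumption for illustration, not something decided in this thread, and the function name and data are hypothetical.

```python
def fill_gaps(newmoons, counts):
    """Return (all_newmoons, filled_counts), linearly interpolating the counts
    for any new moon numbers missing between the first and last observation."""
    observed = dict(zip(newmoons, counts))
    known = sorted(observed)
    full = list(range(known[0], known[-1] + 1))
    filled = []
    for nm in full:
        if nm in observed:
            filled.append(float(observed[nm]))
            continue
        # Nearest observed new moons on either side of the gap.
        left = max(k for k in known if k < nm)
        right = min(k for k in known if k > nm)
        weight = (nm - left) / (right - left)
        filled.append(observed[left] + weight * (observed[right] - observed[left]))
    return full, filled


# Hypothetical series with new moon 3 missing.
all_moons, filled_counts = fill_gaps([1, 2, 4, 5], [10, 12, 16, 18])
```

interpolated values are generally not integers, so a count model would need them rounded, and the filled points carry no real information, which is part of why this is only a stopgap.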

i'm currently looking into what would be needed to hack the tsglm function to work with missing data (on either the fit of the parameters or the final likelihood estimate) and will report back on what i find.

the tscount vignette also has refs to a few other available packages that i'm looking at now to see what the overhead and run times look like for building models. at this point we're blurring into writing what we need from scratch, though.

juniperlsimonis commented 6 years ago

closing this as it's basically background for #252