tidyverts / fable

Tidy time series forecasting
https://fable.tidyverts.org
GNU General Public License v3.0

Feature request: bsts #213

Closed jonekeat closed 3 years ago

jonekeat commented 4 years ago

I was wondering whether there is any plan to add support in fable for the Bayesian Structural Time Series model, as provided by Google's bsts package. There is a blog post introducing this package: http://www.unofficialgoogledatascience.com/2017/07/fitting-bayesian-structural-time-series.html

Thanks for all your hard work, it is really exciting to see a unified interface for time series modelling in R.

mitchelloharawild commented 4 years ago

I certainly would like to support more models, and support for structural models is planned. This may be added via a new extension package rather than included in the fable package.

jonekeat commented 4 years ago

Thanks for the prompt reply, I am looking forward to the new extension package.

StatMixedML commented 4 years ago

Related to the question above:

Is there any functionality (planned) that allows users to add models themselves? I am thinking of the caret or mlr packages. Both have a unified way of training models, and the user can easily add models that adhere to the package design.

mitchelloharawild commented 4 years ago

Yes - fable is extensible by design. The fable package itself is an extension (only providing some common forecasting models). The framework and tooling to make it all work is provided in the fabletools package.

Currently the interface is a bit too experimental (especially in the back end) for me to encourage users to add models themselves. However, once things are more refined I will be writing extension vignette(s) (https://github.com/tidyverts/fabletools/issues/15).

There are currently a couple of examples which show how extension packages can be written:

A model is made with a few components:

  1. A training function that returns a fitted model.
  2. A list of 'specials': functions specific to that model for the formula interface. These can be useful for pre-processing the input data into model inputs.
  3. A set of methods appropriate for that model (forecast.xyz, residuals.xyz, glance.xyz, components.xyz, generate.xyz, etc.)
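
As a rough illustration of how these three components fit together, here is a minimal sketch of a toy mean model. It assumes the fabletools helpers `new_specials()`, `new_model_class()` and `new_model_definition()` behave as described in the fabletools extension vignette. The names `XYZ`, `train_xyz`, `specials_xyz` and the class `xyz` are hypothetical, and the interface may differ between fabletools versions.

```r
library(fabletools)

# 1. Training function: receives the data and the evaluated specials,
#    and returns a fitted model object with its own S3 class.
train_xyz <- function(.data, specials, ...) {
  response <- tsibble::measured_vars(.data)
  stopifnot(length(response) == 1)  # this sketch is univariate only
  y <- .data[[response]]
  fitted_values <- rep(mean(y, na.rm = TRUE), length(y))
  structure(
    list(mean = mean(y, na.rm = TRUE),
         fitted = fitted_values,
         resid = y - fitted_values),
    class = "xyz"
  )
}

# 2. Specials: functions available on the right-hand side of the formula.
#    This toy model rejects exogenous regressors.
specials_xyz <- new_specials(
  xreg = function(...) stop("Exogenous regressors are not supported by this sketch.")
)

# Tie the training function and specials together into a model definition.
model_xyz <- new_model_class("xyz", train = train_xyz, specials = specials_xyz)
XYZ <- function(formula, ...) {
  new_model_definition(model_xyz, !!rlang::enquo(formula), ...)
}

# 3. Methods for the fitted model class (only two are shown here).
residuals.xyz <- function(object, ...) object$resid
fitted.xyz <- function(object, ...) object$fitted
```

With these definitions in place, a call such as `model(tsibble::as_tsibble(USAccDeaths), XYZ(value))` should fit the toy model. A `forecast.xyz` method is left out to keep the sketch short.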

If you'd like to write an extension model, let me know and I'd be happy to work with you and refine this model development interface further.

davidtedfordholt commented 4 years ago

I've spoken a little bit to Steven Scott (the author of bsts) about a formula-based implementation of bsts as a fable extension. I have the framework laid out for that (though including priors is a bit tricky at times). Because the creation of the state specification is similar to the method of specification for prophet, I have just been following the methodology for your implementation of fable.prophet, but that makes it a bit hard to test things as I go along with an implementation.

I would love some guidance or assistance, as I'm just working on it in my spare spare time and would love to be a bit more efficient.

mitchelloharawild commented 4 years ago

Wow, looks great! I'd be very happy to help with this as an extension package. I think collaboration here would lead to a great extension package, and also improvements to how a model is developed for fable. Is there something specific you're having trouble with for testing? If there's anything you get stuck with let me know.

davidtedfordholt commented 4 years ago

I'm very happy to let my struggles help others! I would love to eventually also get forecTheta or TSA::arimax() wrapped up, if no one is planning on opening them up and redoing them, under the hood. My biggest issue is actually my own methodology, which included just trying to swap things out from fable.prophet, without having really understood the parsing of the formulas and the specification of specials. The state specification for bsts is just a list, and can have multiple trends added to it, as well as the wrong forms of priors added to it, which need to be constrained. If I'm honest, it's because I'm a statistician more than a computer scientist, and I'm wanting to do this in order to become better at thinking like a computer scientist.

mitchelloharawild commented 4 years ago

Theta models are planned for inclusion in fable (#41). The current plan is to implement forecast::thetaf(), but presumably I will support forecTheta in its generality. The hold-up for this is the forecast distribution of its equivalent stochastic model (https://doi.org/10.1016/S0169-2070(01)00143-1), which will require a more general implementation of distributions (like https://github.com/alexpghayes/distributions3, but vectorised and more flexible). (Correction: this is a hold-up for Croston's method, not theta.)

ARIMAX is very similar to what we already have in ARIMA() which supports exogenous regressors (giving 'regression with ARMA errors'). As for TSA::arimax(), this model is more closely described as a transfer function model (https://robjhyndman.com/hyndsight/arimax/). It is planned to support this model class later, but the current priority is on model parity with the forecast package.


Regarding specials and formula parsing, here is a rough idea of what happens:

  1. Get a formula (transform(y) ~ trend() + trend() + season() + x)
  2. Parse the left of formula to identify response, transformations and inverse (this is completely handled by fabletools, nothing required here)
  3. Parse the right of formula to compute specials (this runs your functions as defined in your specials, in the context of your model's data). The returned values of these specials will be passed to your functions in a list via the specials argument. In the above formula, there are two trend specials, one season special, and x (which is passed to the xreg() special). If you need to constrain the usage of specials (such as only one season()) this can be done in the training function by checking the lengths of the specials argument.
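
The length check mentioned in step 3 can be sketched in base R. The `specials` list below is a hand-built stand-in for what the training function would receive for the formula above. Its inner values are placeholders, and `check_specials()` is a hypothetical helper, not part of fabletools.

```r
# Mock of the `specials` argument for
# transform(y) ~ trend() + trend() + season() + x.
# Each special name maps to a list with one element per use in the formula.
specials <- list(
  trend  = list(list(type = "first trend"), list(type = "second trend")),
  season = list(list(period = 12)),
  xreg   = list(matrix(1:10, ncol = 1, dimnames = list(NULL, "x")))
)

# Hypothetical helper: stop if any special is used more often than allowed.
# `max_counts` is a named integer vector, e.g. c(trend = 1L).
check_specials <- function(specials, max_counts) {
  for (nm in names(max_counts)) {
    n <- length(specials[[nm]])
    if (n > max_counts[[nm]]) {
      stop(
        sprintf("At most %d `%s()` special(s) allowed, but %d supplied.",
                max_counts[[nm]], nm, n),
        call. = FALSE
      )
    }
  }
  invisible(specials)
}

# Inside a training function this would be called before fitting:
# check_specials(specials, c(trend = 1L))  # errors here: two trend() specials
```

Calling the helper at the top of the training function keeps the constraint next to the fitting code, which is where the thread suggests placing it.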

Also, I think being a statistician more than a compsci is great here. Part of the fable design goal (which clearly needs work) is to make it easy for statisticians with new models to create a package that integrates well with other models (via combinations, reconciliation, etc.).

davidtedfordholt commented 4 years ago

The top makes perfect sense to me. My desire for TSA::arimax() is entirely because it allows the specification of a transfer function. I assumed all of forecast would find itself in fable, and am excited about it. Does it seem like someone putting time against the task of vectorizing distributions3 would be helpful? I ask because the author seems pretty open (from a glance at the closed PRs) and it's not an onerous task.


Thanks, that helps, especially with knowing what I don't have to think about on the left. I just went and looked at the list structure once the model is specified, and I've got a solid idea on how to constrain things in terms of how many of a type of special to allow. It'll include multiple season() arguments sometimes, but not multiple trend() arguments. I still need to think a little about whether or not to constrain priors that are fed into a special to only those generating functions that "seem" to make sense.


I just see the word reconciliation and I get all excited. reconcilethief is one of my current minor obsessions, though not used with thief. I'm very glad that's coming.

mitchelloharawild commented 4 years ago

I've been working on creating vectorised distributions, which might get merged into distributions3 or become a standalone package (https://github.com/tidyverts/fabletools/issues/123).

Specifically for the theta model, the distributions would also need zero-inflated variants. Additionally, fable's transformations would then require transformed zero-inflated distributions. Extending the distributions3 package to support this isn't trivial (and I'm currently approaching it as a rewrite).

StatMixedML commented 4 years ago

Concerning additional distributions, such as zero-inflated ones, the gamlss package in general and the gamlss.dist package in particular might be of interest:

https://cran.r-project.org/web/packages/gamlss.dist/index.html

The gamlss.dist package provides a set of distributions which can be used for modelling the response variables in Generalized Additive Models for Location Scale and Shape, Rigby and Stasinopoulos (2005). The distributions can be continuous, discrete or mixed distributions. Extra distributions can be created, by transforming, any continuous distribution defined on the real line, to a distribution defined on ranges 0 to infinity or 0 to 1,

davidtedfordholt commented 4 years ago

How can I interrogate a specials list as it would arrive in the training function? I don't know how multiple specials of the same name (e.g. value ~ trend() + trend()) would show up. I'm guessing I would find their specifications in specials$trend[[1]] and specials$trend[[2]], but not entirely sure.

mitchelloharawild commented 4 years ago

Yes, that's correct.

It would also be in the same list structure for value ~ trend(), accessed via specials$trend[[1]].

davidtedfordholt commented 4 years ago

This may be already available and I'm just missing it. Is there a way to take a model specification and create the output that is provided to the train_*() function? That way, it would be easy to create a dummy workflow. A parameter to the fabletools::model() function that stopped before actually fitting a model and just outputs as.list(environment()) would work, I would think. It gives the creator of the extension a simple way to investigate exactly what they're working with in the train_*() step.

It might also be worth having a diagram that shows exactly what gets passed to a special like xreg based on a model specification that just includes the variable name. Or, in general, a diagram that shows what data is available (and the naming convention for it) at any given point in the process. I'm happy to help work on that, but I haven't taken the time to try and take apart fabletools enough to take a swag yet.

mitchelloharawild commented 4 years ago

Sorry for the late reply on this one. The function that calls the train_*() function is estimate.tbl_ts(), so you should be able to debug() this to see what is going on (including parsing specials).

I've also started writing the vignette for adding models with fabletools: https://fabletools.tidyverts.org/dev/articles/extension_models.html

The process of creating a model is outlined, and tomorrow I'll be writing more about adding methods. Your thoughts on the vignette would be great, especially for points of confusion that you've experienced while writing fable.bsts.

Fuco1 commented 4 years ago

Thank you both for the discussion. I'm the opposite of David (more compsci than statistician), but things were still a bit opaque. After reading this and the extensions article, I've now managed to implement my first custom model :tada:

Fable seriously rocks! (and to think this is only a "beta" version, what more goodies are to come)

mitchelloharawild commented 4 years ago

Great to hear @Fuco1! Is the extension model open source by any chance? It would be nice to see what you've done with fable.

Fuco1 commented 4 years ago

@mitchelloharawild Not open source at the moment, it is rather simple though, basically encoding a ton of business rules that we've come up with over the years. It's not very scientific but works well in practice.

englianhu commented 3 years ago

Here is a tutorial that uses Python and prophet; one might try using reticulate to call the Bayesian model from R. https://m.youtube.com/watch?v=jo12CWZ00Lo

mitchelloharawild commented 3 years ago

Closing as this model will not be added into fable, but can be made available via an extension package. This tracker has been consolidated into https://github.com/tidyverts/fable/issues/344