tidyverts / fabletools

General fable features useful for extension packages
http://fabletools.tidyverts.org/

Extend interface to support multi-keyed models #168

Open mitchelloharawild opened 4 years ago

mitchelloharawild commented 4 years ago

Currently each model created in estimate() contains no key variables, which limits the ability to use global models.

Adding the key dimension to modelling requires substantial work in generalising methods to support this new dimension.

mitchelloharawild commented 4 years ago

cc @anastasiospanagiotelis @Sayani07

hongooi73 commented 4 years ago

At a minimum, I suspect this will mean giving up the concept of a mable as a rectangular data structure.

CorentinLemaitre commented 4 years ago

Hello, I don't understand exactly how this solves the issue I described in #175. I am French, so sorry if I have misunderstood or explained this badly.

My goal is not to fit multiple models and aggregate them. My goal is to use different time series of the same dimension as regressors. One simple example is modelling the population of country A as a function of the populations of the other countries in the world. It is not really a forecast but a nowcast, or a validation of the data.

mitchelloharawild commented 4 years ago

Hi.

The problem you described in https://github.com/tidyverts/fable/issues/175 is currently not possible with the interface design of fable. You can see this in your attempt to construct this model: TSLM(Population ~ Population) (where the regressor population refers to the population of other countries). The Population regressors are stored in series with different keys (different Countries), and currently fable models only consider one series/key at a time.

To fix this issue (and allow for a model specified like you desire), the interface needs to be extended to allow models to use data across multiple keys/series. This would allow you to model Population from one series, using the Population of the other series. This interface extension won't directly implement the model you want, but it allows for a model of that style to be created in the first place.


The current approach to this problem is to reshape your data into a wide format with pivot_wider(), and then write a separate model for each country of interest. Resolving this issue would allow an alternative interface to multivariate modelling that is better suited to your problem.
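A rough sketch of the wide-format workaround described above, assuming the dplyr, tidyr, tsibble and fable packages are installed. The data set is an invented toy example: the countries A, B and C, the column names Year, Country and Population, and all values are placeholders, not taken from the original issue. The point is only the shape of the pivot and of the model formula.

```r
library(dplyr)
library(tidyr)
library(tsibble)
library(fable)

set.seed(1)

# Toy long-format data: one row per (Year, Country). Values are made up.
pop_long <- tibble(
  Year = rep(2000:2019, times = 3),
  Country = rep(c("A", "B", "C"), each = 20),
  Population = c(
    100 + cumsum(rnorm(20, mean = 1)),
    200 + cumsum(rnorm(20, mean = 2)),
    150 + cumsum(rnorm(20, mean = 1.5))
  )
)

# Pivot so that each country becomes its own column, then treat the
# result as a single (un-keyed) tsibble indexed by Year.
pop_wide <- pop_long |>
  pivot_wider(names_from = Country, values_from = Population) |>
  as_tsibble(index = Year)

# Model country A's population using the other countries as regressors.
fit_A <- pop_wide |>
  model(tslm = TSLM(A ~ B + C))

report(fit_A)
```

A separate model() call (or a separate model definition) is needed for each country of interest, which is the inconvenience a multi-keyed interface would remove.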

mitchelloharawild commented 4 years ago

Update on this: To maintain a consistent structure in the mable, each model in a mable must contain the same number of series. A pre-model function is used to specify which keys are passed into each model.

If a univariate model is specified, each series will be estimated separately. However, the returned object will nest these models and allow for additional behaviour, such as handling covariances between the nested models' residuals. If a multivariate model is specified, all series are passed into the estimation function.

This will require some restructuring, as fabletools assumes uni-keyed modelling in many places.

hongooi73 commented 3 years ago

Hi @mitchelloharawild, has there been any progress lately on this?

mitchelloharawild commented 3 years ago

@njtierney is interested in this functionality for some INLA-type modelling. Having this functionality is also more important now for global models, such as a global AR.

mitchelloharawild commented 3 weeks ago

Design idea: this could be added to the lhs of the model formula with some infix operator at the top level of the AST.

For example with @ as the separator:

<model>(<response(s)> @ <tidyselect keys> ~ <specials>)

It isn't immediately obvious to me if specifying keys like this should include or exclude them. The default must be the same as current behaviour, which is including all keys. So perhaps selected key columns should persist into the model object (in a sense @ becomes a selector of the keys kept within the model).
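To make the parsing side of this idea concrete, here is a minimal base R sketch of splitting a formula's left-hand side on @. The helper name split_model_lhs() is hypothetical and not part of fabletools. It only illustrates that R's parser already accepts the proposed syntax, and it takes no position on whether the selected keys should be included or excluded.

```r
# Hypothetical helper (not part of fabletools): separate the response
# from the key selection when the formula's lhs is a call to `@`.
split_model_lhs <- function(f) {
  stopifnot(inherits(f, "formula"), length(f) == 3)
  lhs <- f[[2]]
  if (is.call(lhs) && identical(lhs[[1]], as.name("@"))) {
    # lhs is `@`(response, keys)
    list(response = lhs[[2]], keys = lhs[[3]], rhs = f[[3]])
  } else {
    # No key selection given: keys = NULL stands for "current behaviour"
    list(response = lhs, keys = NULL, rhs = f[[3]])
  }
}

# The formula is never evaluated, so the names need not exist.
with_keys <- split_model_lhs(Population @ Country ~ trend())
without_keys <- split_model_lhs(Population ~ trend())

print(with_keys$response)
print(with_keys$keys)
print(is.null(without_keys$keys))
```

One thing to watch for: in R, @ binds more tightly than the arithmetic operators, so it sits at the top level of the left-hand side only when the response is a single term or a function call such as log(Population). An expression like a + b @ key parses as a + (b @ key) and would need parentheses around the response.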