koaning / scikit-lego

Extra blocks for scikit-learn pipelines.
https://koaning.github.io/scikit-lego/
MIT License

[FEATURE] Time Series Meta Regressor #471

Open Garve opened 3 years ago

Garve commented 3 years ago

Heya!

How about a meta model for time series? It could consist of two models, as in the ZeroInflatedRegressor: one model to capture the trend (and optionally the seasonality) of the time series, and another one for the rest. It's basically a special kind of decomposition model, similar to Prophet, but more general, simpler, and scikit-learn compatible.

In the end you have prediction = trend (e.g. via linear regression) + seasonality (e.g. via RepeatingBasisFunction) + second_model output.

Training procedure: (Input X, y)

  1. first_model.fit(X, y)
  2. Detrend (and deseasonalize) the training data via y' = y - first_model.predict(X). Some kind of stationary y' remains.
  3. second_model.fit(X, y').

Prediction procedure:

  1. Output first_model.predict(X) + second_model.predict(X).
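
The training and prediction steps above can be sketched as a small scikit-learn estimator. This is only a minimal illustration, not an existing scikit-lego class: the name `TrendResidualRegressor`, the toy data, and the choice of `LinearRegression` and `DecisionTreeRegressor` as the two models are all assumptions made for the example.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor


class TrendResidualRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical meta regressor: second_model is fit on the residuals of first_model."""

    def __init__(self, first_model, second_model):
        self.first_model = first_model
        self.second_model = second_model

    def fit(self, X, y):
        y = np.asarray(y, dtype=float)
        # Step 1: fit the trend (and seasonality) model on the raw target.
        self.first_model_ = clone(self.first_model).fit(X, y)
        # Step 2: detrend, y' = y - first_model.predict(X).
        residuals = y - self.first_model_.predict(X)
        # Step 3: fit the second model on the detrended target.
        self.second_model_ = clone(self.second_model).fit(X, residuals)
        return self

    def predict(self, X):
        # Prediction is the sum of both models' outputs.
        return self.first_model_.predict(X) + self.second_model_.predict(X)


# Toy data (assumed for illustration): a linear trend plus a 12-step cycle.
t = np.arange(200, dtype=float)
X = np.column_stack([t, t % 12])
y = 0.5 * t + 3.0 * np.sin(2 * np.pi * t / 12)

model = TrendResidualRegressor(
    first_model=LinearRegression(),
    second_model=DecisionTreeRegressor(max_depth=5, random_state=0),
)
model.fit(X[:150], y[:150])
pred = model.predict(X[150:])  # forecasts beyond the training range
```

Because the linear model carries the trend, the tree only has to learn the bounded remainder, so the sum can follow the series outside the range of target values seen during training.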

It's simple, similar to ZeroInflatedRegressor, and works well together with your RepeatingBasisFunction. And I think it belongs here rather than in sktime, because sktime does not focus much on exogenous regressors (does it support them at all?). With this model, you could also use KNN or tree-based approaches for time series forecasting without worrying about trends that these models can't capture on their own. In my opinion, modelling the trend directly is also better than differencing y and then training a tree-based method on the differenced labels, because with differencing the prediction errors propagate and grow into the future. It's also interesting compared to Prophet because you can customize it much more. For example, there might be exogenous variables that influence the outcome positively (e.g. advertising spend), but as far as I know there is no way to specify this in Prophet.

What do you think?

Best Robert

koaning commented 3 years ago

I once contemplated something like this, but then I discovered a small ocean of scikit-learn compatible packages that try to tackle the timeseries problem. Glancing at the related packages guide of sklearn, I found two packages in this space. Both sktime and tslearn seem quite popular too.

This makes me wonder: do those two packages not cover your use-case?

koaning commented 3 years ago

Also, now that I think about it, you can already do this with our package, I think. Simply wrap a pipeline that predicts the season via the RepeatingBasisFunction and use it as a featurizer through an EstimatorTransformer.

Garve commented 3 years ago

Heya!

Didn't try tslearn, but I think it's not possible with sktime. The make_reduction forecasters also always need a sliding window of size > 0, so I can't do regression without using past values of the same time series as inputs.

Regarding the estimator transformer: I don't understand exactly how to embed the trend here. Seasonality is fine, but how would you model a linear trend for example?

koaning commented 3 years ago

Can't you encode time as a linearly increasing variable and give that to a linear regression?
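
A minimal sketch of that idea, assuming evenly spaced observations: the time feature is just a running counter, and the synthetic series (slope 2, intercept 5) and variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical series: 100 evenly spaced observations with slope 2 and intercept 5.
n_obs = 100
time_index = np.arange(n_obs).reshape(-1, 1)  # 0, 1, 2, ... as a single feature column
y = 2.0 * time_index.ravel() + 5.0

# The linear regression learns the trend from the counter alone.
trend_model = LinearRegression().fit(time_index, y)

# To extrapolate, keep counting past the last training step.
future_index = np.arange(n_obs, n_obs + 10).reshape(-1, 1)
trend_forecast = trend_model.predict(future_index)
```

The same counter column can sit next to seasonal or exogenous features in a larger feature matrix.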

koaning commented 3 years ago

I imagine that a StackingRegressor might also be appropriate in this realm.
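
For reference, a rough sketch of what that could look like with scikit-learn's `StackingRegressor`. The toy data and the choice of base estimators are assumptions for illustration. Note that stacking trains the final estimator on cross-validated predictions of the base models; it does not fit one model on the residuals of another.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Toy data (assumed for illustration): a linear trend plus a 12-step cycle.
t = np.arange(200, dtype=float)
X = np.column_stack([t, t % 12])
y = 0.5 * t + 3.0 * np.sin(2 * np.pi * t / 12)

stack = StackingRegressor(
    estimators=[
        ("trend", LinearRegression()),                 # can extrapolate the trend
        ("knn", KNeighborsRegressor(n_neighbors=5)),   # local patterns, no extrapolation
    ],
    final_estimator=LinearRegression(),  # combines the base models' predictions
)
stack.fit(X[:150], y[:150])
stack_pred = stack.predict(X[150:])
```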

Garve commented 3 years ago

The problem is that, in between, I also want to take the difference between the original labels and the predictions of the first model. Is there any way to do this?