Design and implementation of ColumnEnsembleForecaster

sktime / sktime

A unified framework for machine learning with time series

https://www.sktime.net

BSD 3-Clause "New" or "Revised" License


Design and implementation of ColumnEnsembleForecaster #1081

Closed GuzalBulatova closed 3 years ago

GuzalBulatova commented 3 years ago

References: re-design of Theta Forecaster #854; Implements ThetaLinesTransformer #923

Describe the solution you'd like: Implement ColumnEnsembleForecaster, a multivariate-to-univariate forecaster. It forecasts the transformed data (the pd.DataFrame returned by ThetaLinesTransformer), starting with the Theta model's standard case where theta_coefficient = [0, 2]. It should return a pd.Series: the average of the two forecasts (linear regression and SES with drift).
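For illustration, the standard case can be sketched with plain numpy and pandas. This is a minimal sketch and not the sktime implementation: the helper names (theta_lines, forecast_trend, forecast_ses) are hypothetical, the theta-line is assumed to be theta * y + (1 - theta) * fitted linear trend, the SES smoothing parameter is fixed at an assumed 0.5 instead of being optimised, and no deseasonalisation is done.

```python
import numpy as np
import pandas as pd


def theta_lines(y: pd.Series, thetas=(0, 2)) -> pd.DataFrame:
    """Return one theta-line per coefficient as columns of a DataFrame."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y.to_numpy(), deg=1)
    trend = intercept + slope * t
    # assumed definition: theta * y + (1 - theta) * fitted linear trend
    return pd.DataFrame(
        {theta: theta * y.to_numpy() + (1 - theta) * trend for theta in thetas},
        index=y.index,
    )


def forecast_trend(line: pd.Series, horizon: int) -> np.ndarray:
    """Extrapolate a straight line fitted by least squares."""
    t = np.arange(len(line))
    slope, intercept = np.polyfit(t, line.to_numpy(), deg=1)
    future_t = np.arange(len(line), len(line) + horizon)
    return intercept + slope * future_t


def forecast_ses(line: pd.Series, horizon: int, alpha: float = 0.5) -> np.ndarray:
    """Simple exponential smoothing: flat forecast at the last smoothed level."""
    level = line.iloc[0]
    for value in line.iloc[1:]:
        level = alpha * value + (1 - alpha) * level
    return np.full(horizon, level)


y = pd.Series([10.0, 12.0, 13.0, 15.0, 16.0, 18.0])
lines = theta_lines(y)  # columns 0 and 2

# one forecaster per column, then the plain average of the two forecasts
forecasts = np.column_stack(
    [forecast_trend(lines[0], horizon=3), forecast_ses(lines[2], horizon=3)]
)
y_pred = pd.Series(forecasts.mean(axis=1), name="theta_forecast")
```

With coefficients 0 and 2 the two theta-lines average back to the original series, which is why the plain mean is the natural combination in this case.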

Related issues: Poor Theta model predictions #421, Implementation of AutoTheta Forecaster #738

Existing implementations:

in sktime sktime/forecasting/theta.py (theta parameter is assumed)

other implementations in R:

Theta https://rdrr.io/cran/forecast/man/thetaf.html
AutoTheta https://github.com/vangspiliot/AutoTheta/blob/master/code
M4 AutoTheta https://github.com/Mcompetitions/M4-methods/tree/master/260 - KaterinaKou
4Theta https://github.com/Mcompetitions/M4-methods/blob/master/4Theta method.R
(boxcox + theta) https://github.com/Mcompetitions/M4-methods/blob/master/260 - KaterinaKou/final_M4.R
Theta in statsmodels https://www.statsmodels.org/devel/examples/notebooks/generated/theta-model.html
differencing is implemented in https://alkaline-ml.com/pmdarima/_modules/pmdarima/utils/array.html#diff

Lovkush-A commented 3 years ago

Exciting to see the Theta stuff getting implemented!

My comments/questions/suggestions

1. Most important is what is the difference between this and the EnsembleForecaster that already exists? Your design looks very similar to that. Also, based on my understanding of AutoTheta, the current EnsembleForecaster does most of what we want. The only change needed is to create more aggfuncs:
- Weighted mean, not just mean. This would involve adding some way for the user to input the weights.
- Some kind of 'multiplicative' aggregation. This was in the AutoTheta paper. I do not know precisely what it means, but it should not be too hard to find out.
2. In my opinion, the PR #1082 is better for discussing code that is written and pushed, and this issue is better for discussing designs. If you agree, I suggest the following:
- Renaming this issue to something like 'Design and implementation of Col...Forecaster' to make it more distinct from the PR's name
- Moving your class design from the PR into a comment in this issue
3. I like the thorough list of references!

Lovkush-A commented 3 years ago

Most important is what is the difference between this and the EnsembleForecaster that already exists?

Never mind - I think I know the difference now. EnsembleForecaster applies different forecasters to a single series, whereas ColumnEnsembleForecaster applies different forecasters to different series.
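The distinction can be made concrete with toy stand-ins. The sketch below does not use the sktime classes: naive_last and naive_mean are hypothetical one-line forecasters, used only to show which object each style iterates over.

```python
import numpy as np
import pandas as pd


def naive_last(series: pd.Series, horizon: int) -> np.ndarray:
    """Repeat the last observed value."""
    return np.full(horizon, series.iloc[-1])


def naive_mean(series: pd.Series, horizon: int) -> np.ndarray:
    """Repeat the historical mean."""
    return np.full(horizon, series.mean())


horizon = 2
y_uni = pd.Series([1.0, 2.0, 3.0, 4.0])
y_multi = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [10.0, 10.0, 20.0, 20.0]})

# EnsembleForecaster style: several forecasters, ONE series, forecasts aggregated
ensemble_pred = np.mean(
    [naive_last(y_uni, horizon), naive_mean(y_uni, horizon)], axis=0
)

# ColumnEnsembleForecaster style: one forecaster PER column, one forecast per column
column_forecasters = {"a": naive_last, "b": naive_mean}
column_pred = pd.DataFrame(
    {col: f(y_multi[col], horizon) for col, f in column_forecasters.items()}
)
```

In the first case the result is a single forecast; in the second it is still one forecast per column, so any averaging is a separate step.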

Here are my further comments now that I understand the difference between the two:

4. I think we want ColumnEnsembleForecaster to be a general multivariate-to-multivariate forecasting tool (not solely for ThetaForecasting): the user provides a list of forecasters which are applied to each column.
5. The specific use of trend and ses would be introduced inside ThetaForecaster, by doing something like ColumnEnsembleForecaster(forecasters = [("trend", PolynomialTrendForecaster(), [0]), ("ses", ExponentialSmoothing(), [1])])
6. If my thought in 4 is correct, then predict should return a multivariate output, i.e., the predictions for each of the series, without averaging them.
7. The calculation of the mean would happen inside the ThetaForecaster.
8. Jumping ahead, I envision the final ThetaForecaster being defined as some kind of reduction from univariate to multivariate forecasting:

ThetaForecaster = univariate_via_multivariate(
    uni_to_multi_transformer = ThetaLinesTransformer([0,2]),
    multi_forecaster = ColumnEnsembleForecaster(forecasters = [("trend", PolynomialTrendForecaster(), [0]), ("ses", ExponentialSmoothing(), [1])]),
    multi_to_uni_transformer/aggfunc = 'mean')
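To make the shape of such a reduction concrete, here is a minimal sketch with plain callables. Nothing in it is existing sktime API: univariate_via_multivariate is the hypothetical name used above, and split_halves and last_value are toy stand-ins for the theta transformer and the per-column forecasters.

```python
from typing import Callable, Sequence

import numpy as np
import pandas as pd

ColumnForecaster = Callable[[pd.Series, int], np.ndarray]


def univariate_via_multivariate(
    uni_to_multi: Callable[[pd.Series], pd.DataFrame],
    column_forecasters: Sequence[ColumnForecaster],
    aggfunc: Callable[[pd.DataFrame], pd.Series],
) -> Callable[[pd.Series, int], pd.Series]:
    """Compose transformer -> per-column forecasters -> aggregator."""

    def forecast(y: pd.Series, horizon: int) -> pd.Series:
        frame = uni_to_multi(y)  # univariate -> multivariate
        preds = pd.DataFrame(
            {
                col: f(frame[col], horizon)
                for col, f in zip(frame.columns, column_forecasters)
            }
        )  # multivariate -> multivariate
        return aggfunc(preds)  # multivariate -> univariate

    return forecast


# Toy stand-ins, not the real theta components.
def split_halves(y: pd.Series) -> pd.DataFrame:
    return pd.DataFrame({"low": 0.5 * y, "high": 1.5 * y})


def last_value(series: pd.Series, horizon: int) -> np.ndarray:
    return np.full(horizon, series.iloc[-1])


toy_forecaster = univariate_via_multivariate(
    uni_to_multi=split_halves,
    column_forecasters=[last_value, last_value],
    aggfunc=lambda df: df.mean(axis=1),
)
y_pred = toy_forecaster(pd.Series([2.0, 4.0, 6.0]), 2)
```

Keeping the aggregation as its own argument means the middle step stays multivariate-to-multivariate and can be reused outside of theta forecasting.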

@mloning @fkiraly @GuzalBulatova. Let me know which, if any, of my comments are sensible or not. I could easily be over-complicating things or misunderstood something.

Lovkush-A commented 3 years ago

And one additional minor point: if my thought in 4 is correct (that ColumnEnsembleForecaster should be a general multivariate-to-multivariate forecaster), then the ColumnForecaster in #1074 would just be a special case of ColumnEnsembleForecaster.

GuzalBulatova commented 3 years ago

Design sketch:

import numpy as np
import pandas as pd
from typing import Union
from sklearn.base import clone
from sktime.forecasting.base import BaseForecaster


class ColumnEnsembleForecaster(BaseForecaster):

    def __init__(self, forecaster):
        self.forecaster = forecaster

        # format for forecaster as list:
        # [
        #     ("trend", PolynomialTrendForecaster(), [0]),  # 0 - column index of input pd.DataFrame
        #     ("ses", ExponentialSmoothing(), [1])
        # ]

    def _fit(self, y : Union[pd.Series, pd.DataFrame] ...):
        # y multivariate

        # should happen in base class
        if isinstance(y, pd.Series):
            y = y.to_frame()  # make it a pd.DataFrame

        # fit one clone of the forecaster per column
        self.forecasters_ = []
        for column in y.columns:
            forecaster = clone(self.forecaster)
            forecaster.fit(y[column])
            self.forecasters_.append(forecaster)
        return self

    def _predict(self, fh, ...):

        # one column of predictions per fitted forecaster
        y_pred = np.zeros((len(fh), len(self.forecasters_)))
        for i, forecaster in enumerate(self.forecasters_):
            y_pred[:, i] = forecaster.predict(fh, ...)

        # average over columns
        return np.mean(y_pred, axis=1)

GuzalBulatova commented 3 years ago

Thank you @Lovkush-A! I renamed the issue and moved the class design into the comment section here.

Regarding comment nr 4: I should've been more verbose with the design suggestion.

ColumnEnsemble is a part of ThetaForecaster. Theta itself would look like:

# modular theta forecasting: this pipeline is the ThetaForecaster
pipe = Pipeline(
    Deseasonalizer(),
    ThetaLinesTransformer(theta=(0, 2, ...)),
    ColumnEnsembleForecaster()
)
pipe.fit(y)
pipe.predict(fh)

Just like you said, the average will eventually be weighted, not just a plain mean; the mean is the simple option to start with (the classic theta case). But I think we should make ColumnEnsemble multivariate-to-univariate: mixing in multivariate output would complicate it too much. Maybe I'm wrong, though. @mloning and @fkiraly, what do you think?
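As a rough sketch of what the weighted option could look like, assuming the per-column forecasts arrive as a pd.DataFrame: weighted_mean is a hypothetical helper name, and how the user would pass the weights is not decided here.

```python
import numpy as np
import pandas as pd


def weighted_mean(forecasts: pd.DataFrame, weights=None) -> pd.Series:
    """Aggregate per-column forecasts into one series, row by row."""
    if weights is not None and len(weights) != forecasts.shape[1]:
        raise ValueError("Need exactly one weight per forecast column.")
    # weights=None falls back to the plain mean
    values = np.average(forecasts.to_numpy(), axis=1, weights=weights)
    return pd.Series(values, index=forecasts.index)


forecasts = pd.DataFrame({"trend": [10.0, 12.0], "ses": [20.0, 20.0]})
plain = weighted_mean(forecasts)  # classic theta case: equal weights
weighted = weighted_mean(forecasts, weights=[0.25, 0.75])
```

The plain mean stays the default, so the classic case needs no extra argument.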

Maybe ColumnEnsemble should be a special case of a multivariate-to-multivariate forecaster. Still, I think combining the forecasts (returning a univariate pd.Series) should be a separate class. Is averaging multiple forecasts used somewhere else, or is it just the Theta method?

AutoTheta would try different theta coefficients and pick the best one (based on MAE if I understand the paper correctly).

pipe = Pipeline(
    ThetaLinesTransformer(theta=(0, 2, ...)),
    ColumnEnsembleForecaster()
)

param_grid = {"theta": [(0, 2), (0, 2.3), ...]}
gscv = ForecastingGridSearchCV(
    pipe,
    param_grid,
    ...
)

gscv.fit(y)
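The selection step (try several theta coefficients, keep the one with the lowest MAE) can be sketched without sktime as follows. The assumptions are mine and not taken from the AutoTheta paper: the error is measured on a hold-out of the last few points, the SES smoothing parameter is fixed at 0.5, and the lines for coefficients (0, theta) are combined with weights (1 - 1/theta, 1/theta), which reduces to the plain mean for theta = 2. The names theta_forecast and select_theta are hypothetical.

```python
import numpy as np


def theta_forecast(y: np.ndarray, theta: float, horizon: int, alpha: float = 0.5):
    """Forecast from theta-lines (0, theta) with weights (1 - 1/theta, 1/theta)."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, deg=1)
    trend = intercept + slope * t
    theta_line = theta * y + (1 - theta) * trend

    # theta = 0 line: extrapolate the linear trend
    trend_pred = intercept + slope * np.arange(len(y), len(y) + horizon)

    # theta line: simple exponential smoothing, flat forecast
    level = theta_line[0]
    for value in theta_line[1:]:
        level = alpha * value + (1 - alpha) * level
    ses_pred = np.full(horizon, level)

    return (1 - 1 / theta) * trend_pred + (1 / theta) * ses_pred


def select_theta(y: np.ndarray, candidates, horizon: int):
    """Pick the candidate with the lowest MAE on the last `horizon` points."""
    train, test = y[:-horizon], y[-horizon:]
    errors = {
        theta: np.mean(np.abs(theta_forecast(train, theta, horizon) - test))
        for theta in candidates
    }
    return min(errors, key=errors.get), errors


rng = np.random.default_rng(0)
y = 0.5 * np.arange(40) + rng.normal(scale=1.0, size=40)
best_theta, errors = select_theta(y, candidates=[1.5, 2.0, 2.5, 3.0], horizon=5)
```

A grid search over the pipeline would do the same thing with proper cross-validation splits instead of a single hold-out.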

I haven't yet looked into how the AutoTheta paper suggests handling multiplicative trends, but I think it makes sense to cover that in another issue.

mloning commented 3 years ago

Thanks @Lovkush-A - I like your idea of having a modular aggregator. We would have to figure out how this works inside the pipeline. I talked with @GuzalBulatova today and we agreed to give it some thought. Having the multi-to-univariate column ensemble may be an easy fall-back option if the other one requires too much work.

Lovkush-A commented 3 years ago

Happy I could help!

Note that if the modular aggregator does not work out, it might still be a good idea to have a multivariate-to-multivariate ensemble forecaster.

fkiraly commented 3 years ago

@GuzalBulatova, I do like this idea.

One point I wanted to bring up is the handling of input and output types: the ColumnEnsembleForecaster takes a multivariate series and produces a univariate one, while the theta transformer takes a univariate series and produces a multivariate one. I don't think we have agreed on conventions.

Also, you allude to the initial conversion series->frame being done in the base class.

My thought would be that after #980 this would be automatically taken care of (and input/output types do not matter in the implementation), but in any case we need to think carefully about the conversions and the types involved. There seem to be a lot of case distinctions if we want to have the logic in _fit.
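For reference, the two conversions in question are small in plain pandas. The sketch below only illustrates the case distinctions; to_frame and to_series_if_univariate are hypothetical helper names, and #980 may well handle this differently.

```python
import pandas as pd


def to_frame(y) -> pd.DataFrame:
    """Coerce a univariate pd.Series to a one-column pd.DataFrame."""
    if isinstance(y, pd.Series):
        return y.to_frame()
    if isinstance(y, pd.DataFrame):
        return y
    raise TypeError(f"Expected pd.Series or pd.DataFrame, got {type(y).__name__}.")


def to_series_if_univariate(y: pd.DataFrame):
    """Return a pd.Series when there is exactly one column, else the frame."""
    if y.shape[1] == 1:
        return y.iloc[:, 0]
    return y


s = pd.Series([1.0, 2.0, 3.0], name="y")
frame = to_frame(s)
round_trip = to_series_if_univariate(frame)
```

Doing these conversions once at the boundary would let _fit and _predict assume a single inner type.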

GuzalBulatova commented 3 years ago

Thank you, @fkiraly! No, we haven't agreed on conventions. With ThetaLinesTransformer it was: six input options (int, float, list of int / float, tuple of int / float), converted internally to a list; the output is a pd.DataFrame, or a pd.Series if only one theta-line is needed.
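The input handling described above amounts to something like the following sketch. It is not the actual ThetaLinesTransformer code: normalize_theta and output_is_series are hypothetical names, and rejecting booleans and empty containers is my own assumption.

```python
from numbers import Real


def normalize_theta(theta) -> list:
    """Coerce a number, list or tuple of numbers into a list of coefficients."""
    if isinstance(theta, Real) and not isinstance(theta, bool):
        return [theta]
    if isinstance(theta, (list, tuple)) and len(theta) > 0:
        if all(isinstance(v, Real) and not isinstance(v, bool) for v in theta):
            return list(theta)
    raise ValueError(
        f"theta must be a number or a non-empty list/tuple of numbers, got {theta!r}."
    )


def output_is_series(theta) -> bool:
    """One theta-line gives a pd.Series, several give a pd.DataFrame."""
    return len(normalize_theta(theta)) == 1
```

For example, normalize_theta((0, 2.3)) gives [0, 2.3], and output_is_series(2) is True because a single coefficient yields a single theta-line.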

Now, with @Lovkush-A's suggestions:

ThetaForecaster = univariate_via_multivariate(
    uni_to_multi_transformer = ThetaLinesTransformer([0,2]),
    multi_forecaster = ColumnEnsembleForecaster(forecasters = [("trend", PolynomialTrendForecaster(), [0]), 
                                                               ("ses", ExponentialSmoothing(), [1])]),
    multi_to_uni_transformer/aggfunc = 'mean')

we'll have ColumnEnsembleForecaster as a part of ThetaForecaster; it'll take a pd.DataFrame and return a pd.DataFrame.

Then this forecasted pd.DataFrame will be transformed to a pd.Series with ColumnEnsembleTransformer. Here's a design sketch:

import numpy as np
import pandas as pd
from sktime.transformations.base import _SeriesToSeriesTransformer
from sktime.utils.validation.series import check_series


class ColumnEnsembleTransformer(_SeriesToSeriesTransformer):

    # supported aggregation functions; more to be added
    _aggfuncs = {"mean": np.mean}

    def __init__(self, aggfunc="mean"):
        self.aggfunc = aggfunc

        super(ColumnEnsembleTransformer, self).__init__()

    def transform(self, Z, X=None):
        """Transform data.

        Parameters
        ----------
        Z : pd.DataFrame
            Multivariate series to transform.
        X : pd.DataFrame, optional (default=None)
            Exogenous data used in transformation.

        Returns
        -------
        column_ensemble: pd.Series
            Transformed univariate series.
        """
        # input check - df

        # look up the aggregation function by name
        if self.aggfunc not in self._aggfuncs:
            raise ValueError("Aggregation function %s not recognized." % self.aggfunc)
        aggfunc = self._aggfuncs[self.aggfunc]

        # aggregate across columns: one value per time point
        column_ensemble = np.zeros(Z.shape[0])
        for i, (_, row) in enumerate(Z.iterrows()):
            column_ensemble[i] = aggfunc(row)

        return pd.Series(column_ensemble, index=Z.index)

    def _fit(self, Z, X=None):  # is it necessary?
        check_series(Z)
        self._is_fitted = True
        return self

Do these input/output types make sense? Happy to make changes where necessary!

mloning commented 3 years ago

Closed by #1082