sktime / sktime

A unified framework for machine learning with time series
https://www.sktime.net
BSD 3-Clause "New" or "Revised" License

[ENH] Apply existing sklearn and sklego Transformers on the time index in order to generate features #4343

Closed lardellin closed 1 year ago

lardellin commented 1 year ago

Is your feature request related to a problem? Please describe.

In my use case I'd like to model time series (time series regression) with both seasonal and aseasonal effects.

Describe the solution you'd like

Given the nature of the time series (point-of-sale forecasts), Fourier features are not always the most appropriate choice. Products that sell throughout the entire year can easily be modelled with Fourier features. Highly seasonal products (e.g. pumpkins), however, are sold exclusively between August and November, and for this case an approach like sklego.preprocessing.RepeatingBasisFunction would be more appropriate.
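To illustrate why localized basis functions suit a short selling season, here is a minimal NumPy sketch of periodic radial basis features over the day of year. It only illustrates the idea and is not sklego's implementation: the function name `periodic_rbf_features`, its parameters, and the choice of centers and width are all assumptions made for this example, and RepeatingBasisFunction may place and scale its bumps differently.

```python
import numpy as np


def periodic_rbf_features(day_of_year, n_basis=12, period=365.0, width=None):
    """Return an (n_samples, n_basis) array of periodic radial basis features.

    Hypothetical helper for illustration; not part of sklego or sktime.
    """
    day = np.asarray(day_of_year, dtype=float)
    # Evenly spaced bump centers over one period (an assumption of this sketch).
    centers = np.arange(n_basis) * period / n_basis
    if width is None:
        width = period / n_basis
    # Circular distance, so late December and early January count as close.
    diff = np.abs(day[:, None] - centers[None, :]) % period
    dist = np.minimum(diff, period - diff)
    return np.exp(-((dist / width) ** 2))


days = np.arange(1, 366)
features = periodic_rbf_features(days)
```

Each column is non-zero only around its own center, so a regression model can give a large weight to the late-summer and autumn columns and near-zero weights to the rest, which is hard to achieve with a small number of Fourier terms.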

As for aseasonal features, there are several approaches that might be interesting. Both Prophet and Timeseers (as far as I understand) model trend changes through a set of changepoints. Another effective approach would be generating B-spline features with the sklearn.preprocessing.SplineTransformer functionality. More generally, any feature that can be written as $ X_i = f_i\left(t\right) $ could be generated this way.

Describe alternatives you've considered

So far I can pregenerate all of the above-mentioned features. I would, however, find it useful if all these steps could be integrated into a pipeline. After perusing the documentation and the past and present issues, I couldn't find a pipeline component that matches what I've described above. Please point me to any relevant material if this already exists.

The closest match to what I've described is sktime.transformations.series.adapt.TabularToSeriesAdaptor, but it currently has no way of applying transformations to the index.

lardellin commented 1 year ago

Ok, so I've found a way to implement what I've described above:

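from sklearn.preprocessing import SplineTransformer
from sklego.preprocessing import RepeatingBasisFunction
from sktime.transformations.compose import FeatureUnion, TransformerPipeline
from sktime.transformations.series.adapt import TabularToSeriesAdaptor
from sktime.transformations.series.date import DateTimeFeatures
from sktime.transformations.series.time_since import TimeSince

# Assumption: sklearn_to_sktime is not defined in this thread; aliased here to sktime's adaptor.
sklearn_to_sktime = TabularToSeriesAdaptor
feature_generator = FeatureUnion([
    ('RBF', TransformerPipeline([
        ('select dayofyear', DateTimeFeatures(ts_freq='D', manual_selection=['day_of_year'], keep_original_columns=False)),
        ('RBF feature', sklearn_to_sktime(RepeatingBasisFunction(n_periods=12, input_range=(1, 365)))),
    ])),
    ('BSpline', TransformerPipeline([
        ('select days since start', TimeSince()),
        ('generate B-spline features', sklearn_to_sktime(SplineTransformer(n_knots=6, degree=3, extrapolation='linear')))
    ]))
])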

I'm going to go ahead and close the issue and open documentation issues instead.

fkiraly commented 1 year ago

hm, I think this is interesting - I'm reopening this as we may like to have a general transformer that gives you the index as a feature!

fkiraly commented 1 year ago

FYI @lardellin, I've implemented the generic transformer here now: https://github.com/sktime/sktime/pull/4416

This can be pipelined with any other transformer or forecaster.