koaning / scikit-lego

Extra blocks for scikit-learn pipelines.
https://koaning.github.io/scikit-lego/
MIT License
1.28k stars 117 forks

[FEATURE] Decay functions #581

Closed. FBruzzesi closed this issue 11 months ago

FBruzzesi commented 1 year ago

While working on the two tiny PRs #577 and #580, I noticed that although DecayEstimator has a decay_func argument, the only implemented decay is exponential.

I have some decay functions in mind that we could support. However, supporting them could break the API, which is something that nobody is looking forward to.

The idea is similar to how shrinkage works for GroupedPredictor:

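from sklearn.base import BaseEstimator

class DecayEstimator(BaseEstimator):
    def __init__(self, model, decay_func, **decay_kwargs):
        self.model = model
        self.decay_func = decay_func  # string name or callable
        self.decay_kwargs = decay_kwargs  # extra arguments forwarded to decay_func

    def fit(self, X, y):
        ...
        # decay_func receives the number of samples first, then any extra kwargs
        n_samples = X.shape[0]
        sample_weight = self.decay_func(n_samples, **self.decay_kwargs)
        ...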

where decay_func can be either a string (mapping to the "ready to use" decay functionalities) or a callable that follows a given protocol. For example, the first argument should be the number of samples, which is enough to compute the exponential decay (and many others).
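As a rough illustration of the string-or-callable idea, here is a minimal sketch. The names `resolve_decay_func`, `_DECAY_FUNCS`, `exponential_decay`, `linear_decay` and the `decay_rate` parameter are hypothetical and not taken from scikit-lego. Plain Python lists are used to keep it dependency-free; real code would more likely return NumPy arrays.

```python
from typing import Callable, List, Union

# Assumed protocol: first positional argument is the number of samples,
# the return value is one weight per sample (oldest sample first).
DecayFunc = Callable[..., List[float]]


def exponential_decay(n_samples: int, decay_rate: float = 0.999) -> List[float]:
    """Newest sample gets weight 1, each older sample is multiplied by decay_rate."""
    return [decay_rate ** (n_samples - 1 - i) for i in range(n_samples)]


def linear_decay(n_samples: int) -> List[float]:
    """Weights grow linearly from 1/n_samples (oldest) up to 1 (newest)."""
    return [(i + 1) / n_samples for i in range(n_samples)]


# Hypothetical registry of "ready to use" decays, keyed by name.
_DECAY_FUNCS = {"exponential": exponential_decay, "linear": linear_decay}


def resolve_decay_func(decay_func: Union[str, DecayFunc]) -> DecayFunc:
    """Return a callable for either a known name or a user-supplied callable."""
    if callable(decay_func):
        return decay_func
    if isinstance(decay_func, str) and decay_func in _DECAY_FUNCS:
        return _DECAY_FUNCS[decay_func]
    raise ValueError(
        f"Unknown decay_func {decay_func!r}; "
        f"expected one of {sorted(_DECAY_FUNCS)} or a callable."
    )


weights = resolve_decay_func("linear")(4)
print(weights)
```

With something like this, `fit` would only need to call `resolve_decay_func(self.decay_func)(n_samples, **self.decay_kwargs)`, so both forms go through the same code path.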

Here is the list of decay functions I have implemented, in case we move forward with this (example with a sample size of 500): [image: decays]

Edit: In my mind, all of them can accept a min_value and a max_value which scale the shape of the decay (so that 0 is never actually reached).
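One way to read the min_value / max_value idea is as a linear rescaling of the raw weights. The sketch below assumes that interpretation; the helper `rescale` and this `linear_decay` signature are hypothetical, not the implementation discussed in the issue.

```python
from typing import List


def rescale(weights: List[float], min_value: float = 0.0, max_value: float = 1.0) -> List[float]:
    """Map weights linearly so the smallest becomes min_value and the largest max_value."""
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")
    lo, hi = min(weights), max(weights)
    if hi == lo:
        # All weights are equal, so there is nothing to stretch.
        return [max_value for _ in weights]
    scale = (max_value - min_value) / (hi - lo)
    return [min_value + (w - lo) * scale for w in weights]


def linear_decay(n_samples: int, min_value: float = 0.0, max_value: float = 1.0) -> List[float]:
    """Linear ramp from min_value (oldest sample) to max_value (newest sample)."""
    if n_samples == 1:
        return [max_value]
    raw = [i / (n_samples - 1) for i in range(n_samples)]
    return rescale(raw, min_value, max_value)


# With min_value > 0 the oldest sample keeps a small, non-zero weight.
print(linear_decay(5, min_value=0.2, max_value=1.0))
```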

koaning commented 1 year ago

I like the idea. The only thing I'm not sure about is passing an actual function to decay_func. I'm open to allowing a function, but for gridsearch stuff later on ... it might be nice if the base decay functions that we provide can also be referenced with a string.
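To make the grid search point concrete: when decays are referenced by name, a search grid is plain data. The sketch below expands such a grid with the standard library only. The parameter names and values are hypothetical and only mirror the ones mentioned in this thread.

```python
from itertools import product

# Hypothetical parameter grid; string names stand in for the built-in decays.
param_grid = {
    "decay_func": ["exponential", "linear"],
    "min_value": [0.0, 0.1],
}

# Expand the grid into one dict per candidate setting,
# similar in spirit to how a grid search enumerates combinations.
candidates = [
    dict(zip(param_grid, values)) for values in product(*param_grid.values())
]

for candidate in candidates:
    print(candidate)
```

Each candidate prints as a readable dict, which is harder to achieve when the grid contains anonymous functions.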

One thing about that periodic decay ... that's not really a decay. Got a use-case in mind for that one?

FBruzzesi commented 1 year ago

> it might be nice if the base decay functions that we provide can also be referenced with a string.

Absolutely, that's what I meant by "a string (mapping to the "ready to use" decay functionalities)".

> periodic decay ... that's not really a decay. Got a use-case in mind for that one?

For time series with given fixed seasonality, "Linear * Periodic" decay is something I used in the past.
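The thread does not define the exact shape of the periodic component, so the following is only a sketch of what a "Linear * Periodic" weighting could look like. It assumes a raised cosine that peaks at samples lying a whole number of periods before the most recent one. All function names and the `period` parameter are hypothetical.

```python
import math
from typing import List


def linear_decay(n_samples: int) -> List[float]:
    """Weights grow linearly from 1/n_samples (oldest) up to 1 (newest)."""
    return [(i + 1) / n_samples for i in range(n_samples)]


def periodic_weights(n_samples: int, period: int) -> List[float]:
    """Raised cosine in [0, 1], equal to 1 for samples k * period steps before the last one."""
    last = n_samples - 1
    return [
        0.5 * (1.0 + math.cos(2.0 * math.pi * (last - i) / period))
        for i in range(n_samples)
    ]


def linear_periodic_decay(n_samples: int, period: int) -> List[float]:
    """Element-wise product: seasonal peaks whose height fades for older samples."""
    linear = linear_decay(n_samples)
    periodic = periodic_weights(n_samples, period)
    return [lin * per for lin, per in zip(linear, periodic)]


weights = linear_periodic_decay(8, period=4)
print([round(w, 3) for w in weights])
```

Samples in the same phase of the season as the latest observation keep the most weight, while the linear factor still discounts older seasons.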

Should we keep those in the same module as DecayEstimator? The idea of importing them from sklego.meta or sklego.meta.decay_estimator is not too appealing at first glance tbh

koaning commented 1 year ago

> Should we keep those in the same module as DecayEstimator? The idea of importing them from sklego.meta or sklego.meta.decay_estimator is not too appealing at first glance tbh

It would be close together so it feels logical. What other place might make more sense?

FBruzzesi commented 1 year ago

One option could be to keep them private in something like meta._decay_utils.py. I don't have a strong argument for either of the two options, I'm mainly raising the question 😁