unit8co / darts

A python library for user-friendly forecasting and anomaly detection on time series.
https://unit8co.github.io/darts/
Apache License 2.0

Loess Filter that is Vectorised Over `samples` and `components` Axes #1475

Open mabilton opened 1 year ago

mabilton commented 1 year ago

Is your feature request related to a current problem? Please describe.

From what I understand, seasonal decomposition in darts is currently done by calling extract_trend_and_seasonality. The limitation is that extract_trend_and_seasonality only works for univariate, single-sample series, because the statsmodels function(s) it calls only accept one-dimensional arrays. More specifically, darts throws an error when series.pd_series() is called inside extract_trend_and_seasonality if series has multiple samples and/or components.

In order to seasonally decompose multiple samples/components of a series, therefore, one must manually loop over each component/sample and pass it to extract_trend_and_seasonality; this potentially adds quite a bit of overhead if the series contains lots of components and/or samples.

It would be cool if there was an STL decomposer in darts that's vectorised over the samples and components axes of the series input. A 'first step' towards this goal would be to implement a vectorized Loess filter, since this forms the 'backbone' of the STL algorithm.

Describe proposed solution

I'd suggest adding a loess_filter function somewhere inside of darts; it should probably have a similar call signature to the statsmodels implementation, so something like:

loess_filter(series, frac, it, delta)

although I'd probably err on the side of changing the names of frac, it, and delta, since these aren't very descriptive. The is_sorted and return_sorted arguments of the statsmodels function can obviously be dropped for the darts implementation, as can the missing argument (since TimeSeries are assumed to 'contain numeric types only').

Some potential locations for this function could be darts.utils.statistics, darts.models.filtering, or darts.dataprocessing.transformers. Personally, I'm pretty indifferent as to where the function is placed, so any suggestions around this would be welcome.

The critical point to note about this loess_filter function, however, is that it should be vectorized over the samples and components axes of series.
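To make the idea of vectorising over the samples and components axes concrete, here is a minimal NumPy sketch; it is not the prototype discussed in this issue. It assumes evenly spaced timesteps and an input array laid out as (time, component, sample), the shape returned by series.all_values(). It also drops the robustness iterations (it) and the delta shortcut of the statsmodels routine, which is what allows one smoother matrix to be shared by every series. The name loess_smoother_matrix and the frac default are hypothetical.

```python
import numpy as np


def loess_smoother_matrix(n_timesteps: int, frac: float) -> np.ndarray:
    """Return the (n, n) matrix L such that L @ y is a local-linear LOESS fit of y."""
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must lie in (0, 1]")
    # Number of neighbours per local window; at least 4 so that every window
    # holds two or more points with non-zero tricube weight.
    k = max(4, int(np.ceil(frac * n_timesteps)))
    if k > n_timesteps:
        raise ValueError("series is too short for a local-linear fit")
    x = np.arange(n_timesteps, dtype=float)
    d = x[None, :] - x[:, None]  # d[i, j] = x_j - x_i
    abs_d = np.abs(d)
    # Bandwidth of each target point: distance to its k-th nearest neighbour.
    h = np.partition(abs_d, k - 1, axis=1)[:, k - 1]
    # Tricube weights, zero at and beyond the bandwidth.
    u = np.clip(abs_d / h[:, None], 0.0, 1.0)
    w = (1.0 - u**3) ** 3
    # Closed-form intercept of the weighted linear fit centred on each x_i.
    s0 = w.sum(axis=1)
    s1 = (w * d).sum(axis=1)
    s2 = (w * d**2).sum(axis=1)
    denom = s0 * s2 - s1**2
    return w * (s2[:, None] - d * s1[:, None]) / denom[:, None]


def loess_filter(values: np.ndarray, frac: float = 0.3) -> np.ndarray:
    """Smooth an array of shape (time, component, sample) along the time axis."""
    values = np.asarray(values, dtype=float)
    if values.ndim != 3:
        raise ValueError("expected an array of shape (time, component, sample)")
    smoother = loess_smoother_matrix(values.shape[0], frac)
    # One contraction smooths every component and sample at once.
    return np.einsum("ij,jcs->ics", smoother, values)
```

Because the weights depend only on the time grid, the matrix is built once and reused for all series. The trade-off is that the dense matrix needs memory and compute that grow quadratically with the number of timesteps, so a sketch like this favours many short series over one long one.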

Describe potential alternatives

Simply looping over each sample and component in a series, passing each to extract_trend_and_seasonality. For series with many samples and components, I imagine this may be somewhat prohibitive.
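For comparison, the looping alternative can be sketched as below. To keep the example self-contained, a simple moving average stands in for the one-dimensional decomposition or smoothing call; in practice the callable would wrap the statsmodels-based routine. Both apply_per_series and moving_average are hypothetical helper names, and the (time, component, sample) layout is again assumed.

```python
from typing import Callable

import numpy as np


def apply_per_series(
    values: np.ndarray, filter_1d: Callable[[np.ndarray], np.ndarray]
) -> np.ndarray:
    """Apply a one-dimensional filter to each (component, sample) slice in turn."""
    values = np.asarray(values, dtype=float)
    _, n_components, n_samples = values.shape
    out = np.empty_like(values)
    # n_components * n_samples separate calls: this is the overhead in question.
    for c in range(n_components):
        for s in range(n_samples):
            out[:, c, s] = filter_1d(values[:, c, s])
    return out


def moving_average(y: np.ndarray, window: int = 3) -> np.ndarray:
    """Stand-in 1D smoother (NOT a Loess filter), used only for illustration."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")
```

The number of Python-level calls grows with the product of the component and sample counts, which is why this approach becomes slow for large stochastic or multivariate series.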

Additional context

I've been playing around with implementing a vectorised loess filter in my spare time, so I'll post a PR with what I've done thus far soon, but any comments and/or suggestions in the meantime would be appreciated.

Cheers, Matt.

hrzn commented 1 year ago

Great suggestion @mabilton! Regarding the location, I think that if the filtering function returns another TimeSeries (which I guess would be natural?), we can likely regard it as a proper filtering model and make it live in darts.models.filtering. Otherwise, darts.utils.statistics is a good place.

mabilton commented 1 year ago

Hey @hrzn. Thanks for the suggestion regarding where to place the Loess filter - that all makes sense.

As I mentioned, I have been working on this 'in the background', and I do have a working prototype at this point in time. The difficulty, however, is that although the 'vectorised' implementation is faster when thousands of series are passed to it, it performs substantially worse than the original statsmodels implementation when passed a single timeseries that has many timesteps.

If you want, I can open a draft PR with what I have and we can 'move the discussion' there. With that being said, I'm cognizant of the fact that this feature is more of a 'nice to have' than something which is essential, and that you guys already have a lot of PRs/Issues to work through as is. Let me know what you think and we can go from there.

Cheers, Matt.

hrzn commented 1 year ago

Hi @mabilton, yes, please feel free to open a draft PR. We can perhaps consider a rule of thumb and switch to relying on statsmodels whenever, for instance, the product len(series) * series.n_components is below a certain empirical threshold; I think some trick along those lines could be acceptable. It's not super high priority, but we're always happy to improve Darts in "unexpected" ways :) especially if that's something you've been working on anyway.
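The rule of thumb above could look roughly like the following sketch. The function name choose_loess_backend and the default threshold of 10,000 are placeholders invented for illustration; the thread does not specify a value, and a real threshold would have to be found by benchmarking.

```python
def choose_loess_backend(
    n_timesteps: int, n_components: int, threshold: int = 10_000
) -> str:
    """Pick a backend from the product len(series) * series.n_components.

    The threshold default is an arbitrary placeholder, not a measured value.
    """
    if n_timesteps * n_components < threshold:
        return "statsmodels"  # small inputs: rely on the existing 1D routine
    return "vectorised"  # large inputs: use the vectorised filter
```

A caller would compute the product from the series and branch on the returned label, for example choose_loess_backend(len(series), series.n_components).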

madtoinou commented 1 year ago

Hi @mabilton,

Any progress on this improvement of the seasonality decomposition with the Loess filter?