statsmodels / statsmodels

Statsmodels: statistical modeling and econometrics in Python
http://www.statsmodels.org/devel/
BSD 3-Clause "New" or "Revised" License

ENH: Add Friedman's supersmoother for use in MSTL #8229

Open KishManani opened 2 years ago

KishManani commented 2 years ago

Is your feature request related to a problem? Please describe

Currently, the version of MSTL implemented in Statsmodels does not replicate the behaviour of the original algorithm when a user specifies no seasonal components. When there is no seasonal component, the original algorithm uses Friedman's supersmoother to extract just a trend. To reflect the method as described in the paper, it would be good to add this behaviour to the version of MSTL in Statsmodels. One issue is that Friedman's supersmoother is not implemented in Statsmodels.

Describe the solution you'd like

Friedman's supersmoother has an implementation in Python here. One solution is to use this in the MSTL implementation. However, I'm not sure whether it's actively maintained and whether it would be acceptable to introduce additional dependencies to Statsmodels.

Describe alternatives you have considered

An alternative would be to re-write a version of Friedman's supersmoother in Statsmodels. Another solution would be to use LOWESS as an alternative method to extract the trend in the short term until a version of Friedman's supersmoother becomes available in Statsmodels.
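As a rough illustration of the short-term LOWESS option, the sketch below extracts a trend from a non-seasonal series with the existing `lowess` function in Statsmodels. The helper name `lowess_trend`, the synthetic data, and the `frac=0.3` setting are assumptions made for this example only; this is not how MSTL currently behaves and it does not reproduce supsmu.

```python
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess


def lowess_trend(y, frac=0.3):
    """Hypothetical helper: LOWESS trend for an evenly spaced series."""
    y = np.asarray(y, dtype=float)
    x = np.arange(y.size, dtype=float)  # use the observation index as the regressor
    # return_sorted=False returns fitted values aligned with the input order
    return lowess(y, x, frac=frac, it=3, return_sorted=False)


# Synthetic non-seasonal series: linear trend plus noise (illustrative data only)
rng = np.random.default_rng(0)
t = np.arange(200)
y = 0.05 * t + rng.normal(scale=1.0, size=t.size)

trend = lowess_trend(y)
remainder = y - trend  # with no seasonal component, y = trend + remainder
```

The `frac` argument would still need to be chosen per series, which is the main practical difference from a smoother that selects its own span.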

Additional context

An implementation in R also exists.

I'm happy to take this on, but I'd like any guidance from the maintainers about what path to take here.

fbonaita commented 2 years ago

Hi @KishManani, if I may add a small note, I recently looked into implementing a Python equivalent of tsoutliers.

For that purpose I did some testing with supersmoother, and the results were not an exact match for supsmu, on which the R version of MSTL is based. The mismatch was especially evident for short time series, and for very short time series (n<100) supersmoother seemed to struggle to fit at all. Finally, I briefly looked into Statsmodels lowess as an alternative too, but I found it hard to tune, at least for the anomaly detection use case (it is not trivial to land on a frac value that works well for both short and long time series).

TLDR: supersmoother does the job as long as you are (a) not working with short time series and (b) not looking to replicate supsmu.
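To make the tuning difficulty concrete, the sketch below shows how a smoother that uses a fixed fraction of the data ends up with very different neighbourhood sizes on short and long series. The helper `neighbourhood_size` is hypothetical, and the rounding rule is an assumption (implementations differ in how they turn a fraction into a point count). The fractions 0.05, 0.2 and 0.5 are the three spans described in Friedman's supersmoother paper; whether they explain the short-series behaviour reported above is a guess, not something confirmed in this thread.

```python
def neighbourhood_size(n_obs, fraction):
    """Hypothetical helper: points in a local fit that uses a fixed fraction of the data."""
    # Rounding is an assumption; real implementations may truncate or enforce a minimum.
    return max(1, int(round(fraction * n_obs)))


# Spans from Friedman's paper (sometimes called tweeter, midrange, woofer)
fractions = (0.05, 0.2, 0.5)

for n_obs in (40, 100, 1000):
    sizes = [neighbourhood_size(n_obs, f) for f in fractions]
    print(f"n={n_obs}: points per local fit = {sizes}")
```

For a series of 40 observations the smallest fraction covers only two points, which leaves almost nothing to fit a local line to, while the same fraction on 1000 observations covers 50 points.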

bashtage commented 2 years ago

We can't take a dependency, but the code could be brought in. I suspect that the code in that package should be Cythonized for performance, but haven't looked closely at how it is implemented.

It is likely that supsmu has some tuning parameters that vary for small sample sizes. It may be the case that these could be reverse engineered, especially if the performance is not so good.