wwrechard / pydlm

A python library for Bayesian time series modeling

BSD 3-Clause "New" or "Revised" License

very slow on freq='D' or 365 #63

Open ahmad-shahi opened 1 year ago

ahmad-shahi commented 1 year ago

When I run pydlm on my daily data with a seasonality of 365, it is very, very slow.

wwrechard commented 1 year ago

Hi @ahmad-shahi, could you share a bit more detail? A seasonality of 365 creates a 365 × 365 transition matrix. Running the Kalman filter with that means doing 365-dimensional matrix multiplications and inversions at every step, so it is reasonable for it to take some time to finish. I tested it locally: 1000 data points finish in roughly 30s, 2000 data points in roughly 68s, and so on. Please let me know if this matches your observations.
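
To make the per-step cost concrete, here is a minimal sketch (not pydlm's internal code) of the covariance prediction step P ← G P Gᵀ + W for a seasonal state of dimension `period`. It assumes a form-free seasonal component whose transition G is a cyclic shift matrix; the helper names `seasonal_transition` and `predict_covariance` are hypothetical.

```python
import numpy as np

def seasonal_transition(period):
    """Cyclic shift matrix that moves each seasonal state one slot forward.

    Assumption: the seasonal state holds one value per position in the cycle,
    so the state dimension equals `period`.
    """
    G = np.diag(np.ones(period - 1), k=1)
    G[period - 1, 0] = 1.0  # wrap the last slot back to the first
    return G

def predict_covariance(P, G, W):
    """One Kalman-filter prediction of the state covariance: G P G^T + W."""
    return G @ P @ G.T + W

period = 365
G = seasonal_transition(period)
P = np.eye(period)          # prior state covariance (illustrative values)
W = 0.01 * np.eye(period)   # evolution noise covariance (illustrative values)
P_next = predict_covariance(P, G, W)
print(G.shape, P_next.shape)  # (365, 365) (365, 365)

# A dense product of two n x n matrices costs on the order of 2 * n**3 flops.
print(f"~{2 * period**3:,} flops per dense matmul")
```

Every step repeats several of these dense 365 × 365 products, which is why the runtime grows linearly with the number of data points but with a large constant.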

Also, please make sure a seasonality of 365 is actually what you want. I assume you want to model the day-of-year pattern, but a seasonality of 365 might not give you that because of leap years. I would rather use the dynamic component to create a day-of-year pattern (a 366-dimensional vector) than use seasonality. It should also be faster.
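
A quick way to see the leap-year issue: with a fixed period of 365, the seasonal slot a date lands in is its day offset from the first observation modulo 365, so the same calendar date drifts by one slot after every February 29. The sketch below uses only the standard library and a hypothetical start date of 1 June 2019; `seasonal_slot` is a hypothetical helper.

```python
from datetime import date

def seasonal_slot(d, start, period=365):
    """Position of date `d` within a fixed-length cycle that begins at `start`."""
    return (d - start).days % period

start = date(2019, 6, 1)  # hypothetical first observation
for year in range(2019, 2025):
    d = date(year, 6, 1)
    print(d, seasonal_slot(d, start))
# 2019-06-01 0
# 2020-06-01 1   (shifted by Feb 29, 2020)
# 2021-06-01 1
# 2022-06-01 1
# 2023-06-01 1
# 2024-06-01 2   (shifted again by Feb 29, 2024)
```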

ahmad-shahi commented 1 year ago

Hi, thanks for looking into the problem. My data is seasonal: each cycle starts in June and ends at the end of May, and the pattern repeats. The data is collected daily.

Based on your explanation, I believe it will still be slow. However, your previous version was much faster. I will try the dynamic component and see how it goes.

Thanks

ahmad-shahi commented 1 year ago

```python
from pydlm import dlm, trend, seasonality

# time_series: a list of daily observations (assumed to be defined already)

# A linear trend
linear_trend = trend(degree=1, discount=0.95, name='linear_trend', w=10)

# A seasonality with a 365-day period
seasonal365 = seasonality(period=365, discount=0.99, name='seasonal365', w=10)

# Build a simple dlm
simple_dlm = dlm(time_series) + linear_trend + seasonal365

# Fit the model
simple_dlm.fit()

# Plot the fitted results without the raw data points
simple_dlm.turnOff('data points')
simple_dlm.plot()
```

wwrechard commented 1 year ago

That's very interesting. Let me take a deeper look and see what is happening here. Will keep you posted.

wwrechard commented 1 year ago

I just did a quick profiling run on a 1000-long time series. Most of the time seems to be spent in the numpy functions: dot (7s), and pinv and svd (20s). Let me revert the numpy version and see if there is a regression there.
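
For anyone wanting to reproduce this kind of breakdown, a minimal sketch using the standard-library `cProfile` and `pstats` modules is below. The loop `toy_filter_steps` is a hypothetical stand-in, not pydlm code: it mimics five `dot` calls and one `pinv` per step on 365 × 365 matrices. With pydlm installed you would profile `simple_dlm.fit()` instead.

```python
import cProfile
import io
import pstats

import numpy as np

def toy_filter_steps(n_steps=20, dim=365, seed=0):
    """Hypothetical stand-in for a filter/smoother loop (not pydlm code)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((dim, dim))
    P = A @ A.T + np.eye(dim)  # symmetric positive-definite "covariance"
    I = np.eye(dim)
    for _ in range(n_steps):
        for _ in range(5):
            P = np.dot(P, I)  # stands in for the matrix products of one step
        _ = np.linalg.pinv(P)  # one pseudo-inverse per step (computed via SVD)
    return P

def profile_top(func, n_rows=10, **kwargs):
    """Run `func` under cProfile; return the report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func(**kwargs)
    profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(n_rows)
    return buf.getvalue()

print(profile_top(toy_filter_steps, n_steps=5))
```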

wwrechard commented 1 year ago

Hi @ahmad-shahi, I tested a few python and numpy versions with 1000 data points and a 365 seasonality, and didn't find a better-performing one.

| Python version | numpy version | Time in `svd` | Time in `dot` |
|---|---|---|---|
| 3.11 | 1.25 | 20s | 7s |
| 3.8 | 1.20 | 21s | 7s |
| 3.6 | 1.70 | 62s | 7s |

I profiled numpy's dot() and pinv() independently: 1000 dot() calls on a 365 × 365 matrix take roughly 0.8s, and 1000 pinv() calls take roughly 22s. In pydlm, the Kalman filter does roughly 5 dot() calls per step in both fitForwardFilter() and fitBackwardSmoother(), which gives a total of about 8s for 1000 steps (2 passes × 5 calls × 1000 steps × 0.8ms). fitBackwardSmoother() also does exactly one pinv() per step, which adds about 22s. The total of roughly 30s seems to match the profiling data.
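
The estimate above can be sketched as a micro-benchmark plus a back-of-envelope formula. This is a minimal sketch using `timeit`; the helpers `time_per_call` and `estimate_fit_seconds` are hypothetical, and the "5 dots per pass, 1 pinv per step" counts are taken from the comment above rather than from pydlm's source.

```python
import timeit

import numpy as np

def time_per_call(dim=365, n_calls=100, seed=0):
    """Average seconds per np.dot and np.linalg.pinv call on dim x dim matrices."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((dim, dim))
    B = rng.standard_normal((dim, dim))
    t_dot = timeit.timeit(lambda: np.dot(A, B), number=n_calls) / n_calls
    t_pinv = timeit.timeit(lambda: np.linalg.pinv(A), number=n_calls) / n_calls
    return t_dot, t_pinv

def estimate_fit_seconds(t_dot, t_pinv, n_steps=1000, dots_per_pass=5):
    """Back-of-envelope cost: ~5 dots in each of the forward and backward
    passes, plus one pinv per step in the backward smoother."""
    return n_steps * (2 * dots_per_pass * t_dot + t_pinv)

# Plugging in the per-call figures quoted above (0.8 ms per dot, 22 ms per pinv):
print(estimate_fit_seconds(0.8e-3, 22e-3))  # about 30 seconds
```

Calling `time_per_call()` measures the per-call figures on your own machine; the results will vary with hardware and with the BLAS/LAPACK build that numpy links against.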

ahmad-shahi commented 1 year ago

Thanks for the details and clarification. What is the alternative for running a DLM with a seasonality of 365? You suggested using a dynamic component; can you please share an example of how to use it? I did not find one in the docs. Thanks again, and I appreciate your good work.

wwrechard commented 12 months ago

Yeah, it's not currently implemented. The basic idea is to get a list of datetime.date values for the time series and convert each one into a 366-dimensional vector whose coordinate for the day of the year is 1 (and all other coordinates are 0). I'll see if I can find time to implement one.
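
A minimal sketch of that conversion, using only the standard library. The helper names are hypothetical. One design choice is an assumption: each (month, day) is mapped onto a leap-year calendar (year 2000), so March 1 gets the same index in every year and February 29 has its own slot. Using the raw day of the year would shift every date after February by one in leap years.

```python
from datetime import date

def day_of_year_index(d):
    """Index 0..365 of d's calendar day, measured on a leap-year calendar.

    Mapping (month, day) onto year 2000 (a leap year) keeps March 1 at the
    same index in every year and gives February 29 its own slot.
    """
    return date(2000, d.month, d.day).timetuple().tm_yday - 1

def day_of_year_features(dates):
    """Convert dates into 366-dimensional one-hot feature vectors (lists of floats)."""
    features = []
    for d in dates:
        vec = [0.0] * 366
        vec[day_of_year_index(d)] = 1.0
        features.append(vec)
    return features

dates = [date(2021, 3, 1), date(2020, 2, 29), date(2020, 3, 1)]
feats = day_of_year_features(dates)
print([vec.index(1.0) for vec in feats])  # [60, 59, 60]
```

These vectors could then be passed as the `features` of pydlm's `dynamic` component, for example `dlm(time_series) + linear_trend + dynamic(features=feats, discount=0.99, name='day_of_year', w=10)`; treat that call as an assumption and check it against the pydlm documentation for your installed version.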