ahmad-shahi opened 1 year ago
Hi @ahmad-shahi, could you share a bit more detail? A 365-period seasonality creates a 365-dimensional transition matrix. Running the Kalman filter with that means doing a 365-dimensional matrix multiplication and inversion at every step, so it is reasonable for it to take some time to finish. I tested it locally with 1000 data points and it finishes in roughly 30s; with 2000 data points it finishes in roughly 68s, and so on. Please let me know if this matches your observations.
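For a concrete feel of that cost, here is a minimal timing sketch (my own illustration, not from the thread; absolute numbers depend on the machine and numpy build) comparing the per-call cost of `dot` and `pinv` on a 365-dim matrix, the two operations the Kalman filter performs at every step:

```python
import time
import numpy as np

n = 365  # state dimension implied by seasonality(period=365)
rng = np.random.default_rng(0)
m = rng.standard_normal((n, n))

# Average the per-call time over a few repetitions.
t0 = time.perf_counter()
for _ in range(10):
    np.dot(m, m)
dot_time = (time.perf_counter() - t0) / 10

t0 = time.perf_counter()
for _ in range(10):
    np.linalg.pinv(m)
pinv_time = (time.perf_counter() - t0) / 10

print(f"dot: {dot_time * 1e3:.1f} ms/call, pinv: {pinv_time * 1e3:.1f} ms/call")
```

On a typical machine `pinv` (which runs an SVD internally) is far slower than a single `dot`, which is why the smoother's inversion step tends to dominate the runtime.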
Also, please make sure a seasonality of 365 is actually what you want. I assume you want to model a day-of-year pattern, but a seasonality of 365 might not give you that because of leap years. I would rather use the dynamic component to create the day-of-year pattern (a 366-dimensional vector) than use seasonality. It should also be faster.
Hi, thanks for looking into the problem. My data is collected daily and is seasonal: each cycle starts in June and ends at the end of May, and the pattern repeats. Based on your explanation, I believe it will still be slow. However, your previous version was much faster. I will try the dynamic component and see how it goes.
Thanks
```python
from pydlm import dlm, trend, seasonality

linear_trend = trend(degree=1, discount=0.95, name='linear_trend', w=10)
seasonal365 = seasonality(period=365, discount=0.99, name='seasonal365', w=10)
simple_dlm = dlm(time_series) + linear_trend + seasonal365
simple_dlm.fit()
simple_dlm.turnOff('data points')
simple_dlm.plot()
```
That's very interesting. Let me take a deeper look and see what is happening here. Will keep you posted.
I just did a quick profiling for a 1000-long time series; it seems most of the time was spent in the numpy functions: `dot` (7s) and `pinv`/`svd` (20s). Let me revert the numpy version and see if there is any regression there.
Hi @ahmad-shahi, I tested a few python and numpy versions with 1000 data points and 365 seasonality and didn't seem to find a better performing one.
| Python version | numpy version | Profiling of svd | Profiling of dot |
|---|---|---|---|
| 3.11 | 1.25 | 20s | 7s |
| 3.8 | 1.20 | 21s | 7s |
| 3.6 | 1.70 | 62s | 7s |
I profiled `dot()` and `pinv()` from numpy independently: it takes roughly 0.8s for 1000 `dot()` calls on a 365-dim matrix and 22s for 1000 `pinv()` calls on a 365-dim matrix. For pydlm, the Kalman filter does roughly 5 `dot()` calls per step in each of `fitForwardFilter()` and `fitBackwardSmoother()`, which gives a total of 8s assuming 1000 steps. `fitBackwardSmoother()` also does one exact `pinv()` per step, which gives a total of 22s. The result is 30s, which seems to match the profiling data.
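The accounting above can be written out as a quick back-of-envelope calculation (the per-call costs are the figures measured in this thread):

```python
# Per-call costs measured in the thread for a 365-dim matrix.
dot_cost = 0.8 / 1000    # seconds per dot() call
pinv_cost = 22.0 / 1000  # seconds per pinv() call
steps = 1000             # number of data points

# ~5 dot() calls per step in each of fitForwardFilter() and fitBackwardSmoother()
dot_total = 5 * 2 * steps * dot_cost   # 8.0 s
# fitBackwardSmoother() does one exact pinv() per step
pinv_total = 1 * steps * pinv_cost     # 22.0 s

total = dot_total + pinv_total
print(f"estimated total: {total:.0f}s")  # ~30s, matching the observed runtime
```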
Thanks for the details and clarification. What is the alternative way to run DLM for a seasonality of 365? You mentioned using the dynamic component; could you please share an example of how to use it? I did not find one in the docs. Thanks again, and I appreciate your good work.
Yeah, it's not currently implemented. The basic idea is to take the list of `datetime.date` values for the time series and convert it into a list of 366-dimensional vectors, with the coordinate for the day of the year set to 1 (and all other coordinates set to 0). I'll see if I can find time to implement one.
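A minimal sketch of that conversion (my own illustration, not an API that ships with pydlm; the `dynamic`-component usage in the trailing comment is an assumption based on pydlm's dynamic component, not tested here):

```python
from datetime import date, timedelta

def day_of_year_features(dates):
    """Map each date to a 366-dimensional one-hot vector (leap-day safe)."""
    features = []
    for d in dates:
        vec = [0.0] * 366
        vec[d.timetuple().tm_yday - 1] = 1.0  # tm_yday is 1-based
        features.append(vec)
    return features

# Example: 1000 daily observations starting in June, as in the thread.
start = date(2020, 6, 1)
dates = [start + timedelta(days=i) for i in range(1000)]
features = day_of_year_features(dates)

# These features could then be fed to pydlm's dynamic component instead of
# seasonality(period=365), e.g. (assumed usage, adjust to your version):
#   from pydlm import dlm, dynamic
#   day_of_year = dynamic(features=features, discount=0.99, name='day_of_year')
#   model = dlm(time_series) + linear_trend + day_of_year
```

Because each feature vector is a one-hot indicator, this handles the leap-day (Feb 29) cleanly, which a fixed 365-period seasonality cannot.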
When I run pydlm on my data (a daily collection) with a seasonality of 365, it is very, very slow.