wwrechard / pydlm

A python library for Bayesian time series modeling
BSD 3-Clause "New" or "Revised" License
475 stars 98 forks source link

Different results pydlm vs dlm package (in R) #24

Open roinaveiro opened 6 years ago

roinaveiro commented 6 years ago

I get different results in an extremely simple dynamic linear model, using dlm package in R and pydlm. In particular, I create a random walk + noise dlm (that is dlm with just a trend of degree 0 or 1, depending on the version of the pydlm package). I fix the discount factor to be 1, that is W_t = 0 for all t (that is, the variance of the noise of the state evolution equation is null). The noisePrior is 1.0 by default. I fix the prior variance to be 2.0. I include just one observation: 1.70. With this I build the dlm:

Here the code:

from pydlm import *

d = dlm([1.70]) t = trend(degree = 0, discount=1, w = 2) d = d + t d.fitForwardFilter() print(d.getLatentState()) [[1.1333333333333333]] print(d.getLatentCov()) [matrix([[ 0.65444444]])]

Doing the same in R, I obtain the same latent state, but different state covariance. Particularly I obtain 0.666666.

In fact, performing the calculations by hand, I obtain the same as in R. This is quite easy to reproduce, following the formulas in page 53 of this book: Dynamic Linear Models with R, Giovanni Petris, Sonia Petrone, Patrizia Campagnoli.

Am I misunderstanding the meaning of some parameter?

wwrechard commented 6 years ago

Hi roinaveiro,

Thanks for raising this issue. Your understanding is correct, but do note that the underlying assumption is different here between pydlm and the R code.

Based on Page 53 of the book, what the R code doing is similar to pydlm with a key difference that it assumes the observation noise (v_t) is fixed to be 1. While in pydlm, the observation noise (noisePrior) is initialized at 1 but then it will be learned from the data and changing over time (You can tell this from the parameter name). Therefore, the result you get from R will be different from pydlm due to this self-learning property.

One way to get a consistent result using pydlm is to just fix the observation noise or set the initial confidence on the noisePrior to be high so the noise value won't get updated. However, this parameter is current not exposed to the user. Please do let me know if you think it is an important feature to you (i.e., tell the model not to learn observational noise from the data).

Thanks!