Why are the estimated series different from the input series?

christophsax commented 5 years ago

Finally able to compile it on my mac. I can now do reprex::reprex(), which is a nice way to ensure an example is reproducible.

This is probably a bit of a naive question.

library(BDFM)
library(tsbox)

fdeaths0 <- fdeaths
fdeaths0[length(fdeaths0)] <- NA
dta <- cbind(fdeaths0, mdeaths)

m <- dfm(dta, forecast = 2)
ts_plot(dta, predict(m))

Created on 2018-11-10 by the reprex package (v0.2.1)

christophsax commented 5 years ago

When I estimate two factors, I get fdeaths as well:


library(BDFM)
library(tsbox)

fdeaths0 <- fdeaths
fdeaths0[length(fdeaths0)] <- NA
dta <- cbind(fdeaths0, mdeaths)

m <- dfm(dta, factors = 2, forecast = 2)
ts_plot(dta, predict(m))

But why, with one factor, do I get mdeaths exactly and not fdeaths?

srlanalytics commented 5 years ago

When you estimate the model with one factor (the first example), the model selects the series that contains the most information. Though estimation is Bayesian, it's easiest to think about in maximum likelihood terms: the factor will be the one that maximizes the likelihood of the observations, i.e. the one with the most predictive power. In this example the factor fits mdeaths much more closely than fdeaths. The easiest way to see this is by looking at the estimated error variances in the observation equation:

library(BDFM)
library(tsbox)

fdeaths0 <- fdeaths
fdeaths0[length(fdeaths0)] <- NA
dta <- cbind(fdeaths0, mdeaths)

m <- dfm(dta, forecast = 0)
ts_plot(dta, predict(m))
diag(m$R)

This gives the output 1012.4917 172.0752. The variance of shocks to observations of the first series (fdeaths0) is estimated to be 1012.4917, while the variance of shocks to observations of the second (mdeaths) is estimated to be 172.0752, which is much smaller. You can see that the predictions do not exactly match the observations for either series:

# Difference between observed and predicted values, per series
predicted <- predict(m)
difference <- dta - predicted
ts_plot(difference)

[Image: dfm_diffs]

It's just that the differences between predictions and observations are much smaller for mdeaths.
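For reference, the observation and transition equations mentioned above can be sketched in generic state-space form. The symbols H, B, Q and f_t are notation assumed for this illustration, not taken from the BDFM documentation. Only R (as m$R) appears in this thread. The sketch also assumes a single lag and shocks that are uncorrelated across series.

```latex
\begin{aligned}
y_t &= H f_t + \varepsilon_t, & \varepsilon_t &\sim N(0, R), \quad R = \operatorname{diag}(r_1, r_2) \\
f_t &= B f_{t-1} + u_t,       & u_t &\sim N(0, Q)
\end{aligned}
```

Under these assumptions, diag(m$R) corresponds to (r_1, r_2). With one factor, the estimate for fdeaths0 (1012.4917) is far larger than the estimate for mdeaths (172.0752), which is another way of saying that the single factor tracks mdeaths more closely.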

In the second example, with two factors and two observed series, the model can fit the observations much more closely, as with principal components. If you were to do this using principal components, you would get exactly the observed series. This example is almost the same: the only (very small) differences between predictions and observations are due to inter-temporal smoothing.
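The principal components claim can be checked with base R alone. The sketch below uses prcomp() from the stats package rather than BDFM, and it runs on the complete fdeaths and mdeaths series (no missing last value). The helper reconstruct() is a name made up for this illustration.

```r
# Two monthly series from base R's datasets package, as a plain numeric matrix
X <- cbind(fdeaths = as.numeric(fdeaths), mdeaths = as.numeric(mdeaths))

# Principal components; with two series there are at most two components
pc <- prcomp(X, center = TRUE, scale. = FALSE)

# Rebuild the data from the first k components (hypothetical helper)
reconstruct <- function(pc, k) {
  fitted <- pc$x[, 1:k, drop = FALSE] %*% t(pc$rotation[, 1:k, drop = FALSE])
  sweep(fitted, 2, pc$center, "+")  # add the column means back
}

err1 <- X - reconstruct(pc, 1)  # one component: residuals remain
err2 <- X - reconstruct(pc, 2)  # two components: exact up to rounding

max(abs(err1))
max(abs(err2))
```

With one component the reconstruction leaves visible residuals. With two components it reproduces both series up to floating-point error, which is the static analogue of the two-factor fit above.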

Just for fun, if you wanted to force the predicted values to be smoother, you could do that using the prior "degrees of freedom" nu_q in the transition equation:

m <- dfm(dta, factors = 1, forecast = 0, nu_q = 15)
ts_plot(dta, predict(m))

[Image: dfa_smooth]

However, a strong prior like this can backfire if it makes the transition equation non-stationary, so be careful!

christophsax commented 5 years ago

Ok. There is no reason to expect predict() to be identical to the original series. Use adjusted() if you want the original values wherever they are available.

srlanalytics / bdfm

Why are the estimated series different from the input series? #16