helske / KFAS

KFAS: R Package for Exponential Family State Space Models
64 stars 17 forks source link

Handle intercepts in the model #61

Closed zjph602xtc closed 3 years ago

zjph602xtc commented 3 years ago

I have a quite standard model but with intercepts:

yt = Z_t at + Cx + e1_t, where e1_t follows a N(0, Ht), a(t+1) = T_t * a_t + D + e2_t, where e2_t follows a N(0, Q_t)

H_t, T_t, Q_t together with two constant intercepts C*x, D are to be fitted. x is a known covariate.

I have two series. For the first one, x = 1; and for the second one, x = 2. Initially, I tried to build two SSmodels and extract their -loglikelihood and then maximize the sum of two loglikelihood to estimate H_t, T_t, Q_t, C, D. (I want to get the same parameters for two series). However, it seems that in the package, there is no place to put the intercept. Not sure how to deal with that.

Thank you!

helske commented 3 years ago

Yes, there are no separate intercept terms in KFAS. What you can do is to add an extra time-invariant (Q=0) state variables C and D to the state vector a, using SSMcustom. Then C and D will be estimated "automatically" via Kalman filter.

Another option is to use bssm package which directly supports intercepts (see ?ssm_ulg for an example).

zjph602xtc commented 3 years ago

"What you can do is to add an extra time-invariant (Q=0) state variables C and D to the state vector a, using SSMcustom. Then C and D will be estimated "automatically" via Kalman filter." But in this way, C and D will be different for the two series, since there are two models.

I will try the bssm package. Thank you!

helske commented 3 years ago

Ah, yes, but you can also build a bivariate model with common C, D etc and estimate that directly instead of two separate models.

zjph602xtc commented 3 years ago

The length of two series is different, and the Z_t is time-dependent (but is known and different for each series) in my case. Not sure whether it is possible to build one bivariate model. If I build one model, I have to assign many NA's in Z_t for the shorter series. Is it an appropriate way? Can you give some hints? Thank you.

helske commented 3 years ago

Yes if you add NA to the end of shorter y_t then the corresponding value in Z is also ignored. Here is a quick sketch:

set.seed(1)
n1 <- 50
n2 <- 30
z1 <- sin(1:n1)
z2 <- cos(1:n2)

C <- 0.6
D <- -0.4
# random walk with drift D
x1 <- cumsum(rnorm(n1) + D)
x2 <- cumsum(rnorm(n2) + D)

y1 <- rnorm(n1, z1 * x1 + C * 1)
y2 <- rnorm(n2, z2 * x2 + C * 2)

library(KFAS)
n <- max(n1, n2)
Y <- matrix(NA, n, 2)
Y[1:n1, 1] <- y1
Y[1:n2, 2] <- y2

Z <- array(0, c(2, 4, n))
Z[1, 1, 1:n1] <- z1
Z[2, 2, 1:n2] <- z2 # trailing zeros are ok, as corresponding y is NA
Z[1, 3, ] <- 1 # x = 1
Z[2, 3, ] <- 2 # x = 2
# last state is only used in state equation so zeros in Z

T <- diag(4) # a1_t for y1, a2_t for y2, C, D
T[1, 4] <- 1 # D affects a_t
T[2, 4] <- 1 # D affects a_t
Q <- diag(c(NA, NA, 0, 0))
P1inf <- diag(4)
model <- SSModel(Y ~ -1 + SSMcustom(Z = Z, T = T, Q = Q, P1inf = P1inf,
  state_names = c("a1", "a2", "C", "D")), H = diag(NA, 2))

updatefn <- function(pars, model) {
  model$Q[] <- diag(c(exp(pars[1]), exp(pars[1]), 0, 0))
  model$H[] <- diag(exp(pars[2]), 2)
  model
}

fit <- fitSSM(model, inits = rep(-1, 2), updatefn = updatefn)

fit$model$Q[1] #0.2542451
KFS(fit$model)
Smoothed values of states and standard errors at time n = 50:
    Estimate   Std. Error
a1  -15.27620    0.86116 
a2  -12.35175    2.81083 
C     0.55220    0.08223 
D    -0.27226    0.06152 
zjph602xtc commented 3 years ago

I see. That is helpful!