helske / KFAS

KFAS: R Package for Exponential Family State Space Models
64 stars 17 forks source link

Model non-gaussian multivariate outcome #62

Closed zjph602xtc closed 3 years ago

zjph602xtc commented 3 years ago

I have a four-dimensional time-series outcome (y1, y2, y3, y4), where (y1, y2) are multivariate gaussian, and y3, y4 are Poisson. I am using SSModel function to build a state-space model.

I hope to check several things: 1) I set distribution = c('gaussian', 'gaussian', 'poisson', 'poisson') to indicate the type of each response. 2) The H matrix is like this: H = [1 0.5 NA NA 0.5 1 NA NA NA NA NA NA NA NA NA NA] Since only (y1, y2) uses H, so that I put all other elements to NA. Also, we have to assume that (y1, y2) is independent with y3, and y4, right? There is no place to model their correlation. 3) The u matrix is like this: u = [NA NA 5 6 NA NA 5 6 NA NA 5 6 .... NA NA 5 6] Is it in the right format? 4) I used "cbind(y1, y2, y3, y4)" as the response variable.

Could you please check these settings? The document does not include an example of non-gaussian multivariate outcome. So that I have to guess some of the parameters. My example should also help other users. Thank you!!

zjph602xtc commented 3 years ago

I found that the previous way is not right. The codes show that: Warning message: In SSModel(cbind(y1, y2, y3, y4) ~ -1 + SSMcustom(Z = Zt, T = Tt, R = Rt, : H ignored as model contains non-gaussian series.

Then I found that in the non-gaussian model, for Gaussian distribution, var(y) = u, so I guess: 1) Do not include a H matrix. 2) The u matrix should be like this: u = [1 1 5 6 1 1 5 6 1 1 5 6 .... 1 1 5 6] The first and second columns are the variance for y1 and y2. So that y1, y2, y3, y4 must be mutually independent.

helske commented 3 years ago

Thanks, I should add an example of this (there is an example of multivariate Poisson case in the KFAS paper though).

You are correct, matrix H is not used for non-Gaussian case, and the first column of u corresponds to the "extra parameter" for the first time series and so forth. You can also augment a noise term for the state equation which can be used for example to model the covariance structure of the error terms of y1 and y2. In this case, you should set the corresponding u values to zero, and take this state into account when checking the results (see the example in the KFAS paper of this kind of noise term in Poisson case).

zjph602xtc commented 3 years ago

Thanks for your response. That's helpful!

Regarding to this model, I find that the logLik(model) will return a positive number sometimes. So I am wondering whether this is normal result. Does logLik return the true log-likelihood? If does, I guess it should be less than 0.

I could provide a minimal example for this if necessary. Thank you.

helske commented 3 years ago

Yes, the returned log-likelihood contains all the constant terms. It can be positive as the density can be larger than one (e.g., dnorm(0,sd=0.1,log=TRUE))

zjph602xtc commented 3 years ago

I see. It is the log density not prob. I was asking something stupid. Thank you!