helske / KFAS

KFAS: R Package for Exponential Family State Space Models

Poor time-variability of time-varying regression coefficients and inconsistent parameter estimates across fitSSM methods #75

Closed ghost closed 1 year ago

ghost commented 1 year ago

Sorry for asking you so often, but let me ask about time-varying regression with SSMregression terms included.

I'm using a model like the one below, and first ran the fitting process with "BFGS":

model <- SSModel(Y ~ SSMtrend(1, Q = NA) + SSMregression(~ X1, Q = NA) +
  SSMregression(~ X2, Q = NA), H = NA, data = dt)
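The fitting step was roughly this (the four initial values correspond, I assume, to the three NA variances in Q plus H under the default updatefn of fitSSM):

fit <- fitSSM(model, inits = rep(0, 4), method = "BFGS")
out <- KFS(fit$model, smoothing = "state")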

Using the fitting result from fitSSM with BFGS, I obtained an almost flat "level" estimate with almost unchanged CIs over the entire period, while the coefficients for X1 and X2 appeared time-varying in a way that reflects the true response.

However, when I tried the methods "Nelder-Mead" or "SANN", the results changed: instead of a flat "level" (intercept), the coefficients for X1 or X2 now come out flat. The results don't change much when I try SSMtrend(2, Q = list(NA, NA)) instead of SSMtrend(1), and omitting variables or fixing the coefficients of some variables does not seem to solve the problem.

It is possible that local minima cause this kind of problem, and the time-invariant variance in SSMregression might also be inadequate for capturing the time-variability of the coefficients. I'm wondering whether there is any way to stabilize the optimization or modify the model.

helske commented 1 year ago

Indeed, it is possible that there are multiple local modes and that changing the optimisation algorithm changes the optimum fitSSM finds. If the issue is multiple modes, you should see the same behaviour with BFGS if you run it multiple times with different initial values. The multimodality could perhaps be due to the way X1 and X2 are generated: if there is some kind of trend (as in SSMtrend(1)) in those as well, or if they are nearly constant, the trend and regression components become equally good predictors, which might lead to the "switching" behaviour you are seeing. Instead of diffuse initialization, some tighter priors (a1, P1 and P1inf) could help, but likely not much. Defining the model as SSModel(Y ~ -1 + SSMregression(~ X1 + X2, Q = matrix(NA, 3, 3), remove.intercept = FALSE), H = NA) would allow you to model the correlations between the three time-varying components, which could in theory help.
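In practice that model can be fitted like this (a sketch; with the default updatefn of fitSSM, the full 3 x 3 Q contributes six parameters and H one more, hence seven initial values):

library(KFAS)

model2 <- SSModel(Y ~ -1 +
  SSMregression(~ X1 + X2, Q = matrix(NA, 3, 3), remove.intercept = FALSE),
  H = NA, data = dt)
fit2 <- fitSSM(model2, inits = rep(0, 7), method = "BFGS")
out2 <- KFS(fit2$model, smoothing = "state")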

Another option would be to test the estimation of the same model with the walker package, which fits a Bayesian version of your model, so you can see how the posterior distributions behave. walker also supports integrated random walks for the regression coefficients (rw2() in walker), which makes the coefficients vary more smoothly and could maybe help. You can of course do the same thing with KFAS, but it then needs an SSMcustom component.
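A rough sketch (the exact prior arguments vary between walker versions, so check ?walker and ?rw2; the prior values below are only illustrative):

library(walker)
# rw2() gives integrated random walk coefficients, i.e. smoothly varying betas
fit_w <- walker(Y ~ -1 + rw2(~ X1 + X2, beta = c(0, 10), sigma = c(2, 10),
                             nu = c(0, 10)),
                data = dt, chains = 2, iter = 2000)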

ghost commented 1 year ago

Thank you very much. The combination of SSModel(Y ~ -1 + SSMregression(~ X1 + X2, Q = matrix(NA, 3, 3), remove.intercept = FALSE), H = NA) and repeatedly applying fitSSM with the previously estimated parameters solved the problem, at least for the time-variability of the coefficients. Another problem I'm now facing is that the predictions and CIs from predict(fit.reg$model, interval = "prediction", level = 0.95) are almost flat and the CI seems unexpectedly narrow, and I'm trying to figure out why.

Thank you also for suggesting walker. I tested it, and the result from walker is rather close to the KFAS result, with slightly more variability in the intercept but still poor variability overall (the variability of Y seems to be almost entirely absorbed by the coefficient of X2). Maybe the covariance should also be specified here, so I'll give it another try.

helske commented 1 year ago

Hmm, if I understood correctly, repeatedly applying fitSSM with the previously estimated parameters as initial values suggests that the default number of iterations in fitSSM/optim is not enough; the results should not change from the first run if you have already found an optimum (the SANN method is different, though, as it could still go and find another mode). What I meant earlier was that you would run fitSSM using multiple "random" initial values, check the log-likelihoods of the fitted models, and see which run gave you the largest value (or the smallest value if you just check fit$optim$value, as that is -logLik if I remember the implementation correctly).
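Something like this (a sketch; the seven initial values again assume the default updatefn with a full 3 x 3 Q and scalar H):

set.seed(1)
fits <- lapply(1:20, function(i)
  tryCatch(fitSSM(model2, inits = rnorm(7, 0, 2), method = "BFGS"),
           error = function(e) NULL))
fits <- Filter(Negate(is.null), fits)   # drop failed runs
values <- sapply(fits, function(f) f$optim$value)  # f$optim$value is -logLik
best <- fits[[which.min(values)]]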

The fact that you get similar results from walker sounds promising. Unfortunately, with walker you cannot define a covariance structure for the regression coefficients (there is a third package, bssm, though, with an as_bssm function for KFAS models... ;))

ghost commented 1 year ago

Sorry, I should have given you details of what I tried:
① I also tried many initial values with BFGS, but every initial value produced different parameter estimates. I thought about picking the estimates with the lowest value, but the values from different initial values were sometimes so different that I was hesitant to pick among them.
② Then I tried Nelder-Mead, but when I ran fitSSM again with the previously estimated parameters, the value changed every time I repeated this procedure.
③ The last thing I tried was to first use SANN, then Nelder-Mead, and then repeat Nelder-Mead to confirm whether the value converged to a fixed estimate (see the sketch below). This final procedure led to convergence of the parameters, without any fluctuation of the estimates when repeating fitSSM with the previous estimates. As ③ seems like a success, the issue may not be caused by maxit or the abs/rel tolerances. (I checked that convergence = 0 at every step in ① and ②.)
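In code, procedure ③ is roughly this (a sketch, reusing the model from above):

fit_a <- fitSSM(model2, inits = rep(0, 7), method = "SANN")
fit_b <- fitSSM(model2, inits = fit_a$optim$par, method = "Nelder-Mead")
fit_c <- fitSSM(model2, inits = fit_b$optim$par, method = "Nelder-Mead")
c(fit_a$optim$value, fit_b$optim$value, fit_c$optim$value)  # should stabilise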

Do you have any idea about the almost flat CIs from predict? Y is in the range of about -5 to 5, Q[1,1] = 8, Q[2,2] = 0.05, Q[3,3] = 0.02, and H is near zero, but I'm wondering whether the predict function accurately reflects the error structure of my model.

ghost commented 1 year ago

About the issue of flat prediction intervals with the model SSModel(Y ~ -1 + SSMregression(~ X1 + X2, Q = matrix(NA, 3, 3), remove.intercept = FALSE), H = NA): I think the predict function does not reflect the var-covar structure of Q; instead, I found that I can randomly create prediction intervals for each of beta0, beta1, beta2 with MASS::mvrnorm.
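Roughly what I mean (a sketch using the smoothed state means and covariances from KFS; I assume the three states are beta0, beta1, beta2):

library(MASS)
out <- KFS(fit.reg$model, smoothing = "state")
t <- 100  # some time point
draws <- mvrnorm(1000, mu = out$alphahat[t, ], Sigma = out$V[, , t])
apply(draws, 2, quantile, probs = c(0.025, 0.975))  # interval for each beta at time t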

Anyway, I found that it is difficult for me to replicate the KFAS results by writing R code from scratch on my own. I really appreciate and admire your work on KFAS.

helske commented 1 year ago

It is difficult for me to figure out the potential problems without the actual data/code, especially regarding why the estimation behaves so strangely given such a simple-looking model. But the fact that H is estimated near zero can indicate numerical problems in the optimization and/or diffuse initialization, potentially due to poor initial values/model/data. You might want to test constraining H to something like H > 0.01, or defining a non-diffuse prior for the betas (i.e., set P1inf = 0 and, for example, P1 = diag(10, 3) in SSMregression).
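For example (a sketch; the custom updatefn parameterises Q via its Cholesky factor and keeps H above 0.01):

model3 <- SSModel(Y ~ -1 +
  SSMregression(~ X1 + X2, Q = matrix(NA, 3, 3), remove.intercept = FALSE,
                P1 = diag(10, 3), P1inf = matrix(0, 3, 3)),
  H = NA, data = dt)

updatefn <- function(pars, model) {
  model$H[1, 1, 1] <- 0.01 + exp(pars[1])   # H constrained above 0.01
  L <- diag(exp(pars[2:4]))                 # Cholesky factor of Q
  L[lower.tri(L)] <- pars[5:7]
  model$Q[, , 1] <- L %*% t(L)
  model
}
fit3 <- fitSSM(model3, inits = rep(0, 7), updatefn = updatefn, method = "BFGS")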

The Q matrix is the covariance of the state process prior to seeing any data, so the actual smoothed covariance (the covariance of state_t given all Y) should be smaller, and thus the prediction intervals will be narrower than you would expect by looking at Q and H. The fact that H is almost zero means that you have very informative data, which leads to very narrow intervals. Note that predict provides intervals for the response variable or its mean (i.e., prediction/confidence intervals), not for the individual states (although you can also get those if you tweak the states argument). But this does not really explain why the CIs are flat, given that Q[1,1] indicates a time-varying level... Does running KFS on your model give any warnings?
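For instance (continuing the sketch above):

conf <- predict(fit3$model, interval = "confidence", level = 0.95)  # mean of Y
pred <- predict(fit3$model, interval = "prediction", level = 0.95)  # new observations of Y
lvl  <- predict(fit3$model, interval = "confidence", states = 1)    # subset of states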

ghost commented 1 year ago

I checked, and there seems to be no warning from KFS. My model gives decent CIs when I specify states in predict, but the CI flattens when I do not specify states and try to predict Y.

By the way, is there any documentation on how the covariance matrix is updated over time in KFAS? I can implement this from scratch when Q is a diagonal matrix (each error is independent), but I'm not sure how when Q = matrix(NA, 3, 3) as you suggested. I also tried running this model using Stan, but this task seems to be quite heavy computationally.

helske commented 1 year ago

The filtering and smoothing recursions used in KFAS are given in the appendix of the KFAS paper and vignette: https://www.jstatsoft.org/article/view/v078i10. The predict function uses these same recursions.
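Note that the covariance update itself does not depend on Q being diagonal, since Q enters only additively. One step of the filter, roughly in the notation of the paper (a sketch for a univariate observation):

# one Kalman filter step: a, P are the one-step-ahead state mean and covariance
kf_step <- function(y, a, P, Z, H, Tt, Rt, Q) {
  v <- y - Z %*% a                        # one-step prediction error
  Fm <- Z %*% P %*% t(Z) + H              # prediction error variance
  K <- Tt %*% P %*% t(Z) %*% solve(Fm)    # Kalman gain
  list(a = Tt %*% a + K %*% v,                                   # a_{t+1}
       P = Tt %*% P %*% t(Tt - K %*% Z) + Rt %*% Q %*% t(Rt))    # P_{t+1}
}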

You could check manually that the sum of the state-specific predictions matches the predictions when you don't specify states. Perhaps it just happens that while beta0, beta1*X1, and beta2*X2 vary, their sum is relatively stable in time. Given Q[1,1] = 8, Q[2,2] = 0.05, Q[3,3] = 0.02, it looks like the regression coefficients should be close to constant (depending of course on the scale of the X variables), so a model with time-invariant coefficients and a local level component should give pretty similar results (again assuming that the scale of X isn't large). Note that even when you specify the states argument, predict does not give intervals for the states themselves, but for the corresponding subset of the signal Z_t * alpha_t. For state-specific intervals you should extract the corresponding values from the output of KFS, i.e. alphahat and V.
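For example (a sketch; the column name "X1" is assumed to match the state naming in your model):

out <- KFS(fit3$model, smoothing = "state")
i <- which(colnames(out$alphahat) == "X1")
se <- sqrt(out$V[i, i, ])                  # smoothed standard errors of beta1
ci <- cbind(lwr = out$alphahat[, i] - 1.96 * se,
            upr = out$alphahat[, i] + 1.96 * se)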

I find it unlikely that there is a bug in KFAS related to this, as your model seems quite simple and walker also gives similar results, but you could test this with a third package, bssm. There is a function as_bssm which you can use so that you can still specify the model with KFAS but compute the log-likelihood with the bssm implementation instead, e.g.

fn <- function(pars) {
  kfas_model <- SSModel(...)  # build/update the KFAS model from pars
  # bssm does not support exact diffuse initialization; for states i with
  # P1inf[i, i] = 1, bssm uses P1[i, i] = kappa instead
  mod <- as_bssm(kfas_model, kappa = 1e4)
  -logLik(mod)  # optim minimizes by default, so return -logLik
}
fit <- optim(inits, fn)

Other than this, I'm running out of ideas; I would need to see the actual data and code to really dig in.

ghost commented 1 year ago

Sorry about the late reply. The sum of the state-specific predictions matched in my case, so I realized that no error was happening. Thank you also for the suggestion to use bssm. For double-checking purposes, I'm trying to implement the same model using Stan, but I'm having trouble with running speed (too slow, maybe partly due to sampling from a multivariate distribution). Maybe I should try your package at some point.