helske / KFAS

KFAS: R Package for Exponential Family State Space Models

Positive log likelihood for Poisson and Binomial distribution #63

Closed zjph602xtc closed 3 years ago

zjph602xtc commented 3 years ago

Dear Dr. Helske,

You said that "the logLik function returns the log-likelihood containing all the constant terms", which means that for discrete distributions like the Poisson or binomial the log-likelihood should be less than 0.
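(As a tiny standalone illustration of what I mean, not related to the model below: for a discrete distribution the likelihood is a product of probabilities, each at most 1, so the log-likelihood can never be positive.)

dpois(0, lambda = 2, log = TRUE)                 # -2
dbinom(1, size = 1, prob = 0.3, log = TRUE)      # log(0.3), negative
sum(dpois(c(0, 1, 1), lambda = 2, log = TRUE))   # sum of log pmfs, always <= 0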

Here my response variable is dat$y, which is binary. I tried to model it with a Poisson or binomial distribution, but in the following examples I get positive log-likelihoods.

library(KFAS)

Tt <- diag(c(0.4434,0.9962,0.9996,rep(1,6)))
Qt <- diag(rep(3.3576,3))
P1inf <- matrix(0, ncol = 9, nrow = 9)
diag(P1inf)[1:3] <- 1
Zt <- matrix(c(1.2670, 1.2729, 0.5930, 0, rep(0.5900,5)), nrow = 1)
Rt <- rbind(diag(1, nrow = 3, ncol = 3), matrix(0, nrow = 6, ncol = 3))
a1 <- matrix(c(0, 0, 0, 1, 1, 0.2433, 2, 1, 1), ncol = 1)

dat <- data.frame(y = c(0, 1, 1, rep(0, 49)))
u <- matrix(1, nrow = NROW(dat), ncol = 1)

Try the Poisson distribution:

m <- SSModel(y ~ -1+SSMcustom(Z = Zt, T = Tt, R = Rt, Q = Qt, a1 = a1, P1inf = P1inf), data = dat, distribution = 'poisson', u=u)
logLik(m) # returns 13.00312 without any warning or error.
logLik(m, marginal = T)  # returns 13.83211 without any warning or error.

Try the Binomial distribution:

m <- SSModel(y ~ -1+SSMcustom(Z = Zt, T = Tt, R = Rt, Q = Qt, a1 = a1, P1inf = P1inf), data = dat, distribution = 'binomial', u=u)
logLik(m) # returns 17.71917 without any warning or error.

Not sure why this happens. Thank you for your help!!

Regards, Peter

helske commented 3 years ago

It seems that the Gaussian approximation fails silently when the exact diffuse initialization (via P1inf) is combined with your model structure. First of all, if you switch away from the exact diffuse initialization you get a negative log-likelihood:

P1 <- matrix(0, ncol = 9, nrow = 9)
diag(P1)[1:3] <- 10000
m2<- SSModel(y ~ -1+SSMcustom(Z = Zt, T = Tt, R = Rt, Q = Qt, a1 = a1, P1 = P1), data = dat, distribution = 'binomial', u=u)
logLik(m2) # -12.7887

Let's simplify your model a bit by removing the constant states and reducing the number of observations:

Tt <- diag(c(0.4434,0.9962,0.9996))
Qt <- diag(rep(3.3576,3))
P1inf <- matrix(0, ncol = 3, nrow = 3)
diag(P1inf)[1:3] <- 1
Zt <- matrix(c(1.2670, 1.2729, 0.5930), nrow = 1)
Rt <- diag(3)
a1 <- matrix(c(0, 0, 0), ncol = 1)

dat <- data.frame(y = c(0, 1, 1, 0))
u <- matrix(1, nrow = NROW(dat), ncol = 1)
m3 <- SSModel(y ~ -1+SSMcustom(Z = Zt, T = Tt, R = Rt, Q = Qt, a1 = a1, P1inf = P1inf), data = dat, distribution = 'binomial', u=u)
logLik(m3) # 42.27624

Still failing. Checking the Gaussian approximation shows "convergence" to some extreme case:

> c(approxSSM(m3,tol=1e-16)$H)
[1] 7.954668e+16 7.319107e+25 1.705125e+16 3.266442e+16
> c(approxSSM(m3,tol=1e-16)$y)
[1] -39.91512   0.00000  42.00000 -39.02506

So essentially the Gaussian approximation suggests zero signal-to-noise ratio.
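For a rough illustration of that (my own check, added here, not in the original reply): compare the one-step signal variance implied by Z and Q with the approximating observation variances H above.

signal_var <- Zt %*% Qt %*% t(Zt)                  # about 12
signal_var / max(c(approxSSM(m3, tol = 1e-16)$H))  # practically zero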

Without exact initialization things look quite different:

> m4 <- m3
> m4$P1inf[]<-0
> m4$P1[]<-diag(1e5, 3)
> c(approxSSM(m4)$H)
[1] 12.906387  7.094437  7.710829  5.496582
> c(approxSSM(m4)$y)
[1] -3.473307  2.791681  2.891019 -2.471853

So the diffuse initialization causes the Gaussian approximation to fail here. Even this simplified model is borderline nonidentifiable, though: the linear predictor Z * alpha_t is a linear combination of several AR(1) processes, the second and third states are both almost random walks, and the observations are just ones and zeros. So there is essentially not much information to be gained from the series compared to this very flexible three-state model with no knowledge of the initial state values (i.e. diffuse initialization).
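As a quick sanity check (my addition for illustration, using the m3 and m4 objects above), the log-likelihood under the proper but vague prior behaves as expected:

logLik(m3)  # exact diffuse initialization: bogus large positive value
logLik(m4)  # proper, vague prior: should be finite and negative, as expected for binary data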

zjph602xtc commented 3 years ago

Thank you! When I tried to use optim to maximize the likelihood, I sometimes got these weird parameter values that give a large (but wrong) likelihood. My questions are:

1) Is there a way to determine whether the approximation fails or not? The logLik function does not give any warning in this case. Maybe a very large approxSSM(model)$H indicates the failure?
2) If I want to estimate some parameters using optim(fn = -logLik), are there any suggestions to make it reach the right result? I tried different initial values (which give a correct and good likelihood), but the optimization still ends up at a positive likelihood.

Thank you!

helske commented 3 years ago

Yeah, you probably need to check whether the values of approxSSM(model)$H are very large. My initial thought was to just check min(approxSSM(model)$H), but that does not seem to work, as some of the variances at the beginning are still ok in your example. Not sure if the maximum is too strict; probably not.

So, I would probably make a custom objective function for optim, where you have something like

objective <- function(pars) {
  # create or update the model from the parameter vector pars
  model <- ...
  if (max(approxSSM(model)$H) > 1e100) {
    Inf
  } else {
    -logLik(model)
  }
}

And yes, you typically need to start from multiple initial values and pick the best, as the likelihood surface often contains multiple local optima.
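A minimal multi-start sketch of that idea (my addition; init_values is a hypothetical list of starting parameter vectors for the objective function above):

# 'init_values' is a hypothetical list of starting parameter vectors
fits <- lapply(init_values, function(p) optim(par = p, fn = objective))
best <- fits[[which.min(sapply(fits, function(f) f$value))]]
best$par  # parameters of the best local optimum found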

helske commented 3 years ago

I have now added an additional argument H_tol to logLik and related functions, which can be used to automatically detect overly large values of H in the approximating model. If the largest variance of the approximating model is bigger than the tolerance, logLik returns -.Machine$double.xmax^0.75. The updated version can be installed from GitHub, and later this month from CRAN.
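For example, with the simplified model above (the threshold here is just an illustrative value, not the default):

# max(H) for m3 is around 7e25, so this should trigger the new check and
# return -.Machine$double.xmax^0.75 instead of the bogus positive value
logLik(m3, H_tol = 1e10)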

zjph602xtc commented 3 years ago

That works for me!! Thank you