nimble-dev / nimble

The base NIMBLE package for R
http://R-nimble.org
BSD 3-Clause "New" or "Revised" License

Can Laplace's getNodeNamesVec return mismatched length? #1426

Open perrydv opened 8 months ago

perrydv commented 8 months ago

We have a report from a user (without a reproducible example) that the Laplace method's $getNodeNamesVec can return a vector whose length does not match that of the input parameter vector. I am noting this here to investigate.
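A quick sanity check a user could run to confirm the reported mismatch (names here are hypothetical; `Claplace` stands in for whatever compiled Laplace algorithm is in use, and `p` for the parameter vector passed to $calcLaplace):

```r
# Hypothetical check, assuming Claplace is a compiled Laplace algorithm:
p  <- c(0, 0, 1)                    # parameter vector passed to calcLaplace()
nn <- Claplace$getNodeNamesVec()    # node names the Laplace method reports
length(nn) == length(p)             # per the report, this can be FALSE
```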

paul-vdb commented 8 months ago

@perrydv I wonder if they used a Dirichlet or something that was transformed? A reproducible example would be helpful.

paul-vdb commented 8 months ago

@weizhangstats @perrydv @paciorek As a follow-up here (not the same problem): have we implemented the MLE for the case where n_param != n_param_transformed? I made this example, which works for a single call to the log likelihood but breaks when we try to find the MLE. This is something we need to worry about in INLA, and given the current result, probably something we need to worry about here too. This did not recreate any getNodeNamesVec errors.

model_code <- nimbleCode({
  # priors 
  intercept ~ dnorm(0, sd = 100)
  beta ~ dnorm(0, sd = 100)
  sigma ~ dunif(0, 10)
  # random effects and data  
  for(i in 1:10) {
    # random effects
    ran_eff[i] ~ dnorm(0, sd = sigma)
    for(j in 1:5) {
      # data
      y[i,j] ~ dpois(exp(intercept + beta*X[i,j] + ran_eff[i]))
    }
  }
  wgt[1:3] ~ ddirch(alpha[1:3])
  n[1:3] ~ dmultinom(size = 1000, prob = wgt[1:3])

})
set.seed(123)
X <- matrix(rnorm(50), nrow = 10)
model <- nimbleModel(model_code, constants = list(X = X), calculate = FALSE,
                     buildDerivs = TRUE) # Here is the argument needed for AD.
model$intercept <- 0
model$beta <- 0.2
model$sigma <- 0.5
model$wgt <- c(0.2,0.45,0.35)
model$alpha <- c(1,1,1)
model$calculate() # Returns NA because the model is not yet fully initialized.
model$simulate()  # With no arguments this simulates all non-data nodes, overwriting the values set above.
model$calculate() # Now returns the model log probability.
model$setData(c('y', 'n')) # Now y and n are marked as data, with values from the simulation.

Cmodel <- compileNimble(model)
glmm_laplace <- buildLaplace(model)
Cglmm_laplace <- compileNimble(glmm_laplace, project = model)
Cglmm_laplace$getNodeNamesVec()

Cglmm_laplace$calcLaplace(c(0, 0, 1, 0.2, 0.4, 0.4))  ## Works?
Cglmm_laplace$calcLaplace(c(0, 0, 1, 0.15, 0.43, 0.42))  ## Works?

MLE <- Cglmm_laplace$findMLE(c(0, 0, 1, 0.15, 0.43, 0.42)) # Breaks...

weizhangstats commented 8 months ago

@paul-vdb Thanks for the example. I can run the code fine on my machine; the last line does not crash. The results do not look good, though. I will take a deeper look into this.

perrydv commented 7 months ago

@paul-vdb @paciorek @danielturek I took a look at this and there are two issues: one solved, the other pointing to more investigation. (Also @paul-vdb, just making sure: in this example there are two entirely independent sub-models, and the part with the multinomial data does not have any random effects. I think you were working on that and must have stumbled into this issue.)

  1. Please use branch fix-optim-crash. This fixes a potential problem in optim when the gradient function involves a parameter transformation that changes the length of the parameters. This problem would be consistent with intermittent and/or system-specific problems including crazy results or crashes.
  2. There seems to be a problem with the parameter transformation from sigma ~ dunif(0, 10) for optimization purposes. I think the transformation works correctly, but somehow we get screwy numerical results. @paciorek and I saw something similar in another problem recently. OTOH even individual Laplace evaluations (not even searching for the MLE) seem to give crazy results. When I change to sigma ~ dhalfflat(), invoking a simple log transformation for sigma, the example looks like it works reasonably.
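For intuition, here is a base-R illustration (not nimble's internal code) of the two transformations involved: a bounded prior such as dunif(0, 10) implies a logit-style map to the unconstrained scale, while dhalfflat() implies a plain log map. The bounded transform's derivative blows up near both bounds, which is consistent with delicate numerics there:

```r
# Illustrative transforms to the unconstrained scale (not nimble internals):
to_unconstrained_unif <- function(s, a = 0, b = 10) log((s - a) / (b - s)) # (0,10) -> R
to_unconstrained_half <- function(s) log(s)                                # (0,Inf) -> R
# The derivative of the bounded transform, 1/(s - a) + 1/(b - s), becomes
# enormous as s approaches either bound; the log transform stays tame for
# moderate s, which may explain the difference in numerical behavior.
```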

Here is the working version:

model_code <- nimbleCode({
  # priors
  intercept ~ dnorm(0, sd = 100)
  beta ~ dnorm(0, sd = 100)
  sigma ~ dhalfflat() # dunif(0, 10)
  # random effects and data
  for(i in 1:10) {
    # random effects
    ran_eff[i] ~ dnorm(0, sd = sigma)
    for(j in 1:5) {
      # data
      y[i,j] ~ dpois(exp(intercept + beta*X[i,j] + ran_eff[i]))
    }
  }
  wgt[1:3] ~ ddirch(alpha[1:3])
  n[1:3] ~ dmultinom(size = 1000, prob = wgt[1:3])
})
set.seed(123)
X <- matrix(rnorm(50), nrow = 10)
model <- nimbleModel(model_code, constants = list(X = X), calculate = FALSE,
                     buildDerivs = TRUE) # Here is the argument needed for AD.
model$intercept <- 0
model$beta <- 0.2
model$sigma <- 0.5
model$wgt <- c(0.2,0.45,0.35)
model$alpha <- c(1,1,1)
model$calculate() # This will return NA because the model is not fully initialized.
model$simulate(model$getDependencies(c("intercept", "beta", "sigma", "wgt"), self = FALSE))
model$calculate()
model$setData(c('y', 'n')) # Now y and n are marked as data, with values from the simulation.

Cmodel <- compileNimble(model)
glmm_laplace <- buildLaplace(model)
Cglmm_laplace <- compileNimble(glmm_laplace, project = model)
Cglmm_laplace$getNodeNamesVec()

Cglmm_laplace$calcLaplace(c(0, 0, 1, 0.2, 0.4, 0.4))  
Cglmm_laplace$calcLaplace(c(0, 0, 1, 0.15, 0.43, 0.42))  

MLE <- Cglmm_laplace$findMLE(c(0, 0, 1, 0.15, 0.43, 0.42))

paul-vdb commented 7 months ago

@perrydv Yes, you are correct. I was just forcing a dimension reduction with the parameter transformation in an otherwise simple model, to see if it would create any issues.

paciorek commented 6 months ago

@perrydv @paul-vdb @weizhangstats I am wondering how to close this issue given the awkwardness of having Laplace results depend on a prior that we don't otherwise use.

At the least, I suppose we might mention in the user manual that we recommend against uniform priors for variance/sd/precision parameters when using Laplace.

One step beyond that would be to emit a [Note] warning users that they may want a different prior (one whose domain extends to Inf) in this situation, though I'm not sure how systematically we can detect such cases.
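One possible sketch of such a detection (hypothetical and untested; it assumes nimble's getBound() can be queried this way for a parameter's prior bounds):

```r
# Hypothetical: warn when a parameter node's prior has a finite upper bound.
warnBoundedPriors <- function(model, paramNodes) {
  for (p in paramNodes) {
    if (is.finite(model$getBound(p, "upper")))
      message("Parameter ", p, " has a bounded prior; for Laplace, ",
              "consider a prior whose domain extends to Inf.")
  }
}
```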

paul-vdb commented 6 months ago

@paciorek In the updated Laplace code, we've added a findMAP option which will make use of the priors.

paciorek commented 5 months ago

Perry and I decided that for now I will just put a note in the user manual and in the roxygen documentation advising caution about using priors with a bounded domain for variance component parameters.

paul-vdb commented 5 months ago

@perrydv @paciorek @weizhangstats Hey guys, going back to the original issue in this thread: I noticed in my current updates that the one_time_fixes are not applied consistently, depending on which functions you call first. If you call $getNodeNamesVec immediately after compiling, you get an "EXTRA" entry, i.e. a +1 node length.

This is now resolved by making sure one_time_fixes() gets called consistently. It had been commented out, which also resulted in "EXTRA" being included in the summary output if, after compiling, the user calls $findMLE and then $summary.


I think other parts of this thread are still valid though, and worth discussing.

weizhangstats commented 5 months ago

@paul-vdb @perrydv If there is only one param/random-effects node, we expect users to call getNodeNameSingle(). But now I see the confusing point: the user may not know how many nodes they have. Maybe we should add a call to one_time_fixes() inside getNodeNamesVec() as well, to reduce this confusion.
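The guard pattern being suggested can be sketched in plain R (hypothetical; the actual nimbleFunction code will differ):

```r
# Hypothetical sketch: an idempotent one_time_fixes() guard called from
# every accessor, so call order no longer changes the reported node names.
makeAccessor <- function() {
  fixed <- FALSE
  nodeNames <- c("intercept", "beta", "sigma", "EXTRA")  # placeholder present
  one_time_fixes <- function() {
    if (!fixed) {
      nodeNames <<- setdiff(nodeNames, "EXTRA")  # strip the placeholder once
      fixed <<- TRUE
    }
  }
  getNodeNamesVec <- function() {
    one_time_fixes()  # safe to call from any entry point
    nodeNames
  }
  list(getNodeNamesVec = getNodeNamesVec)
}
acc <- makeAccessor()
acc$getNodeNamesVec()  # "intercept" "beta" "sigma" (no "EXTRA")
```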

paul-vdb commented 5 months ago

@weizhangstats I've done that.