suyusung / arm

Data Analysis Using Regression and Multilevel/Hierarchical Models
24 stars 6 forks source link

arm::sim speed #7

Closed andremrsantos closed 6 years ago

andremrsantos commented 7 years ago

Hello arm staff,

I've being doing some analysis using arm::bayesglm and to predict confidence interval I use arm::sim for 1E5 simulations but it was running too slow. Checking the source code, I found in the function for sim.glm the following for loop.

for (s in 1:n.sims){
      beta[s,] <- MASS::mvrnorm (1, beta.hat, V.beta)
 }

I don't understand why do you choose to run as a for loop instead of:

beata <- MASS::mvrnorm(n.sims, beta.hat, V.beta)

I've run some tests and while the second one is almost instantaneous, the first take several seconds.

> system.time(x <- MASS::mvrnorm (n, beta.hat, V.beta))
  usuário   sistema decorrido 
    0.184     0.038     0.204 
> system.time(for (i in 1:n) y[i,] <-  MASS::mvrnorm (1, beta.hat, V.beta))
  usuário   sistema decorrido 
   27.824     0.681    29.698 
> system.time(replicate(n, MASS::mvrnorm (1, beta.hat, V.beta)))
  usuário   sistema decorrido 
   29.766     0.510    31.146 

Is there a statistical reason for the choice? I would love to understand it. If soo, would you consider a blocked alternative, such as:

blocks <- 100
blocksize <- floor(n.sims/blocks)
for (s in 1:blocks){
    from <- (s-1)*blocksize
    to <- pmin(s*blocksize, n.sims)
    beta[from:to,] <- MASS::mvrnorm (pmin(blocksize, from-n.sims), beta.hat, V.beta)
}

Regards

suyusung commented 6 years ago

Hi sorry for late late response. I have adopted your suggestion and updated arm. But I am not sure about the blocked way. So I did not use it.