jthaman / ciTools

An R Package for Quick Uncertainty Intervals
GNU General Public License v3.0

add_ci.lmer chokes on "big data" #32

Open jthaman opened 6 years ago


I'm finding that we cannot use `add_ci.lmer` on "big data". I tried an example from the merMod vignette with 200,000 observations, and R could not allocate enough memory to build the new data frame. Here's the example I tried:

```r
library(dplyr)
library(lme4)
library(ciTools)

## linear example

# Simulate one covariate x2 for ng groups with nw observations each
x_gen_mermod <- function(ng = 8, nw = 5){
  n <- ng * nw
  x2 <- runif(n)
  group <- rep(as.character(1:ng), each = nw)
  return(tibble::tibble(x2 = x2,
                        group = group))
}

# Pipe-friendly wrapper around model.matrix(): data first, formula via ...
mm_pipe <- function(tb, ...){
  model.matrix(data = tb, ...)
}

# Draw 5 rows, repeat each 100 times, and simulate fresh responses for them
get_validation_set <- function(tb, sigma, sigmaG, beta, includeRanef, groupIntercepts){
  vm <- sample_n(tb, 5, replace = FALSE)[rep(1:5, each = 100), ]
  vf <- bind_rows(vm, tb) %>%
    select(-group) %>%
    mm_pipe(~.*.)
  vf <- vf[1:500, ]
  vGroups <- if(!includeRanef) rnorm(500, 0, sigmaG) else groupIntercepts[as.numeric(vm$group)]
  # as.numeric() drops the 1-column matrix produced by %*%
  vm[["y"]] <- as.numeric(vf %*% beta + vGroups + rnorm(500, mean = 0, sd = sigma))
  vm
}

# Response = fixed effects (every coefficient equal to delta) + group intercept + noise
y_gen_mermod <- function(tb, sigma = 1, sigmaG = 1, delta = 1, includeRanef = FALSE, validationPoints = FALSE){
  groupIntercepts <- rnorm(length(unique(tb$group)), 0, sigmaG)
  tf <- tb %>%
    dplyr::select(-group) %>%
    mm_pipe(~.*.)
  beta <- rep(delta, ncol(tf))
  if(validationPoints) {
    vm <- get_validation_set(tb, sigma, sigmaG, beta, includeRanef, groupIntercepts)
  }
  tb[["y"]] <- as.numeric(tf %*% beta + groupIntercepts[as.numeric(tb$group)] + rnorm(nrow(tb), mean = 0, sd = sigma))
  tb[["truth"]] <- as.numeric(tf %*% beta + groupIntercepts[as.numeric(tb$group)] * includeRanef)
  if(validationPoints) return(list(tb = tb, vm = vm)) else return(tb)
}

# 10 groups x 20,000 observations per group = 200,000 rows
tb <- x_gen_mermod(10, 20000) %>%
  y_gen_mermod()

fit2 <- lmer(y ~ x2 + (1|group), data = tb)

tb %>% add_ci(fit2, type = "parametric", includeRanef = TRUE, names = c("LCB", "UCB"))
```

`lmer()` fits a data set this large without trouble, but `add_ci()` chokes and fails with:

```
Error: cannot allocate vector of size 298.0 Gb
```
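The size in that error is exactly what a dense 200,000 × 200,000 matrix of doubles would need, which suggests (an assumption, not yet confirmed against the ciTools source) that some n × n object is being formed. A quick check of the arithmetic in base R, where `n` is just the row count from the example:

```r
# Rows in the simulated data: 10 groups x 20,000 observations each
n <- 10 * 20000

# A dense n x n matrix of doubles takes 8 bytes per entry
bytes_needed <- n^2 * 8

# R reports allocation sizes in units of 1024^3 bytes ("Gb")
gib_needed <- bytes_needed / 1024^3
sprintf("%.1f GiB", gib_needed)
```

If the request scales with n² like this, halving the data should cut it to roughly a quarter, which is an easy way to confirm the hypothesis on a subsample.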

We need to re-examine how `add_ci.lmer` stores intermediate objects in memory and see whether it can be done more efficiently. I'm not sure whether this bug affects the other methods as well.
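One common way to avoid an n × n allocation, offered as a sketch rather than as what ciTools currently does: when only the per-row variance of the fitted values is needed, the diagonal of X V Xᵀ equals `rowSums((X %*% V) * X)`, which only creates n × p intermediates. `X` and `V` below are made-up stand-ins; for a fitted `lmer` model they would presumably correspond to the fixed-effects design matrix (e.g. `model.matrix(fit)`) and `as.matrix(vcov(fit))`.

```r
set.seed(1)

# Hypothetical stand-ins: a small fixed-effects design matrix X (n x p)
# and a p x p covariance matrix V for the fixed-effect estimates
n <- 1000
X <- cbind(1, runif(n))
V <- matrix(c(0.04, 0.01,
              0.01, 0.09), nrow = 2)

# Memory-hungry: builds the full n x n matrix X V X', then keeps only its diagonal
var_full <- diag(X %*% V %*% t(X))

# Memory-light: entry i is x_i' V x_i, computed with an elementwise product
# and a row sum, so the largest intermediate is n x p
var_rows <- rowSums((X %*% V) * X)

all.equal(var_full, var_rows)
```

Both vectors agree, but only the second approach stays usable at 200,000 rows, since its memory grows linearly in n rather than quadratically.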