ramhiser / sparsediscrim

Sparse and Regularized Discriminant Analysis in R

Diagonal classifiers have error: cannot allocate vector of size 3.7 Gb #31

Closed ramhiser closed 9 years ago

ramhiser commented 9 years ago

In some simulations for the HDRDA paper, the Tong and Pang classifiers failed with the following error:

Error : cannot allocate vector of size 3.7 Gb

The error occurred with the following data sets within the datamicroarray package:

ramhiser commented 9 years ago

As best I can tell, the issue above was not caused by the classifiers' implementations. Rather, it was caused by the large classifier objects kept in memory during the simulation.
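In the meantime, the workaround on the simulation side is to keep only the summary needed from each fit and free the classifier object before the next replication. A minimal sketch of that pattern (the dqda call and the shape of its predict output are assumptions about the package's interface; num_reps, train_x/train_y, and test_x/test_y are hypothetical placeholders):

library(sparsediscrim)

# Keep only the error rate from each replication, not the fitted object
error_rates <- numeric(num_reps)
for (i in seq_len(num_reps)) {
  fit <- dqda(x = train_x, y = train_y)
  preds <- predict(fit, test_x)$class   # assumes predict returns a list with a class component
  error_rates[i] <- mean(preds != test_y)
  rm(fit)   # drop the large classifier object...
  gc()      # ...and let R release the memory before the next fit
}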

ramhiser commented 9 years ago

I was able to reproduce the error above on a c4.2xlarge EC2 instance with the following code:

library(mvtnorm)
library(datamicroarray)

# nakayama is a high-dimensional microarray data set, so p is large
data(nakayama)
p <- ncol(nakayama$x)

set.seed(42)
mean_k <- runif(p)
# diag() materializes a dense p x p covariance matrix, and dmvnorm then
# works with that dense matrix internally (e.g., a Cholesky factorization)
cov_k <- diag(mean_k)
z <- mvtnorm::dmvnorm(x=nakayama$x, mean=mean_k, sigma=cov_k)
Error: cannot allocate vector of size 3.7 Gb

The call to mvtnorm::dmvnorm occurs in sparsediscrim::posterior_probs. There's no need for the full multivariate machinery when the covariance matrix is diagonal -- an alternative that never materializes the p x p matrix should be used instead.
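For reference, the reported size is exactly one dense p x p matrix of doubles. Assuming nakayama has p = 22,283 features (a figure consistent with the 3.7 Gb in the message):

# Memory for a single dense p x p matrix of doubles
p <- 22283
p^2 * 8 / 2^30   # ~3.7 Gb, matching the error message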

ramhiser commented 9 years ago

When the covariance matrix is diagonal, the coordinates are independent, so the multivariate normal density factors into the product of the univariate normal densities. Better yet, sum the log densities and exponentiate, which avoids underflow from multiplying many small values. Either works. Here's some test code I used to write dmvnorm_diag.

library(mvtnorm)

x <- as.matrix(iris[, -5])
x_mean <- colMeans(x)
x_cov <- diag(diag(cov(x)))  # diagonal covariance matrix

# Reference densities from the full multivariate implementation
density_mvt <- mvtnorm::dmvnorm(x, mean=x_mean, sigma=x_cov)

# Product of univariate normal densities (cov is the vector of variances)
dmvnorm_diag <- function(x, mean, cov) {
  prod(dnorm(x, mean=mean, sd=sqrt(cov)))
}

density_diag <- apply(x, 1, function(row) dmvnorm_diag(row, mean=x_mean, cov=diag(x_cov)))

# Numerically stabler: sum the log densities, then exponentiate
dmvnorm_diag2 <- function(x, mean, cov) {
  exp(sum(dnorm(x, mean=mean, sd=sqrt(cov), log=TRUE)))
}

density_diag2 <- apply(x, 1, function(row) dmvnorm_diag2(row, mean=x_mean, cov=diag(x_cov)))

# All three agree up to floating-point error
summary(abs(density_mvt - density_diag))
summary(abs(density_diag - density_diag2))
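For the package itself, a vectorized variant that stays on the log scale would avoid both the apply loop and underflow when p is large. A minimal sketch (dmvnorm_diag_log and its argument names are mine, not part of the package):

# Log-density of N(mean, diag(var)) at each row of x, computed without
# ever forming the p x p covariance matrix.
# x: n x p matrix; mean, var: length-p vectors of means and variances
dmvnorm_diag_log <- function(x, mean, var) {
  centered <- sweep(x, 2, mean, "-")
  -0.5 * (ncol(x) * log(2 * pi)
          + sum(log(var))
          + rowSums(sweep(centered^2, 2, var, "/")))
}

# Agrees with the versions above on the iris example:
log_dens <- dmvnorm_diag_log(x, x_mean, diag(x_cov))
all.equal(exp(log_dens), density_diag2)

For posterior probabilities, the log densities can then be combined with the class priors and normalized via log-sum-exp, so that nothing underflows even when p is in the tens of thousands.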