In some simulations for the HDRDA paper, the Tong and Pang classifiers failed with a memory-allocation error. The error occurred with the following data sets within the `datamicroarray` package:

- `burczynski`
- `nakayama`

As best I can tell, the issue was not caused by the classifiers' implementations. Rather, it was caused by the large classifier objects kept in memory during the simulation.
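Both data sets come from the `datamicroarray` collection of small-sample, high-dimensional expression data, so the memory pressure is not surprising. A quick check of their dimensions (a minimal sketch using the package's documented `data(..., package = 'datamicroarray')` loading style) makes that concrete:

```r
# Each data set is a list whose `x` component is the n x p expression matrix,
# so the size of the problem is easy to inspect.
library(datamicroarray)

data('burczynski', package = 'datamicroarray')
data('nakayama', package = 'datamicroarray')

dim(burczynski$x)
dim(nakayama$x)
```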
I was able to reproduce the error above on a c4.2xlarge EC2 instance with the following code:
```r
library(mvtnorm)
library(datamicroarray)

data(nakayama)
p <- ncol(nakayama$x)

set.seed(42)
mean_k <- runif(p)
cov_k <- diag(mean_k)
z <- mvtnorm::dmvnorm(x = nakayama$x, mean = mean_k, sigma = cov_k)
```

which fails with:

```
Error: cannot allocate vector of size 3.7 Gb
```
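The size in the error message is consistent with the dimensionality of the data: a dense p x p matrix of doubles takes 8 * p^2 bytes, so working backwards from 3.7 Gb puts p somewhere around 22,000 features, and `dmvnorm` needs additional matrices of that size internally. A back-of-the-envelope check (the round value of `p` below is an assumption, not read from the data set):

```r
# Back-of-the-envelope memory check for one dense p x p double-precision matrix.
# The round value of p is an assumption, not read from the data set.
p <- 22000
matrix_gib <- 8 * p^2 / 2^30   # 8 bytes per entry
matrix_gib                     # roughly 3.6 GiB, in line with the reported 3.7 Gb
```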
The call to `mvtnorm::dmvnorm` occurs in `sparsediscrim::posterior_probs`. There's no need to use this function with a diagonal covariance matrix -- use an alternative approach instead.
With a diagonal covariance matrix, the components are independent under the normal model, so the joint density is just the product of univariate normal densities. Better yet, sum the log-densities and exponentiate the result, which avoids numerical underflow. Either approach works. Here's some test code I used to write `dmvnorm_diag`.
```r
library(mvtnorm)

x <- as.matrix(iris[, -5])
x_mean <- colMeans(x)
x_cov <- diag(diag(cov(x)))

# Reference values from mvtnorm with the full (diagonal) covariance matrix
density_mvt <- mvtnorm::dmvnorm(x, mean = x_mean, sigma = x_cov)

# Product of univariate normal densities
dmvnorm_diag <- function(x, mean, cov) {
  prod(dnorm(x, mean = mean, sd = sqrt(cov)))
}
density_diag <- apply(x, 1, function(row) dmvnorm_diag(row, mean = x_mean, cov = diag(x_cov)))

# Exponentiated sum of log-densities (less prone to underflow)
dmvnorm_diag2 <- function(x, mean, cov) {
  exp(sum(dnorm(x, mean = mean, sd = sqrt(cov), log = TRUE)))
}
density_diag2 <- apply(x, 1, function(row) dmvnorm_diag2(row, mean = x_mean, cov = diag(x_cov)))

# All three agree up to floating-point error
summary(abs(density_mvt - density_diag))
summary(abs(density_diag - density_diag2))
```
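For use inside `posterior_probs`, a vectorized version that handles the whole n x p matrix at once avoids the per-row `apply`. The sketch below (the name `dmvnorm_diag_vec` and the `sigma_diag` argument are illustrative, not the code that ships in `sparsediscrim`) assumes the caller passes only the diagonal of the covariance matrix as a vector:

```r
# Sketch of a vectorized diagonal-covariance multivariate normal density.
# `x` is an n x p matrix; `mean` and `sigma_diag` are length-p vectors,
# with `sigma_diag` holding the variances (the diagonal of the covariance matrix).
# Illustrative only -- not the sparsediscrim implementation.
dmvnorm_diag_vec <- function(x, mean, sigma_diag, log = FALSE) {
  x <- as.matrix(x)
  # Standardize each column: subtract its mean, divide by its standard deviation
  z <- sweep(sweep(x, 2, mean, "-"), 2, sqrt(sigma_diag), "/")
  # Sum the univariate log-densities across features, then undo the scaling
  log_dens <- rowSums(matrix(dnorm(z, log = TRUE), nrow = nrow(z))) -
    0.5 * sum(log(sigma_diag))
  if (log) log_dens else exp(log_dens)
}

# Agrees with the per-row versions on the iris example above
density_vec <- dmvnorm_diag_vec(x, x_mean, diag(x_cov))
summary(abs(density_vec - density_diag2))
```

Working on the log scale throughout also makes it cheap to return log-densities directly, which is usually what a posterior-probability calculation wants anyway.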