Open jmpanfil opened 3 years ago
Hm... That looks like the sparse matrix is converted to dense, which shouldn't happen. Could you try to simulate similar data to create a reproducible example?
Sure, here's an example. Heads up: this uses a lot of RAM while building the sparse matrix (the final object is 17.6 GB, and I think peak usage climbs well above that while running). You can probably get the same result with fewer rows, but I have it at 10 million rows by 400 columns.
```r
library(Matrix)
library(ranger)

nr <- 10e6
nc <- 400

set.seed(23)
sp <- sparseMatrix(i = sample(1:nr, nr*nc / 2, replace = TRUE),
                   j = sample(1:nc, nr*nc / 2, replace = TRUE),
                   x = ifelse(runif(nr*nc / 2) < .5, 0, 1))
y <- sample(c(0, 1), nr, replace = TRUE)
sp <- cbind(sp, y)
colnames(sp) <- c(paste0('x', 1:nc), 'y')

model <- ranger(data = sp,
                dependent.variable.name = "y",
                num.trees = 5,
                keep.inbag = TRUE,
                splitrule = "extratrees",
                quantreg = FALSE,
                verbose = TRUE,
                importance = 'impurity',
                probability = TRUE)
```
```
Error in as.vector(.Call(Csparse_to_vector, x), mode) :
  Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
```
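For scale, a rough check (this is my own reading of the error, not confirmed in the thread): CHOLMOD's dense format indexes elements with 32-bit integers, so a sparse-to-dense conversion fails once `nrow * ncol` exceeds `.Machine$integer.max`. Both the simulated matrix above and the 6,838,778 × 355 matrix from the original report (354 predictors plus `y`) are past that limit:

```r
# Assumed cause: CHOLMOD's dense conversion is capped at
# .Machine$integer.max (2^31 - 1) total elements.
limit <- .Machine$integer.max

# Simulated example: 10e6 rows x (400 predictors + y)
10e6 * 401 > limit      # TRUE (4.01e9 elements)

# Original report: 6,838,778 rows x (354 predictors + y)
6838778 * 355 > limit   # TRUE (~2.43e9 elements)
```

So any code path that densifies the whole matrix, as the "extratrees" path appears to, would hit this error at these sizes regardless of available RAM.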
This seems to only happen if `splitrule = "extratrees"`. I'll try to find the problem.
I am running into an error using a sparse matrix.

```r
model <- ranger(data = x, dependent.variable.name = "y", keep.inbag = TRUE, splitrule = "extratrees", quantreg = FALSE, verbose = TRUE, importance = 'impurity', probability = TRUE)
```

I can't share my data directly, but `x` is a `dgCMatrix` from the `Matrix` package with dimensions (6838778, 354) and 305,025,741 non-zero elements. I get the error

```
Error in as.vector(.Call(Csparse_to_vector, x), mode) : Cholmod error 'problem too large' at file ../Core/cholmod_dense.c, line 105
```

I am running R 4.0.3 with ranger_0.12.1. I have used this exact sparse matrix (other than the `y` column) with XGBoost without issues. Is there anything I am doing wrong?
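As a possible stop-gap (not a fix), the observation above that the error only appears with `splitrule = "extratrees"` suggests dropping that argument and letting `ranger` use its default split rule. A minimal sketch of that call on small simulated data (the sizes and sparsity here are my own choices, just to keep the example fast):

```r
library(Matrix)
library(ranger)

set.seed(23)
nr <- 1e4   # small sizes chosen only to illustrate the call shape
nc <- 50
sp <- sparseMatrix(i = sample(1:nr, nr * nc / 10, replace = TRUE),
                   j = sample(1:nc, nr * nc / 10, replace = TRUE),
                   x = 1)
y <- sample(c(0, 1), nr, replace = TRUE)
sp <- cbind(sp, y)
colnames(sp) <- c(paste0("x", 1:nc), "y")

# Same call as in the report, minus splitrule = "extratrees";
# per the thread, the default split rule does not trigger the
# sparse-to-dense conversion.
model <- ranger(data = sp,
                dependent.variable.name = "y",
                num.trees = 5,
                probability = TRUE)
```

Whether the default split rule is acceptable depends on the use case; this only sidesteps the conversion reported above.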