mayer79 / missRanger

Fast multivariate imputation by random forests.
https://mayer79.github.io/missRanger/
GNU General Public License v2.0
63 stars 11 forks

Stack Overflow Error #50

Closed DarioS closed 1 year ago

DarioS commented 1 year ago

I have a gene expression matrix with only two missing values. Imputing it with missRanger() causes a stack overflow error, so I used mean imputation instead.

> dim(RNAarrays)
  385 24174
> table(is.na(RNAarrays))
    FALSE    TRUE 
  9306988       2 
> RNAarrays <- missRanger(as.data.frame(RNAarrays))
  Missing value imputation by random forests
  Error: protect(): protection stack overflow

mayer79 commented 1 year ago

Thanks for the info. I have not seen this before. Without data to reproduce the problem, I can't do anything.

DarioS commented 1 year ago

I have made an example to demonstrate it.

library(missRanger)
data <- matrix(rnorm(385 * 20000), nrow = 385, ncol = 20000)
data[5, 5] <- NA
data <- missRanger(as.data.frame(data))

mayer79 commented 1 year ago

Thanks! It fails due to the formula interface of ranger():

library(missRanger)
data <- as.data.frame(matrix(rnorm(385 * 20000), nrow = 385, ncol = 20000))
# data[5, 5] <- NA
fit <- ranger::ranger(V5 ~ ., data = data)              # Stack overflow
fit <- ranger::ranger(y = data[, 5], x = data[, -5])  # No problem

{missRanger} uses the formula interface, but given this problem, it would be a good idea to switch to the x/y interface in a future version.
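
For anyone hitting this before a fix lands, here is a minimal sketch of imputing one column through ranger's x/y interface, so that no formula is parsed. It is a single-pass illustration on a small synthetic data frame (the sizes and the names target, missing and predictors are assumptions for the example), not how {missRanger} itself works internally.

```r
library(ranger)

set.seed(1)
# Small synthetic stand-in for a wide expression matrix (assumed sizes).
data <- as.data.frame(matrix(rnorm(50 * 200), nrow = 50, ncol = 200))
data[5, 5] <- NA

target <- "V5"
missing <- is.na(data[[target]])
predictors <- setdiff(names(data), target)

# Fit on the complete rows via the x/y interface (no formula involved).
fit <- ranger(
  x = data[!missing, predictors, drop = FALSE],
  y = data[[target]][!missing],
  num.trees = 100
)

# Predict the missing entries and write them back into the data frame.
pred <- predict(fit, data[missing, predictors, drop = FALSE])$predictions
data[missing, target] <- pred
```

This only works as written when the predictor columns are complete, as they are here.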

The problem comes from ranger:::parse.formula(), which calls stats::terms(); that call fails when the input is too large.
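
To illustrate what stats::terms() has to do with a formula like V5 ~ ., here is a small base-R sketch on a toy data frame (the object names small and tt are made up for the example). It shows the dot being expanded into one term per remaining column, which is the work that grows with the number of columns.

```r
set.seed(1)
# Toy data frame with six columns, V1 to V6.
small <- as.data.frame(matrix(rnorm(10 * 6), nrow = 10, ncol = 6))

# terms() expands the dot into one term per column other than the response.
tt <- stats::terms(V5 ~ ., data = small)
labels <- attr(tt, "term.labels")
print(labels)

# The "factors" attribute has one row per variable and one column per term.
factor_dim <- dim(attr(tt, "factors"))
print(factor_dim)
```

With 20000 columns the same expansion yields 19999 terms.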

mayer79 commented 1 year ago

I think this solution could help. It allows R to parse larger formulas:

https://www.researchgate.net/post/error_protect_protection_stack_overflow_in_R
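
The linked post is not reproduced here. As an assumption about the kind of workaround it describes, these are the two standard R settings usually mentioned for this error: the size of the pointer protection stack, which can only be set at startup, and the limit on nested expressions, which can be changed in a running session. Whether either one is enough for a 20000-column formula is not confirmed in this thread.

```r
# Assumption: neither setting is confirmed by this thread as the fix.
# The pointer protection stack is sized at startup, e.g. from a shell:
#   R --max-ppsize=500000

# The limit on nested expressions can be raised in a running session.
old <- options(expressions = 5e5)  # default is 5000, maximum is 500000
current <- getOption("expressions")
print(current)

# Restore the previous value when done.
options(old)
```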

mayer79 commented 1 year ago

Fixed in https://github.com/mayer79/missRanger/pull/52