stekhoven / missForest

missForest is a nonparametric, mixed-type imputation method for basically any type of data for the statistical software R.
http://stat.ethz.ch/CRAN/web/packages/missForest/index.html

Error: NA not permitted in predictors #26

Open · HenrikEckermann opened this issue 3 years ago

HenrikEckermann commented 3 years ago

Hi,

I ran into this error and also found a workaround that I could not find anywhere online when searching. Here is a reproducible example:

```r
library(tibble)
library(missForest)

# Attempt 1: passing a tibble directly does not work at all
d <- tibble(
  var = rnorm(n = 100),
  var2 = rbinom(n = 100, size = 100, prob = 0.1),
  var3 = rnorm(n = 100),
  var4 = rnorm(n = 100)
)

d$var2 <- as.factor(d$var2)
dmiss <- prodNA(d)
dimprf <- missForest(
  dmiss,
  variablewise = TRUE,
  ntree = 1000,
  decreasing = TRUE)

# Attempt 2: converting to a matrix throws the error in the title
d <- tibble(
  var = rnorm(n = 100),
  var2 = rbinom(n = 100, size = 100, prob = 0.1),
  var3 = rnorm(n = 100),
  var4 = rnorm(n = 100)
)

d$var2 <- as.factor(d$var2)
dmiss <- prodNA(d)
dimprf <- missForest(
  as.matrix(dmiss),
  variablewise = TRUE,
  ntree = 1000,
  decreasing = TRUE)

# Attempt 3: only converting to a data.frame first works
d <- tibble(
  var = rnorm(n = 100),
  var2 = rbinom(n = 100, size = 100, prob = 0.1),
  var3 = rnorm(n = 100),
  var4 = rnorm(n = 100)
)

d <- as.data.frame(d)
d$var2 <- as.factor(d$var2)
dmiss <- prodNA(d)
dimprf <- missForest(
  dmiss,
  variablewise = TRUE,
  ntree = 1000,
  decreasing = TRUE)
```

stekhoven commented 2 years ago

missForest is not reliable with tibbles; we will look into that.

rempsyc commented 1 year ago

Thanks for opening this issue, @HenrikEckermann. I was using data frames and everything worked fine. Then, at some point in my script, I started using a function that converted my data frame to a tibble, and missForest stopped working (Error in { : task 1 failed - "NA not permitted in predictors"). I could not figure out the source of this error and was completely puzzled; I spent a lot of time troubleshooting variables individually. So thanks for providing the workaround of converting to a data frame.
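The workaround from this thread (convert to a plain data frame before imputing) can be wrapped in a small helper so it is not forgotten when upstream code starts returning tibbles. This is a minimal sketch assuming the tibble and missForest packages are installed. `missForest_df()` is a hypothetical name, not part of missForest. The indexing contrast at the top shows one real difference between tibbles and data frames that may be involved; it is an assumption, not a confirmed diagnosis of this bug.

```r
library(tibble)
library(missForest)

# One difference that may be involved (assumption, not confirmed here):
# single-column `[` indexing drops to a vector for a data.frame,
# but a tibble always returns a one-column tibble.
x_df <- data.frame(a = 1:3)
x_tb <- tibble(a = 1:3)
is.vector(x_df[, 1])  # TRUE
is.vector(x_tb[, 1])  # FALSE

# Hypothetical wrapper (not part of missForest): coerce tibbles and other
# data.frame subclasses to a plain data.frame, then call missForest().
missForest_df <- function(x, ...) {
  missForest(as.data.frame(x), ...)
}

# Small mixed-type example: two numeric columns and one factor.
set.seed(42)
d <- tibble(
  num1 = rnorm(100),
  fac  = factor(sample(c("a", "b", "c"), 100, replace = TRUE)),
  num2 = rnorm(100)
)
dmiss <- prodNA(d, noNA = 0.1)        # introduce 10% missing values
dimp <- missForest_df(dmiss, ntree = 100)
head(dimp$ximp)                       # imputed plain data.frame
```

Converting at the call site keeps the rest of a tidyverse-style script unchanged; only the object handed to missForest() is a plain data frame.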

rempsyc commented 1 year ago

Google seems to bring people to #14, but they should really come here for the solution.