stekhoven / missForest

missForest is a nonparametric, mixed-type imputation method for basically any type of data for the statistical software R.
http://stat.ethz.ch/CRAN/web/packages/missForest/index.html

"NA not permitted in predictors" #14

Closed HedvigS closed 6 years ago

HedvigS commented 6 years ago

Hi,

Thanks for this package, it's great. However, I'm having a puzzling problem. I'm imputing missing values for a binary categorical matrix. The values are represented as numbers (0, 1), but should be interpreted as factors/characters. If I do the imputation on a numeric matrix, I of course get bizarre things like -6 as imputations for NAs, even though that variable only has 0s and 1s in the other observations. So I change the class of the matrix to character or factor to make sure the values are interpreted as such. And then I get this error message:

Error in randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, : NA not permitted in predictors

If I continue as numeric everything runs fine.

I noticed that someone else had had this trouble, and that it seems to not have been resolved for them either. Seeing as I may not be the only one with this issue, I thought I'd request some help so that future people searching for this problem can get help, and of course so that I can get help too :).
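
A minimal sketch of one way to hand 0/1 data over as categorical rather than numeric: convert the matrix to a data frame and turn every column into a factor, so that the missing cells stay NA but the observed values become the levels "0" and "1". The toy matrix `X` and the name `X_df` are made up for illustration, and the missForest call is left commented out because it assumes the package is installed.

```r
# Toy 0/1 matrix with five missing cells (hypothetical data)
set.seed(1)
X <- matrix(rbinom(40, 1, 0.5), nrow = 10)
X[sample(length(X), 5)] <- NA

# Convert to a data frame, then make every column a two-level factor.
# NA cells remain NA; observed values become the levels "0" and "1".
X_df <- as.data.frame(X)
X_df[] <- lapply(X_df, factor, levels = c(0, 1))

str(X_df)

# Assumes missForest is installed; factor columns are imputed as categories.
# imputed <- missForest::missForest(X_df)$ximp
```

With factor columns, an imputed value can only be one of the existing levels, which rules out results such as -6.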

stephematician commented 6 years ago

A minimum working example would help. I can reproduce the behaviour as follows:

# make up some binary factor data with missing values
library(missForest)  # provides missForest() and prodNA()
Y <- prodNA(matrix(as.character(runif(100) > 0.5), nrow=10))
missForest(Y)

produces

Error in randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry,  : 
  NA not permitted in predictors

Converting to a data.frame first runs without error:

missForest(data.frame(Y))
  missForest iteration 1 in progress...done!
  missForest iteration 2 in progress...done!
  missForest iteration 3 in progress...done!
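
One likely reading of the difference above: a character matrix can only hold strings, so no column is ever a factor, whereas data.frame() turned the strings into factors under the defaults of the time. The sketch below uses base R only and places the NAs by hand instead of calling prodNA(); the positions and the names `df_fac` and `df_chr` are arbitrary. Note that since R 4.0.0 the default is stringsAsFactors = FALSE, so a bare data.frame(Y) keeps character columns there.

```r
# Character matrix similar to the one in the example above
Y <- matrix(as.character(runif(100) > 0.5), nrow = 10)
Y[c(3, 27, 58)] <- NA  # hand-placed missing cells (arbitrary positions)

# A matrix column is a plain character vector
class(Y[, 1])

# The column type after conversion depends on stringsAsFactors
df_fac <- data.frame(Y, stringsAsFactors = TRUE)
df_chr <- data.frame(Y, stringsAsFactors = FALSE)

class(df_fac[[1]])  # factor
class(df_chr[[1]])  # character
```

Passing stringsAsFactors = TRUE explicitly makes the conversion independent of the R version and of any global option.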

HedvigS commented 6 years ago

Thank you. Sorry, I am not able to share the data and couldn't make a corresponding working example. Thank you for doing so, I really appreciate it.

I ran the example, and at first I got the same error message even with data.frame(). Then I found the problem. It's this line at the start of my script:

options(stringsAsFactors = FALSE)

That's what's causing the trouble. Phew. Alright, well this has been a ride. I was very frustrated with this, I'm glad it's solved.

While I was having this trouble, I tried reading all the documentation and every forum thread I could find, and I couldn't find the answer or figure out what kind of input restrictions missForest has. It is perfectly possible that the answer is there in the documentation, and that I'm just not understanding it. Would it still be possible to add something about strings and factors as input somewhere in either the package documentation or the "Using missForest" document? I think that would be helpful.
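
Based on what this thread reports, character columns are what trigger the error, so one defensive option is to convert them to factors just before imputing, regardless of how stringsAsFactors is set. The helper `chars_to_factors()` below is hypothetical and not part of missForest; the small data frame is made up for illustration.

```r
# Hypothetical helper (not part of missForest):
# turn every character column of a data frame into a factor.
chars_to_factors <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  if (any(is_chr)) {
    df[is_chr] <- lapply(df[is_chr], factor)
  }
  df
}

# Toy data built the way options(stringsAsFactors = FALSE) would build it
df <- data.frame(a = c("yes", NA, "no", "yes"),
                 b = c(1.5, 2.0, NA, 4.1),
                 stringsAsFactors = FALSE)

df_fixed <- chars_to_factors(df)
sapply(df_fixed, class)

# Assumes missForest is installed:
# imputed <- missForest::missForest(df_fixed)$ximp
```

Numeric columns are left untouched, and NA entries in the converted columns remain NA rather than becoming a level.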

visamsundaram commented 5 years ago

Hi, I am having trouble with imputation for categorical values. They were originally characters; I tried both converting them to factors and leaving them as characters. When I convert the data into a matrix, I get the error "length of response must be the same as predictors". When I pass a data frame, I get: Error in randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, : NA not permitted in predictors

I tried with options(stringsAsFactors = FALSE) as well. It didn't work for categories! Any input on what has to be done?

husain223 commented 4 years ago

Error in randomForest.default(feature_extraction(testdata), as.factor(testdata$survived), : NA not permitted in predictors

I'm getting this error from random forest in R.
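
When randomForest is called directly, as in the last comment, this message means the predictor data itself contains NA values. A quick first check is to count the missing values per column. The data frame `predictors` below is a made-up stand-in, since the output of feature_extraction() is not shown in the thread; the na.roughfix() line is commented out because it assumes the randomForest package is installed.

```r
# Hypothetical stand-in for the predictor table passed to randomForest()
predictors <- data.frame(
  age  = c(22, NA, 35),
  fare = c(7.25, 71.3, 8.05),
  sex  = factor(c("m", "f", NA))
)

# Count missing values per column and keep only the affected columns
na_counts <- colSums(is.na(predictors))
na_counts[na_counts > 0]

# One simple fix, assuming randomForest is installed: fill numeric NAs with
# the column median and factor NAs with the most frequent level.
# fixed <- randomForest::na.roughfix(predictors)
```

Any column listed here has to be imputed or dropped before the predictors go to randomForest().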