stekhoven / missForest

missForest is a nonparametric, mixed-type imputation method for basically any type of data for the statistical software R.
http://stat.ethz.ch/CRAN/web/packages/missForest/index.html

Error if data contains character variables #7

Open pablo14 opened 7 years ago

pablo14 commented 7 years ago

Hi! Nice package! I found a bug that occurs when the data frame contains a character column (instead of a factor).

Example:

library(missForest)
library(Lock5Data)
library(dplyr)
data("HollywoodMovies2011")
# Movie is the ID column, so it is dropped before imputation.
imputationResults <- missForest(xmis = select(HollywoodMovies2011, -Movie))

It throws the error:

argument is not numeric or logical: returning NA
  missForest iteration 1 in progress...
NAs introduced by coercion
Error in randomForest.default(x = obsX, y = obsY, ntree = ntree, mtry = mtry, : NA/NaN/Inf in foreign function call (arg 1)

On a hunch I checked the data types and converted the only character variable into a factor, and now it doesn't crash. This works OK:

HollywoodMovies2011_copy <- HollywoodMovies2011
HollywoodMovies2011_copy$TheatersOpenWeek_2 <- as.factor(HollywoodMovies2011_copy$TheatersOpenWeek_2)
imputationResults <- missForest(xmis = select(HollywoodMovies2011_copy, -Movie))

cheers :)
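
For data frames with more than one character column, the same workaround can be applied to all of them at once. The following is only a minimal sketch in base R: `chars_to_factors` is a hypothetical helper name (not part of missForest), and `toy` is a made-up data frame standing in for the real data.

```r
# Hypothetical helper (not part of missForest): convert every character
# column of a data frame to a factor and leave all other columns untouched.
chars_to_factors <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) factor(col) else col
  })
  df
}

# Made-up example data: one numeric and one character column, each with an NA.
toy <- data.frame(
  budget = c(10, NA, 35, 50),
  genre  = c("Action", "Drama", NA, "Drama"),
  stringsAsFactors = FALSE
)

# Inspect the column types before and after the conversion.
types_before <- vapply(toy, function(col) class(col)[1], character(1))
toy_fixed    <- chars_to_factors(toy)
types_after  <- vapply(toy_fixed, function(col) class(col)[1], character(1))

print(types_before)
print(types_after)
```

The converted data frame can then be passed on as in the example above, e.g. `missForest(xmis = chars_to_factors(select(HollywoodMovies2011, -Movie)))`. Missing values in the character column stay missing after the conversion, so they are still available for imputation.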