`step_bagimpute` fails with nzv character columns #209

Status: Open. glenrs opened this issue 5 years ago.

glenrs commented 5 years ago

`step_bagimpute` crashes when it is given a near-zero-variance (nzv) character column.

The following example crashes when the character column contains only one distinct non-missing value, but works if two distinct values are present. Numeric columns are not affected.

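```r
library(recipes)  # recipes also re-exports the %>% pipe

d <- data.frame(let = c(rep("a", 99), NA), num = 1:100)
rec_obj <- d %>%
  recipes::recipe(formula = "~.") %>%
  recipes::step_bagimpute(let) %>%
  recipes::prep()
#> Error in cbind(yval2, yprob, nodeprob): number of rows of matrices must match (see arg 2)
```
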
topepo commented 5 years ago

The issue is that the variable has a single value, so most models can't be fit to it. The best we can do is to throw a more meaningful error.

Using `step_modeimpute` would be a better choice.
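One way to get the more meaningful error suggested here is to check the selected columns before any tree model is fit. The function below is a hypothetical base-R sketch, not part of recipes: `check_bag_impute_cols()` counts the distinct non-missing values in each requested column and stops with a message pointing to `step_modeimpute()` when any column has fewer than two.

```r
# Hypothetical pre-check (not a recipes function): fail early with a readable
# message instead of the opaque cbind() error raised during the tree fit.
check_bag_impute_cols <- function(data, cols) {
  # Number of distinct non-missing values in each requested column
  n_distinct <- vapply(
    data[cols],
    function(x) length(unique(x[!is.na(x)])),
    integer(1)
  )
  bad <- names(n_distinct)[n_distinct < 2]
  if (length(bad) > 0) {
    stop(
      "Bagged imputation needs at least two distinct non-missing values; ",
      "these column(s) have fewer: ", paste(bad, collapse = ", "),
      ". Consider step_modeimpute() for them.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

d <- data.frame(let = c(rep("a", 99), NA), num = 1:100, stringsAsFactors = FALSE)
check_bag_impute_cols(d, "num")    # passes silently
# check_bag_impute_cols(d, "let")  # stops with a message naming `let`
```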

glenrs commented 5 years ago

I agree that `step_modeimpute` would be a better choice in the circumstance above. A more meaningful error would be helpful. Thank you.

In wide datasets this could be a bigger problem if just one of the character columns has only one value. It would be a pity to fall back to mode imputation for all nominal columns because of a single feature.

```r
library(recipes)
library(healthcareai) ## This is included to provide pima_diabetes data
#> healthcareai version 2.2.0
#> Please visit https://docs.healthcare.ai for full documentation and vignettes. Join the community at https://healthcare-ai.slack.com

d <- data.frame(let = c(rep("a", 767), NA), num = 1:768, stringsAsFactors = FALSE)
d <-
  d %>%
  dplyr::bind_cols(pima_diabetes)  # add the 768-row pima_diabetes columns

rec_obj <-
  d %>%
  recipe(formula = "~.") %>%
  step_bagimpute(let) %>%
  prep()
#> Error in cbind(yval2, yprob, nodeprob): number of rows of matrices must match (see arg 2)
```
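The wide-data concern could be handled by sending only the problematic columns to mode imputation. This is a minimal base-R sketch under that assumption: `split_nominal_cols()` is a hypothetical helper, not a recipes function, and it treats character and factor columns as nominal.

```r
# Hypothetical helper (not a recipes function): split nominal columns into
# those with fewer than two distinct non-missing values (candidates for mode
# imputation) and the rest (candidates for bagged imputation).
split_nominal_cols <- function(data) {
  is_nominal <- vapply(data, function(x) is.character(x) || is.factor(x), logical(1))
  nominal <- names(data)[is_nominal]
  # Distinct non-missing values per nominal column
  n_distinct <- vapply(
    data[nominal],
    function(x) length(unique(x[!is.na(x)])),
    integer(1)
  )
  list(
    mode_impute = nominal[n_distinct < 2],
    bag_impute  = nominal[n_distinct >= 2]
  )
}

d <- data.frame(
  let = c(rep("a", 9), NA),
  grp = c(rep(c("x", "y"), 4), NA, "x"),
  num = 1:10,
  stringsAsFactors = FALSE
)
cols <- split_nominal_cols(d)
cols$mode_impute  # "let"
cols$bag_impute   # "grp"
```

The two vectors could then be supplied to `step_modeimpute()` and `step_bagimpute()` respectively; the selector syntax for passing a character vector of column names depends on the recipes version, so that wiring is not shown.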

