Open mdsteiner opened 2 years ago
Thank you for this report! 🙌 Overall, recipes has some problems with how factors are handled, such as #331, #715, and unfortunately others. We should plan to fix the problem you reported together with our overall factor problems.
Reproducible with only recipes, so I'm going to move it there:
library(tibble)
library(recipes)
# set up data
set.seed(42)
dat <- tibble(
  criterion = rnorm(50),
  num_pred_a = rnorm(50) + .8*criterion,
  char_pred = ifelse(
    criterion < .2,
    sample(c("a", "b"), 1, prob = c(.75, .25)),
    sample(c("a", "b"), 1, prob = c(.5, .5))
  )
)
dat[sample(1:nrow(dat), 8), 2] <- NA
dat[sample(1:nrow(dat), 8), 3] <- NA
rec <- recipe(criterion ~ ., data = dat) %>%
  step_impute_knn(all_predictors())
rec_prepped <- prep(rec, dat)
bake(rec_prepped, dat)
#> Error in gower_work(x = x, y = y, pair_x = pair_x, pair_y = pair_y, n = n, : Column 1 of x is of class character while matching column 1 of y is of class factor
Created on 2022-03-09 by the reprex package (v2.0.1)
I am running into this same issue, but with step_other() using an ordered factor. I'm using step_other() on an integer (age field) that step_other() previously accepted. It has now been cast as an ordered factor to keep up with changes.
@JosiahParry would you be able to produce a reprex? If this is true, we might have a larger issue at hand.
I solved this issue by placing step_impute_knn later in the sequence of steps. Therefore, I believe the correct approach is to perform transformations first and then imputation.
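For anyone landing here, a minimal sketch of that ordering, assuming the string-to-factor conversion is the transformation that matters: step_string2factor() is placed before step_impute_knn(), so the character column is already a factor when the nearest-neighbor search runs at bake time. The data below are made up for illustration (the missing rows are deliberately chosen so no row is missing both predictors), and this has not been checked against every recipes version.

```r
library(tibble)
library(recipes)

# Illustrative data only (not the reporter's data)
set.seed(42)
dat <- tibble(
  criterion = rnorm(50),
  num_pred_a = rnorm(50) + 0.8 * criterion,
  char_pred = sample(c("a", "b"), 50, replace = TRUE)
)

# Introduce missing values in disjoint rows, so every row
# keeps at least one observed predictor to match on
dat$num_pred_a[1:8] <- NA
dat$char_pred[9:16] <- NA

# Convert strings to factors first, then impute
rec <- recipe(criterion ~ ., data = dat) %>%
  step_string2factor(all_nominal_predictors()) %>%
  step_impute_knn(all_predictors())

rec_prepped <- prep(rec, training = dat)
baked <- bake(rec_prepped, new_data = dat)
```

With this ordering, bake() should see a factor column in both the reference data and the new data, which is the mismatch the gower_work() error complains about.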
The problem
When imputing missing values with step_impute_knn(all_predictors()), the error
Error in gower_work(x = x, y = y, pair_x = pair_x, pair_y = pair_y, n = n, : Column 2 of x is of class character while matching column 2 of y is of class factor
is thrown when calling the predict.workflow() function. The recipe seems to be applied correctly in the fitting process, but not in the predict function. A workaround is to call step_string2factor(all_nominal_predictors()) before step_impute_knn(all_predictors()) in the recipe, but given that this is not necessary in the fitting process, it may be desirable to have the same behavior when calling predict.workflow().
Reproducible example
Created on 2022-03-08 by the reprex package (v2.0.1)