topepo / caret

caret (Classification And Regression Training) R package that contains misc functions for training and plotting classification and regression models
http://topepo.github.io/caret/index.html

caret can't handle NA roles in recipes #1293

Open EmilHvitfeldt opened 2 years ago

EmilHvitfeldt commented 2 years ago

If a recipe contains variables with NA roles, caret fails with an uninformative error.

library(caret)
library(recipes)

data(okc, package = "modeldata")

rec_okc <- recipe(okc) %>%
  update_role(diet, height, new_role = "predictor") %>%
  update_role(Class, new_role = "outcome")

fit_okc <- train(rec_okc, 
                 data = okc, 
                 method = "glm",
                 family = "binomial",
                 trControl = trainControl(method = "repeatedcv", repeats = 1))
#> Error in if (any(is_weight)) {: missing value where TRUE/FALSE needed

This is happening because of cases like:

https://github.com/topepo/caret/blob/679eabaac7e54f4e87efa6c3bff75659cb457d8b/pkg/caret/R/train_recipes.R#L139-L140

There, is_weight contains NA values, so any(is_weight) evaluates to NA and the if() condition fails.
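To illustrate the failure without caret or recipes, here is a minimal base R sketch. The `roles` vector is a hypothetical stand-in for `summary(rec)$role`, and the comparison against `"case weight"` is an assumption about how `is_weight` is built in the linked lines.

```r
# Hypothetical role vector, standing in for summary(rec)$role
# (variables without an assigned role have role NA)
roles <- c(NA, "predictor", "predictor", NA, NA, "outcome")

# Assumed form of the comparison that produces is_weight
is_weight <- roles == "case weight"
is_weight
#> [1]    NA FALSE FALSE    NA    NA FALSE

# With no TRUE present, any() propagates the NA
any(is_weight)
#> [1] NA

# if (NA) is an error, which is what train() runs into
result <- tryCatch(
  if (any(is_weight)) "has weights" else "no weights",
  error = function(e) "error"
)
result
#> [1] "error"
```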

A small internal helper function could probably be used to fix this error:

library(caret)
library(recipes)
library(forcats)

data(okc, package = "modeldata")

rec_okc <- recipe(okc) %>%
  update_role(diet, height, new_role = "predictor") %>%
  update_role(Class, new_role = "outcome")

get_roles <- function(rec, role) {
  roles <- summary(rec)$role
  roles == role & !is.na(roles)
}

get_roles(rec_okc, "outcome")
#> [1] FALSE FALSE FALSE FALSE FALSE  TRUE
get_roles(rec_okc, "predictor")
#> [1] FALSE  TRUE  TRUE FALSE FALSE FALSE

Created on 2022-06-14 by the reprex package (v2.0.1)
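As a rough sketch of how such a helper would make the check safe, the same logic can be applied to a plain character vector. The `roles` vector and the `role_is()` name are hypothetical, and `"case weight"` is an assumed role string; this is not caret's actual code.

```r
# Hypothetical role vector, standing in for summary(rec)$role
roles <- c(NA, "predictor", "predictor", NA, NA, "outcome")

# Same logic as get_roles(), but taking the role vector directly
role_is <- function(roles, role) {
  roles == role & !is.na(roles)
}

# NA roles now map to FALSE, so any() returns a proper logical
is_weight <- role_is(roles, "case weight")
any(is_weight)
#> [1] FALSE

if (any(is_weight)) {
  message("case weights found")
} else {
  message("no case weights")
}

# %in% never returns NA, so it is an equivalent alternative
identical(role_is(roles, "outcome"), roles %in% "outcome")
#> [1] TRUE
```

Either form keeps NA roles out of the condition, so the if() statements that test roles would no longer error.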

EmilHvitfeldt commented 2 years ago

The same problem also occurs here:

https://github.com/topepo/caret/blob/679eabaac7e54f4e87efa6c3bff75659cb457d8b/pkg/caret/R/train.default.R#L1150-L1155