Closed jaredlander closed 3 years ago
I think that this works if you install the current dev version of Cubist. It was an issue with how lapply() works with tibbles.
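For anyone wanting to try that suggestion, a development version of an R package is usually installed from its GitHub repository. The snippet below is a minimal sketch that assumes the remotes package is available and that the Cubist sources live at topepo/Cubist; neither detail is stated in this thread, so adjust as needed.

```r
# Assumption: the development sources are hosted at github.com/topepo/Cubist.
# install.packages("remotes")  # uncomment if remotes is not installed yet
remotes::install_github("topepo/Cubist")

# Restart the R session afterwards so the updated package is the one loaded.
```

If the fix has since reached CRAN, a plain install.packages("Cubist") may be enough.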
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(C50)
library(rsample)
library(recipes)
#>
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#>
#> step
data(credit_data,package='modeldata')
credit <- tibble::as_tibble(credit_data) %>% mutate(across(where(is.factor), as.character))
set.seed(28676)
data_split <- initial_split(credit, prop=.9, strata='Status')
train <- training(data_split)
test <- testing(data_split)
rec_C50 <- recipe(Status ~ ., data=train) %>%
  themis::step_upsample(Status) %>%
  step_other(all_nominal(), -Status, other='misc')
#> Warning: replacing previous import 'data.table:::=' by 'ggplot2:::=' when
#> loading 'mlr'
#> Registered S3 methods overwritten by 'themis':
#> method from
#> bake.step_downsample recipes
#> bake.step_upsample recipes
#> prep.step_downsample recipes
#> prep.step_upsample recipes
#> tidy.step_downsample recipes
#> tidy.step_upsample recipes
#> tunable.step_downsample recipes
#> tunable.step_upsample recipes
prep_c50 <- rec_C50 %>% prep()
train_data <- prep_c50 %>% juice()
test_data <- prep_c50 %>% bake(new_data=test)
# this works as expected
c5_formula <- C5.0(Status ~ ., data=prep_c50 %>% juice())
preds_formula <- predict(c5_formula, newdata=prep_c50 %>% bake(new_data=test, all_predictors()))
head(preds_formula)
#> [1] good good good good bad good
#> Levels: bad good
# this causes an error
c5_xy <- C5.0(x=prep_c50 %>% juice(all_predictors()), y=prep_c50 %>% juice(Status) %>% pull(Status))
preds_xy <- predict(c5_xy, newdata=prep_c50 %>% bake(new_data=test, all_predictors()))
Created on 2021-05-06 by the reprex package (v1.0.0.9000)
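Because the reply in this thread attributes the failure to how lapply() interacts with tibbles, one possible workaround (an assumption, not something confirmed here) is to convert the predictors to a plain data frame before calling C5.0() and predict(). The sketch below uses the built-in iris data as a hypothetical stand-in for the credit data so that it is self-contained; iris_tbl and x_df are illustrative names only.

```r
library(C50)

# Hypothetical stand-in data: iris stored as a tibble, as recipes would return.
iris_tbl <- tibble::as_tibble(iris)

# Drop the tibble class from the predictors before using the x/y interface.
x_df <- as.data.frame(iris_tbl[, setdiff(names(iris_tbl), "Species")])
y <- iris_tbl$Species  # the outcome must be a factor

fit <- C5.0(x = x_df, y = y)

# Pass a plain data frame to predict() as well.
preds <- predict(fit, newdata = x_df)
head(preds)
```

With the objects from the reprex, the same idea would mean wrapping the juice() and bake() results in as.data.frame() before they reach C5.0() and predict().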
When using the formula interface for C5.0(), everything works as expected. But when using the x and y arguments, predictions do not work. A Stack Overflow question from earlier this year came to the same conclusion. Here is some code to illustrate.
Interestingly, when fitting using {workflows}, predictions work for an untuned boost_tree() model and for a tuned or untuned decision_tree() model. But this error occurs when trying to tune a boost_tree() model.

To make matters worse, the {C5.0} website shows this error in the documentation for the predict() function, as seen in the image below.