Closed: dedzo closed this issue 4 years ago
This should be fixed now:
library(tidyverse)
library(tidymodels)
#> ── Attaching packages ───────────────────────────────── tidymodels 0.1.1.9000 ──
#> ✓ broom     0.7.0          ✓ recipes   0.1.15
#> ✓ dials     0.0.9.9000     ✓ rsample   0.0.8
#> ✓ infer     0.5.3          ✓ tune      0.1.1.9000
#> ✓ modeldata 0.1.0          ✓ workflows 0.1.3
#> ✓ parsnip   0.1.4          ✓ yardstick 0.0.7
#> ── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
#> x scales::discard() masks purrr::discard()
#> x dplyr::filter() masks stats::filter()
#> x recipes::fixed() masks stringr::fixed()
#> x dplyr::lag() masks stats::lag()
#> x yardstick::spec() masks readr::spec()
#> x recipes::step() masks stats::step()
library(themis)
#> Registered S3 methods overwritten by 'themis':
#>   method                  from
#>   bake.step_downsample    recipes
#>   bake.step_upsample      recipes
#>   prep.step_downsample    recipes
#>   prep.step_upsample      recipes
#>   tidy.step_downsample    recipes
#>   tidy.step_upsample      recipes
#>   tunable.step_downsample recipes
#>   tunable.step_upsample   recipes
#>
#> Attaching package: 'themis'
#> The following objects are masked from 'package:tune':
#>
#> required_pkgs, tunable
#> The following objects are masked from 'package:recipes':
#>
#> step_downsample, step_upsample
data("okc")
new_data <- okc
new_data$Class <- as.character(new_data$Class)
new_data$Class[1] <- "dummy"
new_data$Class <- as.factor(new_data$Class)
c <- recipe(Class ~ ., data = new_data) %>%
  update_role(date, new_role = "date") %>%
  update_role(diet, new_role = "diet") %>%
  update_role(location, new_role = "location") %>%
  step_unknown(diet, new_level = "unknown") %>%
  step_meanimpute(all_predictors()) %>%
  step_smote(Class) %>%
  prep() %>%
  juice() %>%
  view()
#> Error: Not enough observations of 'dummy' to perform SMOTE.
count(new_data, Class)
#> # A tibble: 3 x 2
#>   Class     n
#>   <fct> <int>
#> 1 dummy     1
#> 2 other 50315
#> 3 stem   9539
Created on 2020-11-11 by the reprex package (v0.3.0)
Running step_smote invokes a (correct) error in the RANN package if there are classes in the data with fewer observations than the neighbors parameter; this can occur with small data sets, or with modest ones when performing cross-validation.
It would be helpful to catch this with a check internal to themis so that the error message is easier to debug (the RANN error refers to different variable names, and requires the user to have a better knowledge of SMOTE to debug); e.g. Error in themis::step_smote: neighbors must be below the smallest class size
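As a rough illustration of the kind of check being requested, here is a minimal sketch in base R. The helper name check_smote_class_sizes() is hypothetical and is not part of themis; the sketch also assumes that SMOTE needs more observations in every class than the neighbors value (the default of 5 mirrors step_smote), which is how the suggested message reads.

```r
# Hypothetical pre-check (not part of themis): stop early if any class is too
# small for SMOTE with the requested number of neighbors.
check_smote_class_sizes <- function(classes, neighbors = 5) {
  # Count observations per class, ignoring unused factor levels.
  counts <- table(droplevels(as.factor(classes)))
  # Assumption: a class needs strictly more rows than `neighbors`.
  too_small <- names(counts)[as.vector(counts) <= neighbors]
  if (length(too_small) > 0) {
    stop(
      "neighbors (", neighbors, ") must be below the smallest class size; ",
      "too few observations of: ", paste(too_small, collapse = ", "),
      call. = FALSE
    )
  }
  invisible(TRUE)
}

# Usage with made-up class counts shaped like the reprex (a single 'dummy' row).
classes <- factor(c("dummy", rep("other", 20), rep("stem", 10)))
result <- tryCatch(
  check_smote_class_sizes(classes, neighbors = 5),
  error = function(e) conditionMessage(e)
)
print(result)
```

Running a check like this before the nearest-neighbour search would let the error name the offending class and the neighbors value, rather than surfacing RANN's internal variable names.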
The reprex below demonstrates the current behaviour: