Closed jaredlander closed 4 years ago
Quick note: I think this might have to do with the fact that step_zv didn't remove any predictors. The tidy method worked for those of us who used step_nzv, where there were some predictors removed.
That's a solid point. It should give a message saying nothing was removed rather than giving an error.
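A minimal sketch of the behavior suggested above, using only base R. The helper name report_removed() is hypothetical and is not part of recipes; it assumes the tidied result is a data frame with a terms column, as in the reprex output in this thread.

```r
# Hypothetical helper (not part of recipes): report what a filtering step
# removed, and say so explicitly when nothing was removed.
report_removed <- function(tidied) {
  if (nrow(tidied) == 0) {
    message("No predictors were removed by this step.")
  } else {
    message(
      nrow(tidied), " predictor(s) removed: ",
      paste(tidied$terms, collapse = ", ")
    )
  }
  # Return the input invisibly so the helper can sit at the end of a pipeline.
  invisible(tidied)
}

# A zero-row table standing in for tidy() output when nothing was removed.
empty <- data.frame(
  terms = character(0),
  id = character(0),
  stringsAsFactors = FALSE
)
report_removed(empty)
```

In practice this would be called on the result of tidy(mod_rec_zv, number = 3) once that call returns a zero-row table instead of failing.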
I had the error during the workshop, but something else seems to have fixed it. Can either of you reproduce it and do a reprex::reprex(si = TRUE)?
library(AmesHousing)
library(tidymodels)
#> Registered S3 method overwritten by 'xts':
#> method from
#> as.zoo.xts zoo
#> ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────── tidymodels 0.0.3 ──
#> ✓ broom 0.5.2 ✓ purrr 0.3.3
#> ✓ dials 0.0.4 ✓ recipes 0.1.7
#> ✓ dplyr 0.8.3 ✓ rsample 0.0.5
#> ✓ ggplot2 3.2.1.9000 ✓ tibble 2.99.99.9010
#> ✓ infer 0.5.1 ✓ yardstick 0.0.4
#> ✓ parsnip 0.0.4.9000
#> ── Conflicts ────────────────────────────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x recipes::step() masks stats::step()
#> x recipes::yj_trans() masks scales::yj_trans()
ames <- make_ames() %>%
  dplyr::select(-matches('Qu'))
set.seed(4595)
data_split <- initial_split(data=ames, prop=0.75, strata='Sale_Price')
data_split
#> <2199/731/2930>
ames_train <- training(data_split)
ames_test <- testing(data_split)
mod_rec_zv <- recipe(
  Sale_Price ~ Longitude + Latitude + Neighborhood,
  data = ames_train
) %>%
  step_log(Sale_Price, base = 10) %>%
  step_dummy(all_nominal()) %>%
  step_zv(everything()) %>%
  prep()
tidy(mod_rec_zv, number=3)
#> # A tibble: 0 x 2
#> # … with 2 variables: terms <chr>, id <chr>
Created on 2019-12-14 by the reprex package (v0.3.0)
Yes, I ran it and still got an error.
library(AmesHousing)
library(tidymodels)
#> -- Attaching packages --------------------------------------------------------------------------------------------------- tidymodels 0.0.3 --
#> v broom 0.5.2 v purrr 0.3.3
#> v dials 0.0.3.9002 v recipes 0.1.7.9002
#> v dplyr 0.8.3 v rsample 0.0.5
#> v ggplot2 3.2.1 v tibble 2.1.3
#> v infer 0.5.0 v yardstick 0.0.4
#> v parsnip 0.0.4.9000
#> -- Conflicts ------------------------------------------------------------------------------------------------------ tidymodels_conflicts() --
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x dials::offset() masks stats::offset()
#> x recipes::step() masks stats::step()
ames <- make_ames() %>%
  dplyr::select(-matches('Qu'))
set.seed(4595)
data_split <- initial_split(data=ames, prop=0.75, strata='Sale_Price')
data_split
#> <2199/731/2930>
ames_train <- training(data_split)
ames_test <- testing(data_split)
mod_rec_zv <- recipe(
  Sale_Price ~ Longitude + Latitude + Neighborhood,
  data = ames_train
) %>%
  step_log(Sale_Price, base = 10) %>%
  step_dummy(all_nominal()) %>%
  step_zv(everything()) %>%
  prep()
tidy(mod_rec_zv, number=3)
#> Error in `$<-.data.frame`(`*tmp*`, "id", value = "zv_Q8mKv"): replacement has 1 row, data has 0
Created on 2019-12-16 by the reprex package (v0.3.0)
Can you run devtools::install_dev("tibble") and try again?
Yes! That does the trick. vctrs and cli were also updated during the tibble installation, if that matters.
library(AmesHousing)
library(tidymodels)
#> -- Attaching packages --------------------------------------------------------------------------------------------------- tidymodels 0.0.3 --
#> v broom 0.5.2 v purrr 0.3.3
#> v dials 0.0.3.9002 v recipes 0.1.7.9002
#> v dplyr 0.8.3 v rsample 0.0.5
#> v ggplot2 3.2.1 v tibble 2.99.99.9010
#> v infer 0.5.0 v yardstick 0.0.4
#> v parsnip 0.0.4.9000
#> -- Conflicts ------------------------------------------------------------------------------------------------------ tidymodels_conflicts() --
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x dials::offset() masks stats::offset()
#> x recipes::step() masks stats::step()
ames <- make_ames() %>%
  dplyr::select(-matches('Qu'))
set.seed(4595)
data_split <- initial_split(data=ames, prop=0.75, strata='Sale_Price')
data_split
#> <2199/731/2930>
ames_train <- training(data_split)
ames_test <- testing(data_split)
mod_rec_zv <- recipe(
  Sale_Price ~ Longitude + Latitude + Neighborhood,
  data = ames_train
) %>%
  step_log(Sale_Price, base = 10) %>%
  step_dummy(all_nominal()) %>%
  step_zv(everything()) %>%
  prep()
tidy(mod_rec_zv, number=3)
#> # A tibble: 0 x 2
#> # ... with 2 variables: terms <chr>, id <chr>
Created on 2019-12-17 by the reprex package (v0.3.0)
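The error message in the failing run matches what base R does when a length-one value is assigned to a column of a zero-row data frame. The sketch below reproduces that message with a plain data.frame; it is an illustration only and not the actual recipes code. The id value is copied from the error above, and the assumption that the development version of tibble recycles such an assignment to zero rows is inferred from the results in this thread rather than shown here.

```r
# A zero-row base data frame, standing in for the empty result of a step
# that removed no predictors.
res <- data.frame(terms = character(0), stringsAsFactors = FALSE)

# Base R refuses to recycle one value into zero rows, so this assignment fails.
err <- tryCatch(
  {
    res$id <- "zv_Q8mKv"
    NULL
  },
  error = function(e) conditionMessage(e)
)
print(err)

# Sizing the value to the number of rows works for empty and non-empty tables.
res$id <- rep("zv_Q8mKv", nrow(res))
str(res)
```

The second assignment yields a zero-row table with both terms and id columns, which is the shape printed by the successful runs.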
I installed the CRAN versions of tibble, vctrs, and rlang, and the error still occurs with the devel version of recipes.
I'm going to leave recipes as-is for the time being and will verify that the new release of these (probably tibble) solves the issue. We need to submit recipes this week by CRAN mandate.
It looks like tibble solved it:
library(AmesHousing)
library(tidymodels)
#> ── Attaching packages ───────────────────────────────────────────────────────────── tidymodels 0.1.0 ──
#> ✓ broom 0.5.4 ✓ recipes 0.1.12
#> ✓ dials 0.0.6 ✓ rsample 0.0.6
#> ✓ dplyr 0.8.5 ✓ tibble 3.0.1
#> ✓ ggplot2 3.3.0 ✓ tune 0.1.0
#> ✓ infer 0.5.1 ✓ workflows 0.1.0
#> ✓ parsnip 0.1.0.9001 ✓ yardstick 0.0.5
#> ✓ purrr 0.3.4
#> Warning: package 'rsample' was built under R version 3.6.2
#> Warning: package 'tibble' was built under R version 3.6.2
#> ── Conflicts ──────────────────────────────────────────────────────────────── tidymodels_conflicts() ──
#> x purrr::discard() masks scales::discard()
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
#> x ggplot2::margin() masks dials::margin()
#> x recipes::step() masks stats::step()
ames <- make_ames() %>%
  select(-matches('Qu'))
set.seed(4595)
data_split <- initial_split(data=ames, prop=0.75, strata='Sale_Price')
data_split
#> <Training/Validation/Total>
#> <2199/731/2930>
ames_train <- training(data_split)
ames_test <- testing(data_split)
mod_rec_zv <- recipe(
  Sale_Price ~ Longitude + Latitude + Neighborhood,
  data = ames_train
) %>%
  step_log(Sale_Price, base = 10) %>%
  # Lump factor levels that occur in
  # <= 5% of data as "other"
  # step_other(Neighborhood, threshold = 0.05) %>%
  # Create dummy variables for _any_ factor variables
  step_dummy(all_nominal()) %>%
  step_zv(everything()) %>%
  prep()
tidy(mod_rec_zv, number=3)
#> # A tibble: 0 x 2
#> # … with 2 variables: terms <chr>, id <chr>
Created on 2020-05-01 by the reprex package (v0.3.0)
From @topepo's AML workshop, when calling tidy() on the result of step_zv() you get the following error. tidy() works on other steps. Code to reproduce below: