Closed woodspock closed 2 years ago
Yep, you are right!
library(tidymodels)
data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test <- testing(ames_split)
ames_train %>% count(Neighborhood) %>% filter(Neighborhood == "Landmark")
#> # A tibble: 1 × 2
#> Neighborhood n
#> <fct> <int>
#> 1 Landmark 1
ames_test %>% count(Neighborhood) %>% filter(Neighborhood == "Landmark")
#> # A tibble: 0 × 2
#> # … with 2 variables: Neighborhood <fct>, n <int>
#> # ℹ Use `colnames()` to see all variable names
Created on 2022-08-10 by the reprex package (v2.0.1)
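The reprex above only looks at Landmark. As a rough sketch, the same split can be used to tabulate both neighborhoods named in the book's sentence across the training and testing sets in one table. The level name `"Green_Hills"` (with an underscore) and the object name `neighborhood_counts` are assumptions made for illustration; exact counts may vary with package versions.

```r
library(tidymodels)

data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))

# Same seed and split as in the reprex above
set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test  <- testing(ames_split)

# .drop = FALSE keeps factor levels with zero rows, so an absent
# neighborhood shows up as n = 0 instead of disappearing from the table
neighborhood_counts <- bind_rows(
  training = count(ames_train, Neighborhood, .drop = FALSE),
  testing  = count(ames_test, Neighborhood, .drop = FALSE),
  .id = "set"
) %>%
  # "Green_Hills" is an assumed spelling of the factor level
  filter(Neighborhood %in% c("Landmark", "Green_Hills"))

neighborhood_counts
```

Keeping zero-count levels makes it easy to see at a glance which set a neighborhood is missing from.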
I have submitted this as an erratum at O'Reilly and will open a small PR here.
In section 8.4.1, just after Figure 8.1, the text says:
Here we see that two neighborhoods have less than five properties in the training data (Landmark and Green Hills); in this case, no houses at all in the Landmark neighborhood were included in the training set.
If I'm not wrong, I think it should be:
Here we see that two neighborhoods have less than five properties in the training data (Landmark and Green Hills); in this case, no houses at all in the Landmark neighborhood were included in the testing set.