tidymodels / TMwR

Code and content for "Tidy Modeling with R"
https://tmwr.org

Possible typo/error in text #323

Closed woodspock closed 2 years ago

woodspock commented 2 years ago

In chapter 8.4.1, just after Figure 8.1, the text says:

Here we see that two neighborhoods have less than five properties in the training data (Landmark and Green Hills); in this case, no houses at all in the Landmark neighborhood were included in the training set.

If I'm not wrong, I think it should be:

Here we see that two neighborhoods have less than five properties in the training data (Landmark and Green Hills); in this case, no houses at all in the Landmark neighborhood were included in the testing set.

juliasilge commented 2 years ago

Yep, you are right!

library(tidymodels)
data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))

set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)
ames_train <- training(ames_split)
ames_test  <-  testing(ames_split)

ames_train %>% count(Neighborhood) %>% filter(Neighborhood == "Landmark")
#> # A tibble: 1 × 2
#>   Neighborhood     n
#>   <fct>        <int>
#> 1 Landmark         1
ames_test %>% count(Neighborhood) %>% filter(Neighborhood == "Landmark")
#> # A tibble: 0 × 2
#> # … with 2 variables: Neighborhood <fct>, n <int>
#> # ℹ Use `colnames()` to see all variable names

Created on 2022-08-10 by the reprex package (v2.0.1)
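
The same check can also be done in one step by stacking the two partitions and counting per neighborhood. This is only a sketch, assuming the same seed and split as in the reprex above; the `set` column and the `landmark_counts` name are made up for illustration, and the exact counts may depend on the installed rsample version.

```r
library(tidymodels)
data(ames)
ames <- mutate(ames, Sale_Price = log10(Sale_Price))

set.seed(502)
ames_split <- initial_split(ames, prop = 0.80, strata = Sale_Price)

# Stack both partitions; `set` (hypothetical column name) records
# which partition each row came from.
landmark_counts <-
  bind_rows(
    training = training(ames_split),
    testing  = testing(ames_split),
    .id = "set"
  ) %>%
  count(Neighborhood, set) %>%
  # Add explicit zero-count rows for neighborhood/partition pairs
  # that do not occur, so an absent neighborhood shows up as n = 0.
  tidyr::complete(Neighborhood, set, fill = list(n = 0L)) %>%
  filter(Neighborhood == "Landmark")

landmark_counts
```

Filling in the zero counts makes the missing partition visible as a row with `n = 0` instead of an empty tibble, which is easier to read when comparing the two sets side by side.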

I have submitted this as an erratum at O'Reilly and will do a little PR here.
