tidymodels / hardhat

Construct Modeling Packages
https://hardhat.tidymodels.org

unsupervised methods #95

Closed topepo closed 4 years ago

topepo commented 5 years ago

Please provide some guidance on using hardhat for models that do not include a specific outcome. Right now, passing y = NULL to mold() fails, and passing y = NA_real_ runs but produces a one-row outcome tibble:

library(hardhat)
mold(iris[, "Sepal.Width", drop = FALSE], y = NULL)
#> `y` is of unknown type 'NULL'.
mold(iris[, "Sepal.Width", drop = FALSE], y = NA_real_)
#> $predictors
#> # A tibble: 150 x 1
#>    Sepal.Width
#>          <dbl>
#>  1         3.5
#>  2         3  
#>  3         3.2
#>  4         3.1
#>  5         3.6
#>  6         3.9
#>  7         3.4
#>  8         3.4
#>  9         2.9
#> 10         3.1
#> # … with 140 more rows
#> 
#> $outcomes
#> # A tibble: 1 x 1
#>   .outcome
#>      <dbl>
#> 1       NA
#> 
#> $blueprint
#> XY blueprint: 
#>  
#> # Predictors: 1 
#>   # Outcomes: 1 
#>    Intercept: FALSE 
#> 
#> $extras
#> NULL

Created on 2019-07-08 by the reprex package (v0.2.1)

DavisVaughan commented 5 years ago

Adding a note that you can do this with the formula method:

hardhat::mold(~Sepal.Length, iris)
#> $predictors
#> # A tibble: 150 x 1
#>    Sepal.Length
#>           <dbl>
#>  1          5.1
#>  2          4.9
#>  3          4.7
#>  4          4.6
#>  5          5  
#>  6          5.4
#>  7          4.6
#>  8          5  
#>  9          4.4
#> 10          4.9
#> # … with 140 more rows
#> 
#> $outcomes
#> # A tibble: 150 x 0
#> 
#> $blueprint
#> Formula blueprint: 
#>  
#> # Predictors: 1 
#>   # Outcomes: 0 
#>    Intercept: FALSE 
#>   Indicators: TRUE 
#> 
#> $extras
#> $extras$offset
#> NULL

Teaching standardize() about NULL inputs seems like it might be enough for this to work for XY methods. Not sure about recipes.

It might be worth deciding what object should be placed in the $outcomes slot, and then applying that consistently across all methods. As you can see, the formula method places a 0-column tibble there to be type stable, but I sort of like the idea of it being NULL because that is a clear indicator of a "lack of outcome".
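
To make the two options for the $outcomes slot concrete, here is a minimal sketch of the type-stable variant. standardize_or_empty() and its n_rows argument are hypothetical names made up for illustration and are not part of hardhat. The sketch assumes that standardize() accepts a bare numeric vector (as the y = NA_real_ call above suggests) and that a zero-column tibble with one row per observation is the desired placeholder, mirroring the formula method's output.

```r
library(hardhat)
library(tibble)

# Hypothetical helper, not part of hardhat: treat a NULL outcome as
# "no outcome" rather than passing it on to standardize().
standardize_or_empty <- function(y, n_rows) {
  if (is.null(y)) {
    # Zero-column tibble with one row per observation, mirroring what the
    # formula method places in $outcomes when the formula has no left-hand side.
    return(new_tibble(list(), nrow = n_rows))
  }
  # Any non-NULL outcome goes through the usual standardization.
  standardize(y)
}

x <- iris[, "Sepal.Width", drop = FALSE]

# Unsupervised case: no outcome supplied.
no_outcome <- standardize_or_empty(NULL, n_rows = nrow(x))
dim(no_outcome)

# Supervised case: behaves like a plain standardize() call.
with_outcome <- standardize_or_empty(iris$Sepal.Length, n_rows = nrow(x))
names(with_outcome)
```

Returning NULL from the first branch instead would give the "clear indicator" variant. The trade-off is that downstream code would then need an is.null() check rather than being able to rely on $outcomes always being a tibble.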

topepo commented 5 years ago

We will also need to modify the mold() code for recipes, since a recipe without an outcome currently fails:

library(hardhat)
library(recipes)
#> Loading required package: dplyr
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
#> 
#> Attaching package: 'recipes'
#> The following object is masked from 'package:stats':
#> 
#>     step

rec <- recipe(~ ., data = iris) %>% 
  step_log(Sepal.Width)

hardhat::mold(rec, iris)
#> No variables or terms were selected.

Created on 2019-07-17 by the reprex package (v0.2.1)

DavisVaughan commented 5 years ago

Oh now I remember why I have this issue open in recipes!! https://github.com/tidymodels/recipes/issues/298
