tidymodels / hardhat

Construct Modeling Packages
https://hardhat.tidymodels.org

Use `update_blueprint0()` internally for performance #235

Closed: DavisVaughan closed this 1 year ago

DavisVaughan commented 1 year ago

The `check_*()` functions add a decent amount of overhead when they are called many times, and we don't need that validation for this internal usage, so this PR switches the internal callers to `update_blueprint0()`.
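The general pattern can be sketched in base R as a checked, user-facing updater that delegates to an unchecked internal one. This is only an illustration under stated assumptions: `check_bool()`, `update_spec()`, and `update_spec0()` are hypothetical names made up for this sketch, and it does not claim to mirror how hardhat implements `update_blueprint()` or `update_blueprint0()`.

```r
# Hypothetical sketch; none of these functions are part of hardhat.

# Stand-in for a check_*() helper: validates a single field or errors.
check_bool <- function(x, arg) {
  if (!is.logical(x) || length(x) != 1L || is.na(x)) {
    stop(sprintf("`%s` must be a single `TRUE` or `FALSE`.", arg), call. = FALSE)
  }
  invisible(x)
}

# Unchecked internal updater: assumes the caller already passes valid fields.
update_spec0 <- function(spec, ...) {
  changes <- list(...)
  spec[names(changes)] <- changes
  spec
}

# Checked user-facing updater: validates every field, then delegates.
update_spec <- function(spec, ...) {
  changes <- list(...)
  for (name in names(changes)) {
    if (!name %in% names(spec)) {
      stop(sprintf("`%s` is not a field of `spec`.", name), call. = FALSE)
    }
    check_bool(changes[[name]], name)
  }
  update_spec0(spec, ...)
}

spec <- list(intercept = FALSE, indicators = TRUE)

# Both paths produce the same object for valid input; only the checked
# path pays for validation on every call.
safe <- update_spec(spec, intercept = TRUE)
fast <- update_spec0(spec, intercept = TRUE)
```

Internal code that constructs its own arguments can call the unchecked variant directly, while user-facing entry points keep the validation.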

library(hardhat)
library(tibble)

formula <- y ~ a + b

df1 <- data.frame(y = 1, a = 2, b = 3)
df2 <- tibble(y = 1, a = 2, b = 3)

bench::mark(mold(formula, df1))
# Main
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 mold(formula, df1)   2.01ms   2.52ms      405.     1.3MB     12.7

# This PR
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 mold(formula, df1)   1.58ms   1.96ms      505.    1.74MB     10.5

bench::mark(mold(formula, df2))
# Main
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 mold(formula, df2)   2.05ms   2.57ms      384.    2.77KB     13.0

# This PR
#> # A tibble: 1 × 6
#>   expression              min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr>         <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 mold(formula, df2)    1.9ms   2.09ms      470.    2.77KB     10.7

bench::mark(mold(df1[c("a", "b")], df1["y"]))
# Main
#> # A tibble: 1 × 6
#>   expression                            min   median `itr/sec` mem_alloc gc/se…¹
#>   <bch:expr>                       <bch:tm> <bch:tm>     <dbl> <bch:byt>   <dbl>
#> 1 mold(df1[c("a", "b")], df1["y"])    219µs    267µs     3786.     118KB    14.6
#> # … with abbreviated variable name ¹​`gc/sec`

# This PR
#> # A tibble: 1 × 6
#>   expression                            min   median `itr/sec` mem_alloc gc/se…¹
#>   <bch:expr>                       <bch:tm> <bch:tm>     <dbl> <bch:byt>   <dbl>
#> 1 mold(df1[c("a", "b")], df1["y"])    199µs    214µs     4552.     218KB    12.4
#> # … with abbreviated variable name ¹​`gc/sec`

bench::mark(mold(df2[c("a", "b")], df2["y"]))
# Main
#> # A tibble: 1 × 6
#>   expression                            min   median `itr/sec` mem_alloc gc/se…¹
#>   <bch:expr>                       <bch:tm> <bch:tm>     <dbl> <bch:byt>   <dbl>
#> 1 mold(df2[c("a", "b")], df2["y"])    244µs    302µs     3373.        0B    12.4
#> # … with abbreviated variable name ¹​`gc/sec`

# This PR
#> # A tibble: 1 × 6
#>   expression                            min   median `itr/sec` mem_alloc gc/se…¹
#>   <bch:expr>                       <bch:tm> <bch:tm>     <dbl> <bch:byt>   <dbl>
#> 1 mold(df2[c("a", "b")], df2["y"])    181µs    231µs     4296.        0B    12.4
#> # … with abbreviated variable name ¹​`gc/sec`

Created on 2023-03-27 with reprex v2.0.2.9000
