tidymodels / dials

Tools for creating tuning parameter values
https://dials.tidymodels.org/

Expand documentation for custom transforms #238

Closed mikemahoney218 closed 2 years ago

mikemahoney218 commented 2 years ago

The problem

It seems like creating a grid with custom transforms (via scales::trans_new) sometimes results in NaN outputs.

Reproducible example

library(dials)
#> Loading required package: scales

trans_raise <- trans_new("raise", \(x) 2^x, \(x) -log2(x))

# trans_raise appears to work fine:
trans_raise$transform(-15)
#> [1] 3.051758e-05
trans_raise$inverse(trans_raise$transform(-15))
#> [1] 15

cost() |> grid_random()
#> # A tibble: 5 × 1
#>     cost
#>    <dbl>
#> 1 0.0509
#> 2 0.0114
#> 3 0.0440
#> 4 0.0133
#> 5 0.545

cost(c(-12, 15), trans = trans_raise) |> grid_random() 
#> Warning in object$trans$inverse(x): NaNs produced
#> # A tibble: 5 × 1
#>      cost
#>     <dbl>
#> 1 NaN    
#> 2  -3.26 
#> 3  -2.78 
#> 4  -2.65 
#> 5   0.755

cost(c(-12, 15), trans = scales::log2_trans()) |> grid_random() 
#> # A tibble: 5 × 1
#>        cost
#>       <dbl>
#> 1 132.     
#> 2   0.00198
#> 3   8.56   
#> 4 129.     
#> 5   0.461

Created on 2022-06-07 by the reprex package (v2.0.1)

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.2.0 (2022-04-22)
#>  os       Ubuntu 20.04.4 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2022-06-07
#>  pandoc   2.17.1.1 @ /usr/lib/rstudio/bin/quarto/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version    date (UTC) lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 4.2.0)
#>  cli           3.3.0      2022-04-25 [1] CRAN (R 4.2.0)
#>  colorspace    2.0-3      2022-02-21 [1] CRAN (R 4.2.0)
#>  crayon        1.5.1      2022-03-26 [1] CRAN (R 4.2.0)
#>  DBI           1.1.2      2021-12-20 [1] CRAN (R 4.2.0)
#>  dials       * 0.1.1      2022-04-06 [1] CRAN (R 4.2.0)
#>  DiceDesign    1.9        2021-02-13 [1] CRAN (R 4.2.0)
#>  digest        0.6.29     2021-12-01 [1] CRAN (R 4.2.0)
#>  dplyr         1.0.9      2022-04-28 [1] CRAN (R 4.2.0)
#>  ellipsis      0.3.2      2021-04-29 [1] CRAN (R 4.2.0)
#>  evaluate      0.15       2022-02-18 [1] CRAN (R 4.2.0)
#>  fansi         1.0.3      2022-03-24 [1] CRAN (R 4.2.0)
#>  fastmap       1.1.0      2021-01-25 [1] CRAN (R 4.2.0)
#>  fs            1.5.2      2021-12-08 [1] CRAN (R 4.2.0)
#>  generics      0.1.2      2022-01-31 [1] CRAN (R 4.2.0)
#>  glue          1.6.2      2022-02-24 [1] CRAN (R 4.2.0)
#>  hardhat       0.2.0      2022-01-24 [1] CRAN (R 4.2.0)
#>  highr         0.9        2021-04-16 [1] CRAN (R 4.2.0)
#>  htmltools     0.5.2      2021-08-25 [1] CRAN (R 4.2.0)
#>  knitr         1.39       2022-04-26 [1] CRAN (R 4.2.0)
#>  lifecycle     1.0.1      2021-09-24 [1] CRAN (R 4.2.0)
#>  magrittr      2.0.3      2022-03-30 [1] CRAN (R 4.2.0)
#>  munsell       0.5.0      2018-06-12 [1] CRAN (R 4.2.0)
#>  pillar        1.7.0      2022-02-01 [1] CRAN (R 4.2.0)
#>  pkgconfig     2.0.3      2019-09-22 [1] CRAN (R 4.2.0)
#>  purrr         0.3.4      2020-04-17 [1] CRAN (R 4.2.0)
#>  R.cache       0.15.0     2021-04-30 [1] CRAN (R 4.2.0)
#>  R.methodsS3   1.8.1      2020-08-26 [1] CRAN (R 4.2.0)
#>  R.oo          1.24.0     2020-08-26 [1] CRAN (R 4.2.0)
#>  R.utils       2.11.0     2021-09-26 [1] CRAN (R 4.2.0)
#>  R6            2.5.1      2021-08-19 [1] CRAN (R 4.2.0)
#>  reprex        2.0.1      2021-08-05 [1] CRAN (R 4.2.0)
#>  rlang         1.0.2.9002 2022-06-07 [1] Github (r-lib/rlang@a627703)
#>  rmarkdown     2.14       2022-04-25 [1] CRAN (R 4.2.0)
#>  rstudioapi    0.13       2020-11-12 [1] CRAN (R 4.2.0)
#>  scales      * 1.2.0      2022-04-13 [1] CRAN (R 4.2.0)
#>  sessioninfo   1.2.2      2021-12-06 [1] CRAN (R 4.2.0)
#>  stringi       1.7.6      2021-11-29 [1] CRAN (R 4.2.0)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 4.2.0)
#>  styler        1.7.0      2022-03-13 [1] CRAN (R 4.2.0)
#>  tibble        3.1.7      2022-05-03 [1] CRAN (R 4.2.0)
#>  tidyselect    1.1.2      2022-02-21 [1] CRAN (R 4.2.0)
#>  utf8          1.2.2      2021-07-24 [1] CRAN (R 4.2.0)
#>  vctrs         0.4.1      2022-04-13 [1] CRAN (R 4.2.0)
#>  withr         2.5.0      2022-03-03 [1] CRAN (R 4.2.0)
#>  xfun          0.31       2022-05-10 [1] CRAN (R 4.2.0)
#>  yaml          2.3.5      2022-02-21 [1] CRAN (R 4.2.0)
#>
#>  [1] /home/mikemahoney218/R/x86_64-pc-linux-gnu-library/4.2
#>  [2] /usr/local/lib/R/site-library
#>  [3] /usr/lib/R/site-library
#>  [4] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

EmilHvitfeldt commented 2 years ago

I think you got the transformations mixed up. The trans argument is applied inverse-first: values are sampled within range and then passed through the inverse, so that applying the transformation to the results puts them back in the range.

As an example, the grid you get from grid_random() when using log2_trans() will be randomly distributed between -12 and 15 once log2() has been applied.

library(dials)
#> Loading required package: scales
set.seed(1234)

res <- cost(c(-12, 15), trans = log2_trans()) |> 
  grid_random(size = 100)

res$cost |> log2() |> sort()
#>   [1] -11.7436146 -11.6287516 -11.6050641 -11.2786438 -10.9201102 -10.8698536
#>   [7] -10.7642029 -10.6055413 -10.1085826 -10.0079432  -9.5710440  -9.1572374
#>  [13]  -8.9300079  -8.7200349  -8.4464184  -8.3798887  -8.1493857  -7.8690281
#>  [19]  -7.7057579  -7.6792261  -7.5073679  -7.2844548  -7.1104024  -6.9584847
#>  [25]  -6.7032844  -6.5663030  -6.0924124  -5.7299004  -5.7211363  -5.5463054
#>  [31]  -5.4139217  -5.0121349  -4.8399599  -4.7958389  -4.3661932  -4.2719713
#>  [37]  -4.1074723  -3.8946255  -3.8272790  -3.7738505  -3.6814416  -3.6395194
#>  [43]  -3.6384617  -3.6064889  -3.5807437  -3.4514637  -3.4360818  -3.3426006
#>  [49]  -3.0962053  -1.6247994  -1.4835058   0.3144700   0.7415625   1.0947635
#>  [55]   1.3369449   1.5539318   1.6062042   1.6227396   1.6972855   1.7307843
#>  [61]   1.8127567   1.8847808   2.0181295   2.1938338   2.6952834   2.7143206
#>  [67]   2.9400069   3.2433857   4.4504178   4.7891183   4.8020839   4.8312449
#>  [73]   5.2883863   5.4529645   5.9842615   6.0493757   6.2815522   6.7269649
#>  [79]   7.0753547   7.3663371   8.0372307   8.2824053   8.5111072   8.6674143
#>  [85]   9.1442698   9.7985132   9.8861609  10.4463163  10.6069820  10.9065951
#>  [91]  11.2447154  11.3505135  12.1065823  12.1970179  12.2616732  12.3114641
#>  [97]  12.6957705  12.9327041  13.0128129  14.7880613

Swapping your transformations makes this work:

library(dials)
#> Loading required package: scales
set.seed(1234)

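# transform and inverse must undo each other, so the minus sign is dropped:
# log2(2^x) == x, whereas -log2(2^x) == -x
trans_raise <- trans_new("raise", \(x) log2(x), \(x) 2^x)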

res <- cost(c(-12, 15), trans = trans_raise) |> 
  grid_random(size = 100)

res$cost |> log2() |> sort()
#>   [1] -11.4004158 -11.0426684 -10.9566484 -10.0366655  -9.9457503  -9.7817322
#>   [7]  -9.7054121  -9.2792359  -8.8643341  -8.7109204  -8.6662806  -8.5592983
#>  [13]  -8.5367157  -8.4910222  -8.3890497  -7.8133268  -6.4866995  -6.0692712
#>  [19]  -5.9291644  -5.8278082  -5.6641459  -5.2082661  -4.4627820  -4.4330400
#>  [25]  -4.3465923  -3.8184115  -3.6274575  -3.4019631  -3.2415773  -3.2174858
#>  [31]  -3.2065880  -2.7235370  -2.6895752  -2.3613489  -1.8588168  -0.5777312
#>  [37]  -0.5511844  -0.4716543  -0.4050130  -0.3625580  -0.3223003  -0.2406614
#>  [43]  -0.1458646   0.6303930   0.9276725   0.9971236   1.1256560   1.4180867
#>  [49]   1.5742452   1.5878896   2.9884190   3.1439559   3.2570550   3.2811041
#>  [55]   3.5487001   3.5739439   3.7753310   4.1390687   4.2192818   4.2512337
#>  [61]   4.9418975   5.0570560   5.0759461   5.1206829   5.2070530   5.2506055
#>  [67]   5.2585237   7.1386640   7.2591422   7.7837700   8.0232051   8.0493729
#>  [73]   8.0954848   8.4468039   8.4962396   9.5233324   9.6648545   9.8722515
#>  [79]  10.1899818  10.5369719  10.7261871  10.9571529  11.4672746  11.8474686
#>  [85]  12.5205669  12.6533697  12.6865566  12.7313040  13.4437393  13.5380343
#>  [91]  13.6582333  13.8118005  13.9800163  14.4812357  14.5421887  14.5581481
#>  [97]  14.6805211  14.7979317  14.8541526  14.9660018

Created on 2022-06-07 by the reprex package (v2.0.1)

mikemahoney218 commented 2 years ago

Thank you! That makes a ton of sense. Is there documentation around using custom transformations anywhere?

Edit to add: or using transformations generally! I'd have expected transform to be applied first, though I can understand why in a dials context things flow the other direction.

hfrick commented 2 years ago

This section contains information about transformations: https://dials.tidymodels.org/articles/dials.html#numeric-parameters

juliasilge commented 2 years ago

I feel like I may be missing it 🙈 but I'm not seeing info on how to use a custom trans here. If we don't have that yet, can we add it? Or highlight it more directly?

mikemahoney218 commented 2 years ago

+1 to @juliasilge.

I think my confusion came from the parameter documentation, namely:

Arguments
range   
A two-element vector holding the defaults for the smallest and largest possible values, respectively.

trans   
A trans object from the scales package, such as scales::log10_trans() or scales::reciprocal_trans(). If not provided, the default is used which matches the units used in range. If no transformation, NULL.

It's not clear to me from this (or from the docs Hannah linked) that range should be the already-transformed values, and that trans will be applied inverse first. My expectation is that I was passing an untransformed range, and that my transform would be used transform-first.
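
A rough base-R sketch of the behaviour described here, assuming (as the `object$trans$inverse(x)` warning in the reprex suggests) that values are drawn uniformly within `range` and then passed only through the inverse of `trans`. The helper `sample_inverse_first()` is hypothetical and not part of dials; it only mimics the order of operations.

```r
# Hypothetical helper mimicking inverse-first sampling; not a dials function.
sample_inverse_first <- function(range, inverse, size = 5) {
  # Draw on the transformed scale, i.e. in the units of `range`
  x <- stats::runif(size, min = range[1], max = range[2])
  # Map the draws back to the parameter's natural units
  inverse(x)
}

set.seed(1234)

# Inverse 2^x: every draw maps to a valid positive value
ok <- sample_inverse_first(c(-12, 15), \(x) 2^x)

# Inverse -log2(x), as in the original report: any negative draw becomes NaN
bad <- suppressWarnings(sample_inverse_first(c(-12, 15), \(x) -log2(x)))

# Transforming the valid draws puts them back inside `range`
all(log2(ok) >= -12 & log2(ok) <= 15)
```

Under that assumption, `range` has to be read in the units produced by the transform, because it is the interval that the inverse is evaluated on.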

Maybe something flagging, e.g.

Note that the values in `range` should be the _transformed_ values -- that is, they're the range of acceptable values after the transform specified by `trans` is applied. To see the range of _raw_ values, use `range_get()` like this:

cost() |> range_get()

Would make that a little more clear? (Possibly in a section on writing your own transformations?)

hfrick commented 2 years ago

I'm trying to figure out where to best put that extended information.

@mikemahoney218 Is the first quote block from the documentation of cost()? We probably should check/update the docs for each quantitative parameter since that's always how range works (see the docs for new_quant_param()).

How to create a tuning parameter function could also be expanded.

mikemahoney218 commented 2 years ago

I got it from ?dials::rbf_sigma, but I believe it's inherited in a lot of functions from:

https://github.com/tidymodels/dials/blob/e3e29a49450621d9b58750cc4b93b18b296f8f65/R/param_Laplace.R#L5

juliasilge commented 2 years ago

The change in the docs in #243 looks helpful but I still don't see any examples of how to do this correctly. I would probably still have a hard time getting started on the right path. What would be a good place in the dials docs to show how to do this with code?
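
One possible shape for such an example, offered as a sketch rather than official documentation. It follows the corrected pairing from earlier in this thread: `transform` maps natural units to the units of `range`, and `inverse` maps back. The name `log2_custom` is made up for illustration.

```r
library(dials)
library(scales)

# A custom transformation whose two functions undo each other.
log2_custom <- trans_new(
  name = "log2_custom",      # illustrative name, not from the dials docs
  transform = \(x) log2(x),  # natural units -> units of `range`
  inverse = \(x) 2^x         # units of `range` -> natural units
)

# `range` is given on the transformed (log2) scale
param <- cost(range = c(-12, 15), trans = log2_custom)

set.seed(1234)
grid <- grid_random(param, size = 10)

# Sampled values are in natural units; transforming them recovers `range`
range(log2(grid$cost))

# The same range expressed in natural units
range_get(param)
```

A quick sanity check for any custom pair is that `inverse(transform(x))` returns `x` for a few values in the parameter's natural units.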

hfrick commented 2 years ago

I was going to put that in the tidymodels.org article but after reading your comment there, I'm gonna add a bit to the dials Get Started vignette 👍
