tidymodels / dials

Tools for creating tuning parameter values
https://dials.tidymodels.org/

Parameter importance for tuning grid construction #248

Closed franzbischoff closed 2 years ago

franzbischoff commented 2 years ago

Feature

(Note: maybe this belongs in dials rather than tune.)

Suppose we have four parameters we want to tune, and the model fitting process is slow.

Let's say these parameters are continuous (not integers) and range from 0 to 10.

Currently, the default grid_latin_hypercube() tries to fill the tuning grid giving the same importance to each parameter. But imagine that parameter 1 is more important and parameter 2 is less important. I would like more variability for parameter 1 and less for parameter 2, for example [0, 0.1, ..., 9.9, 10] for parameter 1 and [0, 1, 2, ..., 10] for parameter 2.

The idea is something like case_weights, I think, but for parameters.

For now, I've worked around this with a custom scales::trans_new() transformation that rounds each parameter to a chosen step, forcing tune to jump by 0.2, or by 5, for example.
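A minimal sketch of that kind of workaround, assuming the rounding lives in the inverse of an otherwise identity transformation. Only `scales::trans_new()` comes from the post; the helper `rounding_trans()` and the step values are hypothetical:

```r
library(scales)

# Hypothetical helper (not part of scales or dials): an identity transformation
# whose inverse snaps values to the nearest multiple of `step`, so anything
# generated on the transformed scale comes back on a coarse grid.
rounding_trans <- function(step) {
  trans_new(
    name = paste0("round_", step),
    transform = function(x) x,
    inverse = function(x) round(x / step) * step
  )
}

by_0.2 <- rounding_trans(0.2)
by_5 <- rounding_trans(5)

# Values generated on the transformed scale, snapped back by the inverse
by_0.2$inverse(c(0.13, 4.71, 9.99))  # approximately 0.2, 4.8, 10
by_5$inverse(c(0.13, 4.71, 9.99))    # 0, 5, 10
```

In dials, a transformation object like this would presumably be supplied through the `trans` argument of a custom parameter built with `new_quant_param()`. Because the "inverse" is not a true inverse, a sampled grid can end up with duplicate values after rounding.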

Is there already a solution for this?

Thank you!

juliasilge commented 2 years ago

All of our grid functions currently sample the same number of times for each parameter, so we don't have automatic support for these differences by parameter in those functions.

The tune functions like tune_grid() can take any data frame, so you can create your own tuning grid however you prefer.

For example, you could build a crossed grid like this, with more sampled values for penalty than for mixture:

library(tidyverse)
library(dials)
#> Loading required package: scales
#> 
#> Attaching package: 'scales'
#> The following object is masked from 'package:purrr':
#> 
#>     discard
#> The following object is masked from 'package:readr':
#> 
#>     col_factor

crossing(
  penalty = penalty(c(-1, 0)) %>% value_sample(100),
  mixture = mixture() %>% value_sample(5)
)
#> # A tibble: 500 × 2
#>    penalty mixture
#>      <dbl>   <dbl>
#>  1   0.100  0.0596
#>  2   0.100  0.262 
#>  3   0.100  0.365 
#>  4   0.100  0.517 
#>  5   0.100  0.927 
#>  6   0.110  0.0596
#>  7   0.110  0.262 
#>  8   0.110  0.365 
#>  9   0.110  0.517 
#> 10   0.110  0.927 
#> # … with 490 more rows

Created on 2022-06-15 by the reprex package (v2.0.1)
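The same idea also works in base R with `expand.grid()`. The sketch below uses the step sizes from the original question. The column names `param_1` and `param_2` are hypothetical: they would need to match the parameter ids marked with `tune()` in your own model before the data frame is passed as the `grid` argument of `tune_grid()`.

```r
# Fine steps for the more influential parameter, coarse steps for the other
my_grid <- expand.grid(
  param_1 = seq(0, 10, by = 0.1),  # 101 values: 0, 0.1, ..., 10
  param_2 = seq(0, 10, by = 1)     # 11 values: 0, 1, ..., 10
)

nrow(my_grid)                                   # 101 * 11 = 1111 candidates
sapply(my_grid, function(x) length(unique(x)))  # distinct values per parameter
```

Every row is one model fit per resample, so a fully crossed grid grows multiplicatively with each extra value. That matters when fitting is slow.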

EmilHvitfeldt commented 2 years ago

I think what you're describing is closer to granularity ("sensitivity to small changes") than to weights or importance.

What Julia has above is probably your best bet. You can also use the levels argument if you want to use grid_regular() instead of crossing().

library(dials)

grid_regular(
  parameters(penalty(c(-1, 0)), mixture()),
  levels = c(penalty = 100, mixture = 5)
)
#> # A tibble: 500 × 2
#>    penalty mixture
#>      <dbl>   <dbl>
#>  1   0.1         0
#>  2   0.102       0
#>  3   0.105       0
#>  4   0.107       0
#>  5   0.110       0
#>  6   0.112       0
#>  7   0.115       0
#>  8   0.118       0
#>  9   0.120       0
#> 10   0.123       0
#> # … with 490 more rows
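The rows above can be reproduced by hand, assuming `grid_regular()` spaces each parameter evenly on its transformed scale (log10 for `penalty()`, no transformation for `mixture()`) and then crosses the values. This base-R sketch only makes that spacing visible; it is not a replacement for `grid_regular()`.

```r
# penalty(c(-1, 0)) spans -1 to 0 on the log10 scale, i.e. 0.1 to 1 on the original scale
penalty_values <- 10^seq(-1, 0, length.out = 100)
# mixture() spans 0 to 1 with no transformation
mixture_values <- seq(0, 1, length.out = 5)

# expand.grid() varies the first column fastest, like the printed tibble
by_hand <- expand.grid(penalty = penalty_values, mixture = mixture_values)

nrow(by_hand)                       # 500
round(head(by_hand$penalty, 4), 3)  # 0.100 0.102 0.105 0.107
```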

As far as I know, you can't use space-filling sampling in a way that explores more values for one parameter than for another.

I'm trying to show this visually. The first plot is a Latin hypercube with 25 points. The second is a 100-point Latin hypercube where I "zoomed" into the lowest 25% of the trees values and stretched them back out to the original range. These two grids look indistinguishable because they fill the space in the same way.

library(dials)
library(ggplot2)
library(dplyr)

set.seed(1234)

grid_latin_hypercube(parameters(tree_depth(), trees()), size = 25) |>
  ggplot(aes(trees, tree_depth)) +
  geom_point()

grid_latin_hypercube(parameters(tree_depth(), trees()), size = 100) |>
  mutate(trees = trees * 4) |>
  filter(trees < 2000) |>
  ggplot(aes(trees, tree_depth)) +
  geom_point()

Created on 2022-06-16 by the reprex package (v2.0.1)
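The zooming argument can also be checked numerically. The sketch below uses a plain random Latin hypercube on the unit square (shuffle the n strata in each dimension, then jitter within each stratum). This is a textbook construction and not necessarily what `grid_latin_hypercube()` does internally; the helper `lhs_unit()` is hypothetical. Zooming a 100-point design into the lowest 25% of its first axis always leaves exactly 25 points, and along that axis they again occupy one stratum each, just like a fresh 25-point design. Along the other axis they are only a random subset of the 100 strata, which is why the two plots look alike rather than identical.

```r
# Hypothetical helper: random Latin hypercube with n points in `dims` dimensions,
# scaled to [0, 1). Each column is a random permutation of the n strata,
# jittered uniformly within its stratum.
lhs_unit <- function(n, dims = 2) {
  sapply(seq_len(dims), function(d) (sample(n) - runif(n)) / n)
}

set.seed(1234)
small <- lhs_unit(25)   # column 1 plays the role of `trees`
big <- lhs_unit(100)

# "Zoom": keep points whose first coordinate is in the lowest 25%,
# then stretch that coordinate back to the full [0, 1) range
zoomed <- big[big[, 1] < 0.25, , drop = FALSE]
zoomed[, 1] <- zoomed[, 1] * 4

nrow(zoomed)  # always 25 with this construction

# Stratum index of each point along the first axis (25 strata of width 1/25);
# both designs hit every stratum exactly once
sort(ceiling(small[, 1] * 25))
sort(ceiling(zoomed[, 1] * 25))
```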

franzbischoff commented 2 years ago

Thank you for your answers! I've been a little busy this week; I'll read your explanations carefully.

franzbischoff commented 2 years ago

Hello @EmilHvitfeldt, I understand how we could handle this for any custom scenario. But, aside from the main question, I'm struggling to see how your two plots look indistinguishable. Did I miss some transformation?

Thank you!

Edit: I think you mean "indistinguishable" in the sense that they seem to come from the same underlying generative model, right? I think we can close this issue 😃

Thanks!

hfrick commented 2 years ago

Thank you for the discussion, @franzbischoff!

github-actions[bot] commented 2 years ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.