All of our grid functions currently sample the same number of times for each parameter, so we don't have automatic support for these differences by parameter in those functions.
The tune functions like tune_grid() can take any data frame, so you can create your own tuning grid however you prefer. For example, you could build a regular grid like this:
library(tidyverse)
library(dials)
#> Loading required package: scales
#>
#> Attaching package: 'scales'
#> The following object is masked from 'package:purrr':
#>
#> discard
#> The following object is masked from 'package:readr':
#>
#> col_factor
crossing(
  penalty = penalty(c(-1, 0)) %>% value_sample(100),  # 100 random values of penalty
  mixture = mixture() %>% value_sample(5)             # 5 random values of mixture
)
#> # A tibble: 500 × 2
#> penalty mixture
#> <dbl> <dbl>
#> 1 0.100 0.0596
#> 2 0.100 0.262
#> 3 0.100 0.365
#> 4 0.100 0.517
#> 5 0.100 0.927
#> 6 0.110 0.0596
#> 7 0.110 0.262
#> 8 0.110 0.365
#> 9 0.110 0.517
#> 10 0.110 0.927
#> # … with 490 more rows
Created on 2022-06-15 by the reprex package (v2.0.1)
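A grid built this way can be handed straight to tune_grid() via its grid argument. A minimal sketch, assuming the glmnet package is installed; the linear_reg() spec, mtcars resamples, and the my_grid name are illustrative placeholders, not from this thread:
library(tidymodels)

set.seed(123)
# Hand-built grid: any data frame with one column per tuned parameter works
my_grid <- crossing(
  penalty = penalty(c(-1, 0)) %>% value_sample(100),
  mixture = mixture() %>% value_sample(5)
)

spec <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

res <- tune_grid(
  spec,
  mpg ~ .,
  resamples = vfold_cv(mtcars, v = 5),
  grid = my_grid  # the custom grid replaces the automatically generated one
)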
I think the word you are looking for is closer to granularity ("sensitive to small changes") than to weights/importance.
What Julia has above is probably your best bet. You can also use the levels argument if you want to use grid_regular() instead of crossing().
library(dials)
grid_regular(
  parameters(penalty(c(-1, 0)), mixture()),
  levels = c(penalty = 100, mixture = 5)  # a different number of levels per parameter
)
#> # A tibble: 500 × 2
#> penalty mixture
#> <dbl> <dbl>
#> 1 0.1 0
#> 2 0.102 0
#> 3 0.105 0
#> 4 0.107 0
#> 5 0.110 0
#> 6 0.112 0
#> 7 0.115 0
#> 8 0.118 0
#> 9 0.120 0
#> 10 0.123 0
#> # … with 490 more rows
As far as I know, you can't use space-filling sampling in a way that would explore more values for one parameter than another.
I'm trying to show this visually. The first plot is a Latin hypercube with 25 values. The second is one where I "zoomed" into the lowest 25% of values for trees and stretched them back out to the original range. These two grids look indistinguishable because they fill the space in the same way.
library(dials)
library(ggplot2)
library(dplyr)
set.seed(1234)
# A 25-point Latin hypercube over the full parameter ranges
grid_latin_hypercube(parameters(tree_depth(), trees()), size = 25) |>
  ggplot(aes(trees, tree_depth)) +
  geom_point()

# A denser hypercube: stretch trees by 4x and keep only the original range,
# which is equivalent to zooming into the lowest 25% of trees values
grid_latin_hypercube(parameters(tree_depth(), trees()), size = 100) |>
  mutate(trees = trees * 4) |>
  filter(trees < 2000) |>
  ggplot(aes(trees, tree_depth)) +
  geom_point()
Created on 2022-06-16 by the reprex package (v2.0.1)
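As a numeric companion to the plots above (a sketch; the full and zoomed names are illustrative): summarizing trees in both grids shows the stretched version spans the range about as evenly as the original.
library(dials)
library(dplyr)

set.seed(1234)
full <- grid_latin_hypercube(parameters(tree_depth(), trees()), size = 25)
zoomed <- grid_latin_hypercube(parameters(tree_depth(), trees()), size = 100) |>
  mutate(trees = trees * 4) |>
  filter(trees < 2000)

summary(full$trees)    # spread of trees in the original 25-point grid
summary(zoomed$trees)  # spread of trees in the zoomed-and-stretched grid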
Thank you for your answers! I was a little busy this week; I'll read your explanations carefully.
Hello @EmilHvitfeldt, I understood how we can solve this case for any custom scenario. But, aside from the main question: I'm struggling to see how both of your plots look indistinguishable. Did I miss some transformation?
Thank you!
Edit: I think you are saying "indistinguishable" in the sense that they seem to come from the same underlying generative model, right? I think we can close this subject 😃
Thanks!
Thank you for the discussion, @franzbischoff!
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Feature
(Note: maybe this is related to dials, not tune.) Suppose we have four parameters we want to tune and the model fitting process takes time. Let's say that these parameters vary from 0 to 10: numeric, not integers.
Currently the default grid_latin_hypercube() will try to fill the tuning grid giving the same importance to each parameter. But imagine that parameter 1 is more important and parameter 2 is less important. I would like to have more variability for 1 and less for 2. For example: [0, 0.1, ..., 9.9, 10] for parameter 1, and [0, 1, 2, ..., 10] for parameter 2. The idea is kind of like case_weights, I think, but for parameters.
Currently, I worked around this using a custom scales::trans_new() that rounds the parameters to a custom value, forcing tune to jump by 0.2, or by 5, for example. Maybe there is already a solution for this?
Thank you!
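For reference, a minimal sketch of the rounding workaround described above, assuming scales::trans_new() and dials::new_quant_param(); the step_trans() helper, the 0.2 step size, and the parameter name are illustrative, not from the original post:
library(dials)
library(scales)

# Hypothetical helper: a transformation whose inverse snaps values to
# multiples of `step`, so sampled values only land on that coarser grid
step_trans <- function(step) {
  trans_new(
    name = paste0("step_", step),
    transform = identity,
    inverse = function(x) round(x / step) * step
  )
}

# A numeric parameter on [0, 10] that only takes values in steps of 0.2
coarse_param <- new_quant_param(
  type = "double",
  range = c(0, 10),
  inclusive = c(TRUE, TRUE),
  trans = step_trans(0.2),
  label = c(coarse_param = "coarse parameter")
)

value_sample(coarse_param, 5)  # sampled values land on the 0.2 grid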