Open topepo opened 4 months ago
I'm doing some work on this. Currently, I have an interface for including an expression attribute in the parameter set. I know that we try to avoid attributes but I do see a better way to add one.
I've also done a little work on using it with grids so you can see that it works in this example:
# pak::pak("tidymodels/dials@constraints", ask = FALSE)
library(tidymodels)
no_constr_prm <- parameters(lambda = penalty(), mixture(), num_terms(c(1, 10)))
has_constraint(no_constr_prm)
#> [1] FALSE
constr_prm <- add_parameter_constraint(no_constr_prm, lambda < 0.01)
constr_prm
#> Collection of 3 parameters for tuning
#>
#> identifier type object
#> mixture mixture nparam[+]
#> lambda penalty nparam[+]
#> num_terms num_terms nparam[+]
#>
#> Parameter constraint: `lambda < 0.01`.
has_constraint(constr_prm)
#> [1] TRUE
remove_parameter_constraint(constr_prm)
#> Collection of 3 parameters for tuning
#>
#> identifier type object
#> mixture mixture nparam[+]
#> lambda penalty nparam[+]
#> num_terms num_terms nparam[+]
update_parameter_constraint(constr_prm, log10(lambda) < -2 & mixture > 1 / 2)
#> Collection of 3 parameters for tuning
#>
#> identifier type object
#> mixture mixture nparam[+]
#> lambda penalty nparam[+]
#> num_terms num_terms nparam[+]
#>
#> Parameter constraint: `log10(lambda) < -2 & mixture > 1 / 2`.
is_even_exp <- quote(num_terms %% 2 == 0)
even_prm <- add_parameter_constraint(no_constr_prm, !!is_even_exp)
even_prm
#> Collection of 3 parameters for tuning
#>
#> identifier type object
#> mixture mixture nparam[+]
#> lambda penalty nparam[+]
#> num_terms num_terms nparam[+]
#>
#> Parameter constraint: `num_terms %% 2 == 0`.
set.seed(1)
grid_random(even_prm, size = 100) %>% count(num_terms)
#> # A tibble: 5 × 2
#> num_terms n
#> <int> <int>
#> 1 2 8
#> 2 4 13
#> 3 6 7
#> 4 8 11
#> 5 10 11
Created on 2024-11-08 with reprex v2.1.1
From a design point of view, I think that the best thing to do would be to add an additional argument to new_param_grid() for the constraint so that it would be universally used. Otherwise, we'd have to change the six make_*_grid() functions.
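A minimal sketch of why a single shared constructor is attractive, using base R only. The names `new_grid_sketch()`, `make_regular_sketch()`, and `make_random_sketch()` are hypothetical stand-ins and are not the dials internals; the point is only that the constraint logic lives in one place and every grid maker inherits it.

```r
# Hypothetical stand-in for a shared grid constructor. This is NOT the dials
# implementation of new_param_grid(); all names here are made up.
new_grid_sketch <- function(grid, constraint = NULL) {
  if (!is.null(constraint)) {
    # Evaluate the quoted expression with the grid columns as variables
    keep <- eval(constraint, envir = grid, enclos = baseenv())
    # Rows where the constraint evaluates to NA are dropped, not kept
    grid <- grid[!is.na(keep) & keep, , drop = FALSE]
  }
  rownames(grid) <- NULL
  grid
}

# Two hypothetical grid makers that both delegate to the constructor,
# so neither needs its own filtering code.
make_regular_sketch <- function(levels, constraint = NULL) {
  new_grid_sketch(expand.grid(levels, KEEP.OUT.ATTRS = FALSE), constraint)
}

make_random_sketch <- function(n, constraint = NULL) {
  grid <- data.frame(
    num_terms = sample(1:10, n, replace = TRUE),
    mixture = runif(n)
  )
  new_grid_sketch(grid, constraint)
}

is_even <- quote(num_terms %% 2 == 0)

reg <- make_regular_sketch(
  list(num_terms = 1:10, mixture = c(0, 0.5, 1)),
  constraint = is_even
)

set.seed(1)
rnd <- make_random_sketch(100, constraint = is_even)
```

Both `reg` and `rnd` contain only even values of `num_terms`, even though neither maker mentions the constraint beyond passing it along.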
My current plan would be to make a set of PRs:
- Add the constraint argument to new_param_grid().
- Mark the filter argument in grid_random() and grid_regular() as deprecated (it is not in the others).
- Update the new_in_neighborhood() code to produce new candidates without violating the constraints.
The finetune changes might be a moderate refactor since we need all of the parameters to test the constraints. Right now, for each parameter, it generates a lot of candidates for each predictor, picks a single combination, and then updates the current grid. While it is not slow, it could probably use a design re-evaluation.
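To illustrate why all of the parameters are needed at once, here is a rough base R sketch that proposes whole candidate rows around the current point and only then tests the constraint. This is not how finetune currently works, and `propose_neighbors_sketch()`, its arguments, and the hard-coded parameter ranges are all hypothetical.

```r
# Hypothetical helper: perturb every parameter of `current` (a one-row data
# frame) together, then keep only the candidate rows that satisfy the
# constraint. The ranges (mixture in [0, 1], num_terms in 1..10) are assumed.
propose_neighbors_sketch <- function(current, constraint, n = 50, radius = 0.1) {
  cand <- data.frame(
    mixture = pmin(pmax(current$mixture + runif(n, -radius, radius), 0), 1),
    num_terms = pmin(
      pmax(current$num_terms + sample(-2:2, n, replace = TRUE), 1L),
      10L
    )
  )
  # The constraint can involve several parameters, so it is evaluated on
  # complete rows rather than one parameter at a time.
  ok <- eval(constraint, envir = cand, enclos = baseenv())
  out <- unique(cand[!is.na(ok) & ok, , drop = FALSE])
  rownames(out) <- NULL
  out
}

set.seed(2)
current <- data.frame(mixture = 0.6, num_terms = 4L)
joint_constraint <- quote(num_terms %% 2 == 0 & mixture > 1 / 2)
nb <- propose_neighbors_sketch(current, joint_constraint)
```

Any row of `nb` could be picked as the next candidate without a separate validity check.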
> I know that we try to avoid attributes but I do see a better way to add one.
i'm assuming you mean "but i don't see a better way to add one". I feel attributes are fine in this case.
are we going to force users to create one long expression for the constraints, or allow them to pass in multiple constraints?
are there times where we (tidymodels) can apply constraints to a grid based on the information we have about a workflow? if yes i think it would be neat to do so.
i know we are getting into non-deterministic stuff, but the below output of grid_random() produced 50 combinations instead of 100 as was asked.
on one hand it is straightforward to sample to get 100 that satisfy the condition. but there are many times where it isn't possible to get the required unique cases. We already see that right now.
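One way to handle both situations is to keep drawing until enough unique rows pass, with a cap on the number of rounds so an unsatisfiable request ends with a warning instead of looping forever. The base R sketch below assumes this behavior; `sample_until_sketch()`, `draw_grid()`, and `draw_terms()` are hypothetical and not part of dials.

```r
# Hypothetical helper: `draw` is a function that returns n random candidate
# rows as a data frame. Keep sampling until `size` unique rows satisfy the
# constraint or `max_rounds` is reached.
sample_until_sketch <- function(draw, constraint, size, max_rounds = 20) {
  kept <- draw(0)  # zero-row data frame with the right columns
  for (i in seq_len(max_rounds)) {
    cand <- draw(size)
    ok <- eval(constraint, envir = cand, enclos = baseenv())
    kept <- unique(rbind(kept, cand[!is.na(ok) & ok, , drop = FALSE]))
    if (nrow(kept) >= size) break
  }
  if (nrow(kept) < size) {
    warning("Only ", nrow(kept), " unique candidates satisfy the constraint.")
  }
  rownames(kept) <- NULL
  head(kept, size)
}

# A grid with a continuous parameter: 100 unique valid rows are reachable.
draw_grid <- function(n) {
  data.frame(
    num_terms = sample(1:10, n, replace = TRUE),
    mixture = runif(n)
  )
}

# A grid with only one discrete parameter: at most 5 valid unique rows exist.
draw_terms <- function(n) {
  data.frame(num_terms = sample(1:10, n, replace = TRUE))
}

is_even <- quote(num_terms %% 2 == 0)

set.seed(1)
full <- sample_until_sketch(draw_grid, is_even, size = 100)
short <- suppressWarnings(sample_until_sketch(draw_terms, is_even, size = 100))
```

Here `full` reaches the requested 100 rows, while `short` cannot exceed the five even values and would emit the warning if it were not suppressed.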
I want to add some constraints to parameter sets to filter out some grid points based on user-defined constraints:
mtry <= num_comp
window_size
)and so on.
My thoughts are that we could have an optional list of expressions attached to a parameter set, and these filters could be executed when grids are created.
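As a rough illustration of that idea in base R, the sketch below stores captured expressions in an attribute and applies them one by one when a grid is built. A plain list stands in for the parameter set, and `add_constraints_sketch()`, `apply_constraints_sketch()`, and the `"constraints"` attribute name are all hypothetical.

```r
# Hypothetical helper: capture unevaluated expressions from `...` and append
# them to a "constraints" attribute on the parameter set stand-in.
add_constraints_sketch <- function(pset, ...) {
  exprs <- as.list(substitute(list(...)))[-1]
  attr(pset, "constraints") <- c(attr(pset, "constraints"), exprs)
  pset
}

# Hypothetical helper: run every stored expression against the grid and keep
# only the rows that pass all of them.
apply_constraints_sketch <- function(grid, pset) {
  for (ex in attr(pset, "constraints")) {
    ok <- eval(ex, envir = grid, enclos = baseenv())
    grid <- grid[!is.na(ok) & ok, , drop = FALSE]
  }
  rownames(grid) <- NULL
  grid
}

# Assumed ranges, used only for illustration
ranges <- list(mtry = 1:10, num_comp = 1:10)
pset <- add_constraints_sketch(ranges, mtry <= num_comp, num_comp %% 2 == 0)

grid <- expand.grid(ranges, KEEP.OUT.ATTRS = FALSE)
filtered <- apply_constraints_sketch(grid, pset)
```

Keeping the expressions as a list means several constraints can be added independently rather than written as one long expression.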