tidymodels / tune

Tools for tidy parameter tuning
https://tune.tidymodels.org

tuning with list-columns in `grid` #625

Closed SHo-JANG closed 1 year ago

SHo-JANG commented 1 year ago

I want to customize the objective function.

For example, what I want to implement now is focal loss, a loss function that can be used when there is class imbalance (Focal Loss).

Because focal loss is a generalization of cross-entropy, setting certain hyperparameters gives exactly the same result as CE.

library(tidymodels)
library(xgboost)
#> 
#> Attaching package: 'xgboost'
#> The following object is masked from 'package:dplyr':
#> 
#>     slice

data(agaricus.train, package = "xgboost")
data(agaricus.test, package = "xgboost")

dtrain <- with(agaricus.train, xgb.DMatrix(data, label = label, nthread = 2))
dtest <- with(agaricus.test, xgb.DMatrix(data, label = label, nthread = 2))
watchlist <- list(train = dtrain, eval = dtest)

# original cross entropy loss
logregobj <- function(preds, dtrain) {
  labels <- getinfo(dtrain, "label")
  preds <- 1/(1 + exp(-preds))
  grad <- preds - labels
  hess <- preds * (1 - preds)
  return(list(grad = grad, hess = hess))
}

# focal_loss --------------------------------------------------------------

focal_loss <- function(preds, dtrain, alpha = 0.5, focal_gamma = 0) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))

  p <- preds
  y <- labels

  # gradient of the focal loss with respect to the raw (pre-sigmoid) score
  grad <- (y * alpha * (1 - p)^focal_gamma * (focal_gamma * log(p) * p - (1 - p)) -
             (1 - y) * (1 - alpha) * p^focal_gamma * (focal_gamma * log(1 - p) * (1 - p) - p))

  # second derivative, assembled from the two class-specific pieces
  du <- y * alpha * (1 - p)^(focal_gamma - 1) *
    (log(p) * (-focal_gamma^2 * p + (1 - p) * focal_gamma) + 2 * focal_gamma * (1 - p) + (1 - p))
  dv <- -(1 - y) * (1 - alpha) * p^(focal_gamma - 1) *
    (log(1 - p) * (focal_gamma^2 * (1 - p) - p * focal_gamma) - 2 * focal_gamma * p - p)

  hess <- (du + dv) * p * (1 - p)

  return(list(grad = grad, hess = hess))
}

param <- list(max_depth = 2, eta = 1, nthread = 2,
              objective = "binary:logistic", eval_metric = "auc")
bst <- xgb.train(param, dtrain, nrounds = 5, watchlist)
#> [1]  train-auc:0.958228  eval-auc:0.960373 
#> [2]  train-auc:0.981413  eval-auc:0.979930 
#> [3]  train-auc:0.997070  eval-auc:0.998518 
#> [4]  train-auc:0.998757  eval-auc:0.998943 
#> [5]  train-auc:0.999298  eval-auc:0.999830

bst$params$objective
#> [1] "binary:logistic"

param$objective <- logregobj

bst <- xgb.train(param, dtrain, nrounds = 5, watchlist)
#> [1]  train-auc:0.958228  eval-auc:0.960373 
#> [2]  train-auc:0.981413  eval-auc:0.979930 
#> [3]  train-auc:0.997070  eval-auc:0.998518 
#> [4]  train-auc:0.998757  eval-auc:0.998943 
#> [5]  train-auc:0.998120  eval-auc:0.999830

param$objective <- focal_loss  # alpha = 0.5, gamma = 0 -> same result!

bst <- xgb.train(param, dtrain, nrounds = 5, watchlist)
#> [1]  train-auc:0.958228  eval-auc:0.960373 
#> [2]  train-auc:0.981413  eval-auc:0.979930 
#> [3]  train-auc:0.997070  eval-auc:0.998518 
#> [4]  train-auc:0.998166  eval-auc:0.998943 
#> [5]  train-auc:0.998823  eval-auc:0.999830
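To convince myself this is not a coincidence, here is a small standalone check of the formulas (my own sketch; the labels are passed as a plain vector instead of through an xgb.DMatrix). With alpha = 0.5 and focal_gamma = 0, the focal-loss gradient and Hessian are both exactly half of the cross-entropy ones, so the Newton step grad / hess that xgboost takes is unchanged, and the traces above line up apart from small numerical differences in the later rounds.

# standalone check of the grad/hess formulas on plain numeric vectors
p <- c(0.1, 0.3, 0.5, 0.7, 0.9)  # predicted probabilities (after the sigmoid)
y <- c(0, 1, 1, 0, 1)            # labels

ce_grad <- p - y                 # cross-entropy gradient, as in logregobj above
ce_hess <- p * (1 - p)           # cross-entropy hessian

alpha <- 0.5
focal_gamma <- 0
fl_grad <- y * alpha * (1 - p)^focal_gamma * (focal_gamma * log(p) * p - (1 - p)) -
  (1 - y) * (1 - alpha) * p^focal_gamma * (focal_gamma * log(1 - p) * (1 - p) - p)
du <- y * alpha * (1 - p)^(focal_gamma - 1) *
  (log(p) * (-focal_gamma^2 * p + (1 - p) * focal_gamma) + 2 * focal_gamma * (1 - p) + (1 - p))
dv <- -(1 - y) * (1 - alpha) * p^(focal_gamma - 1) *
  (log(1 - p) * (focal_gamma^2 * (1 - p) - p * focal_gamma) - 2 * focal_gamma * p - p)
fl_hess <- (du + dv) * p * (1 - p)

all.equal(fl_grad, 0.5 * ce_grad)                # TRUE: half the CE gradient
all.equal(fl_hess, 0.5 * ce_hess)                # TRUE: half the CE hessian
all.equal(fl_grad / fl_hess, ce_grad / ce_hess)  # TRUE: the Newton step is identical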

I want to tune alpha and focal_gamma.

param$objective <- partial(focal_loss, alpha = 0.3, focal_gamma = 0)

bst <- xgb.train(param, dtrain, nrounds = 5, watchlist)
#> [1]  train-auc:0.979337  eval-auc:0.980196 
#> [2]  train-auc:0.992593  eval-auc:0.993159 
#> [3]  train-auc:0.999934  eval-auc:0.999917 
#> [4]  train-auc:0.999978  eval-auc:0.999972 
#> [5]  train-auc:0.999978  eval-auc:0.999972

Created on 2023-02-28 with reprex v2.0.2

I verified that this works with the existing xgb.train(), but I don't know how to make it work with the tune functions. The code below shows what I have tried so far.

library(tidymodels)
library(xgboost)
#> 
#> Attaching package: 'xgboost'
#> The following object is masked from 'package:dplyr':
#> 
#>     slice
library(scales)
library(dials)

alpha <- function(range = c(0,1), trans = NULL) {
  new_quant_param(
    type = "double",
    range = range,
    inclusive = c(TRUE, TRUE),
    trans = trans,
    label = c(alpha = "Alpha"),
    finalize = NULL
  )
}

focal_gamma <- function(range = c(0, 5), trans = NULL) {
  new_quant_param(
    type = "double",
    range = range,
    inclusive = c(TRUE, TRUE),
    trans = trans,
    label = c(focal_gamma = "Focal gamma"),
    finalize = NULL
  )
}

data <- two_class_dat |> 
  rename(y = Class)

set.seed(100)
splits <- initial_split(data, prop = 0.8, strata = y)
train_data <- training(splits)
test_data <- testing(splits)
resamples <- vfold_cv(data = train_data, v = 5, strata = y)

xgb_model <- boost_tree(mode = "classification",
                        tree_depth = tune(),
                        trees = tune()) |> 
  set_engine(engine = "xgboost",
             objective = partial(focal_loss,
                                 focal_gamma = tune()))
# I wanted to tune both hyperparameters together at first, but because of the
# error message below I am aiming to tune just one of them for now.

#> Error: Only one tunable value is currently allowed per argument.
#> The current argument has: `partial(focal_loss, focal_gamma = tune(), alpha = tune())`.

xgb_model |> translate()
#> Boosted Tree Model Specification (classification)
#> 
#> Main Arguments:
#>   trees = tune()
#>   tree_depth = tune()
#> 
#> Engine-Specific Arguments:
#>   objective = partial(focal_loss, focal_gamma = tune())
#> 
#> Computational engine: xgboost 
#> 
#> Model fit template:
#> parsnip::xgb_train(x = missing_arg(), y = missing_arg(), weights = missing_arg(), 
#>     nrounds = tune(), max_depth = tune(), objective = partial(focal_loss, 
#>         focal_gamma = tune()), nthread = 1, verbose = 0)

rec_base <- train_data %>% 
  recipe(y ~ .) |> 
  # step_mutate_at(all_numeric_predictors(), fn = list(orig = ~.)) %>%
  step_normalize(all_predictors(), -all_outcomes()) 

xgb_workflow <- 
  workflow() %>% 
  add_recipe(rec_base) %>% 
  add_model(xgb_model)

xgb_workflow %>%
  extract_parameter_set_dials()
#> Collection of 3 parameters for tuning
#> 
#>  identifier       type    object
#>       trees      trees nparam[+]
#>  tree_depth tree_depth nparam[+]
#>   objective  objective   missing
#> The parameter `objective` needs a `param` object. 
#> See `vignette('dials')` to learn more.

Created on 2023-02-28 with reprex v2.0.2

The extract_parameter_set_dials() function does not recognize the focal_gamma argument. How can I tune the objective function? Ultimately, I want to tune several parameters together, not just one.

simonpcouch commented 1 year ago

Hey @SHo-JANG, thanks for the issue.

This is a really interesting use case, and I appreciate the helpful reprex! You’ve worked to integrate with the machinery in a very intuitive way.

I think the hitch you're running into here is that tune expects the thing being tuned to be an argument to the training function; this is where the "Only one tunable value is currently allowed per argument" error comes from. It's also why, for instance, we have a light wrapper around xgboost::xgb.train() (parsnip::xgb_train()) that "lifts" the params list arguments into main arguments.

As such, you'll need to supply the partials of the objective function via the grid argument to tune_grid(). This will also be your entry point for tuning many arguments, since you can manually specify any combination of hyperparameters that you'd like.

I realize this is sub-optimal, and means that you have to be careful in keeping track of which partial is which, though I don’t anticipate we’ll develop this interface further as this is an uncommon use case and would require some fundamental changes to tune. You can use the extracts, though, to be sure you’ve kept track of hyperparameters rigorously.
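For example, a rough sketch (not something I've run against this reprex; it refers to the xgb_workflow, resamples, and xgb_grid objects defined below): the extract argument of control_grid() can save the objective that each fit actually used, since the xgboost fit keeps it in $params.

# rough sketch, not run: record the objective each configuration actually trained with,
# so the list-column of partials can be cross-checked after tuning
ctrl <- control_grid(
  extract = function(x) extract_fit_engine(x)$params$objective
)

# res <- tune_grid(xgb_workflow, resamples = resamples, grid = xgb_grid, control = ctrl)
# collect_extracts(res) would then hold the extracted objective for each fit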

Mirroring your reprex:

library(tidymodels)

# focal_loss --------------------------------------------------------------
focal_loss <- function(preds, dtrain, alpha = 0.5, focal_gamma = 0) {
  labels <- getinfo(dtrain, "label")
  preds <- 1 / (1 + exp(-preds))

  p <- preds
  y <- labels

  grad <- (y * alpha * (1 - p)^focal_gamma * (focal_gamma * log(p) * p - (1 - p)) -
             (1 - y) * (1 - alpha) * p^focal_gamma * (focal_gamma * log(1 - p) * (1 - p) - p))

  du <- y * alpha * (1 - p)^(focal_gamma - 1) *
    (log(p) * (-focal_gamma^2 * p + (1 - p) * focal_gamma) + 2 * focal_gamma * (1 - p) + (1 - p))
  dv <- -(1 - y) * (1 - alpha) * p^(focal_gamma - 1) *
    (log(1 - p) * (focal_gamma^2 * (1 - p) - p * focal_gamma) - 2 * focal_gamma * p - p)

  hess <- (du + dv) * p * (1 - p)

  return(list(grad = grad, hess = hess))
}

# data setup ----------------------------------------------------------------
data <- two_class_dat %>% 
  rename(y = Class)

set.seed(100)

splits <- initial_split(data, prop = 0.8, strata = y)
train_data <- training(splits)
test_data <- testing(splits)
resamples <- vfold_cv(data = train_data, v = 5, strata = y)

# specifications ----------------------------------------------------
xgb_spec <- 
  boost_tree(mode = "classification", tree_depth = tune(), trees = tune()) %>% 
  # note that i just set `objective = tune()` here
  set_engine(engine = "xgboost", objective = tune())

xgb_recipe <- train_data %>% 
  recipe(y ~ .) |> 
  #step_mutate_at(all_numeric_predictors(), fn = list(orig = ~.)) %>%
  step_normalize(all_predictors(), -all_outcomes()) 

xgb_workflow <- 
  workflow() %>% 
  add_recipe(xgb_recipe) %>% 
  add_model(xgb_spec)

# grid setup ------------------------------------------------------------------
partial_grid <-
  expand_grid(
    alpha = seq(0, 1, length.out = 3),
    focal_gamma = seq(0, 5, length.out = 3)
  )

partial_grid$partials <-
  map2(
    partial_grid$alpha,
    partial_grid$focal_gamma,
    ~partial(focal_loss, alpha = !!.x, focal_gamma = !!.y)
  )

xgb_grid <-
  extract_parameter_set_dials(xgb_spec) %>%
  filter(id != "objective") %>%
  grid_latin_hypercube(size = 9) %>%
  bind_cols(partial_grid %>% select(objective = partials))

# tune! -----------------------------------------------------------------------
res <- tune_grid(xgb_workflow, resamples = resamples, grid = xgb_grid)
#> → A | error:   Error in xgb.iter.update(bst$handle, dtrain, ...
#> There were issues with some computations   A: x5
#> 
#> Warning: All models failed. Run `show_notes(.Last.tune.result)` for more
#> information.

Created on 2023-03-09 with reprex v2.0.2

That said, as you can see, there's an issue here: supplying partials of the function as part of a tibble column means that we need to wrap that collection of functions as a list. So, tune passes each possible value of objective as a one-element list containing the function, rather than the function itself:

Browse[2]> finalize_workflow_spec(workflow, iter_grid_model)
══ Workflow ══════════════════════════════════════════════════════════════════
Preprocessor: Recipe
Model: boost_tree()

── Preprocessor ──────────────────────────────────────────────────────────────
1 Recipe Step

• step_normalize()

── Model ─────────────────────────────────────────────────────────────────────
Boosted Tree Model Specification (classification)

Main Arguments:
  trees = 1886
  tree_depth = 2

Engine-Specific Arguments:
  objective = list(structure(function (...) {focal_loss <- function (preds,<snip>

Computational engine: xgboost 

and xgboost thus trips up by saying that it doesn't know what to do with a list for an objective function.
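Roughly what's going on, illustrated with a plain tibble rather than tune's internals: subsetting a list-column with single brackets keeps the list wrapper, so what gets spliced into the finalized spec is list(<function>) rather than the function itself.

# illustration only -- a list-column of functions, subset two different ways
grid <- tibble::tibble(objective = list(sum, mean))

class(grid$objective[1])    # "list": a one-element list wrapping the function
class(grid$objective[[1]])  # "function": the bare function xgboost actually needs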

This is where that change from objective = tune() to objective = list(...) happens:

https://github.com/tidymodels/tune/blob/e61a83bf4180d38cd85234e5ab188a4579ca2fbd/R/grid_code_paths.R#L379

but that change might actually need to happen in workflows.

So, for now, this does not work, but we're on it. :)

simonpcouch commented 1 year ago

A possible fix is up in #633! See the updated reprex there; you can install those changes with pak::pak("tidymodels/tune@625").

SHo-JANG commented 1 year ago
pak::pak("tidymodels/tune@625")
#> Error: ! error in pak subprocess
#> Caused by error: 
#> ! Could not solve package dependencies:
#> * tidymodels/tune@625: ! pkgdepends resolution error for tidymodels/tune@625.
#> Caused by error: 
#> ! Can't find reference @625 in GitHub repo tidymodels/tune.

Created on 2023-03-10 with reprex v2.0.2

.Last.error
<callr_error/rlib_error_3_0/rlib_error/error>
Error: ! error in pak subprocess
Caused by error: 
! Could not solve package dependencies:

It doesn't work for me! Thank you for your reply.

simonpcouch commented 1 year ago

Oh, whoops! That ref is pak::pak("tidymodels/tune@633") 😆

SHo-JANG commented 1 year ago

It works with pak::pak("tidymodels/tune#633")! 🤣 Thank you very much. Ultimately, though, I also want to be able to do Bayesian optimization.

I sincerely thank you for creating such a useful ecosystem. The more that can be customized, the better the ecosystem will be. There are some features I didn't mention here that I think would be nice to have; I will suggest them once I have organized my case for why they are needed. I'm sorry to bring up other packages here, but I hope tidymodels reaches the flexibility that the mlr3 ecosystem or Python has.

I will keep suggesting features as I find the need for them. I still have a lot to learn about R, but I hope to be able to contribute to the tidymodels ecosystem one day. Sorry for always asking for things! I will make good use of the functionality you have improved, and I look forward to it being applicable to other models as well. Thank you very much.

(I don't speak English very well, so I communicate through a translator. I'm really sorry if there was any violation of etiquette.)

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.