tidymodels / parsnip

A tidy unified interface to models
https://parsnip.tidymodels.org

issues with dev xgboost #1087

Open nipnipj opened 6 months ago

nipnipj commented 6 months ago

I'm trying to fit a model as follows:

xgboost_model <-
  boost_tree() %>%
  set_engine("xgboost") %>% 
  set_mode("regression")

wf_xgboost <- workflow() %>% 
  add_recipe(rec) %>% 
  add_model(xgboost_model)

trained_wf <- wf_xgboost %>% fit(train_data)

I get the following error:

Error in xgboost::xgb.DMatrix(data = x[trn_index, , drop = FALSE], missing = NA,  : 
  unused argument (info = info_list)
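
One quick diagnostic for an unused-argument error like this is to check which xgboost build is actually loaded (packageVersion() is in base R's utils, so this is a generic check, not something parsnip-specific):

# a development build reports a higher version number than the current CRAN release
packageVersion("xgboost")
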
EmilHvitfeldt commented 6 months ago

Hello @nipnipj 👋

Can you please provide a minimal reprex (reproducible example)? The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it: please help me help you!

If you've never heard of a reprex before, start by reading "What is a reprex", and follow the advice further down that page.

I'm not able to reproduce the error:

library(tidymodels)

train_data <- mtcars

rec <- recipe(mpg ~ ., data = train_data)

xgboost_model <-
  boost_tree() %>%
  set_engine("xgboost") %>% 
  set_mode("regression")

wf_xgboost <- workflow() %>% 
  add_recipe(rec) %>% 
  add_model(xgboost_model)

trained_wf <- wf_xgboost %>% fit(train_data)

trained_wf
#> ══ Workflow [trained] ══════════════════════════════════════════════════════════
#> Preprocessor: Recipe
#> Model: boost_tree()
#> 
#> ── Preprocessor ────────────────────────────────────────────────────────────────
#> 0 Recipe Steps
#> 
#> ── Model ───────────────────────────────────────────────────────────────────────
#> ##### xgb.Booster
#> raw: 21.6 Kb 
#> call:
#>   xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0, 
#>     colsample_bytree = 1, colsample_bynode = 1, min_child_weight = 1, 
#>     subsample = 1), data = x$data, nrounds = 15, watchlist = x$watchlist, 
#>     verbose = 0, nthread = 1, objective = "reg:squarederror")
#> params (as set within xgb.train):
#>   eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", colsample_bynode = "1", min_child_weight = "1", subsample = "1", nthread = "1", objective = "reg:squarederror", validate_parameters = "TRUE"
#> xgb.attributes:
#>   niter
#> callbacks:
#>   cb.evaluation.log()
#> # of features: 10 
#> niter: 15
#> nfeatures : 10 
#> evaluation_log:
#>      iter training_rmse
#>     <num>         <num>
#>         1    14.9313149
#>         2    10.9568064
#> ---                    
#>        14     0.5628964
#>        15     0.4603055

Created on 2024-03-21 with reprex v2.1.0

nipnipj commented 6 months ago

The error persists for me.

library(tidymodels)

train_data <- mtcars

rec <- recipe(mpg ~ ., data = train_data)

xgboost_model <-
  boost_tree() %>%
  set_engine("xgboost") %>% 
  set_mode("regression")

wf_xgboost <- workflow() %>% 
  add_recipe(rec) %>% 
  add_model(xgboost_model)

trained_wf <- wf_xgboost %>% fit(train_data)
#> Error in xgboost::xgb.DMatrix(x, missing = NA, info = info_list): unused argument (info = info_list)

trained_wf
#> Error in eval(expr, envir, enclos): object 'trained_wf' not found

Created on 2024-03-22 with reprex v2.1.0

The reason might be that I installed xgboost from source.
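
If the source-installed development build is the cause, one possible workaround (a sketch, assuming you don't otherwise need the dev build) is to reinstall the released version from CRAN and restart R:

# replace the dev build with the CRAN release, then restart R
install.packages("xgboost")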

simonpcouch commented 6 months ago

Ah, looks like we'd expect to see a few breakages with the new xgboost: https://github.com/dmlc/xgboost/issues/9810. watchlist is deprecated as an xgb.train() argument, and info as an xgboost::xgb.DMatrix() argument.

We will address those breakages once xgboost submits the new version to CRAN and lets us know what breaks. As implemented, supporting both the dev and the current CRAN versions would be quite gnarly.
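
For reference, the change boils down to call signatures along these lines (a sketch based on the linked xgboost issue; the replacement argument names on the dev side, such as evals, are assumptions until the CRAN submission lands):

# current CRAN interface, roughly what parsnip generates today
dtrain <- xgboost::xgb.DMatrix(data = x, missing = NA, info = list(label = y))
fit <- xgboost::xgb.train(params = list(), data = dtrain, nrounds = 15,
                          watchlist = list(train = dtrain), verbose = 0)

# dev interface sketch: fields like label are passed directly instead of via
# info, and watchlist gives way to a renamed argument (assumed here to be evals)
dtrain <- xgboost::xgb.DMatrix(data = x, missing = NA, label = y)
fit <- xgboost::xgb.train(params = list(), data = dtrain, nrounds = 15,
                          evals = list(train = dtrain), verbose = 0)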