Closed EmilHvitfeldt closed 2 months ago
right now the only blocker we have found for this, is dependent on how we handle https://github.com/tidymodels/parsnip/issues/1099
Very much agreed on the linked issue that we can do a better job checking arguments in check_args()
.
I think it's important to note here that the pattern in check_args()
methods is to eval_tidy()
each argument:
If we're able to eval_tidy()
each argument to model specification functions as soon as they're inputted, then there's no reason we should be capturing them as quosures in the first place. The reason we capture them as quosures is so that we can evaluate them in an environment that we construct, the eval_env
, at fit()
time:
This environment supports defining arguments in relation to the data to be fit()
ted to, like:
library(parsnip)
boost_tree(mtry = ncol(.x()) / 2, mode = "regression") %>% fit(mpg ~ ., mtcars)
#> parsnip model object
#>
#> ##### xgb.Booster
#> raw: 21.6 Kb
#> call:
#> xgboost::xgb.train(params = list(eta = 0.3, max_depth = 6, gamma = 0,
#> colsample_bytree = 1, colsample_bynode = 0.5, min_child_weight = 1,
#> subsample = 1), data = x$data, nrounds = 15, watchlist = x$watchlist,
#> verbose = 0, nthread = 1, objective = "reg:squarederror")
#> params (as set within xgb.train):
#> eta = "0.3", max_depth = "6", gamma = "0", colsample_bytree = "1", colsample_bynode = "0.5", min_child_weight = "1", subsample = "1", nthread = "1", objective = "reg:squarederror", validate_parameters = "TRUE"
#> xgb.attributes:
#> niter
#> callbacks:
#> cb.evaluation.log()
#> # of features: 10
#> niter: 15
#> nfeatures : 10
#> evaluation_log:
#> iter training_rmse
#> <num> <num>
#> 1 14.9313149
#> 2 10.9841618
#> ---
#> 14 0.6109443
#> 15 0.5196238
...where .x()
is a data descriptor. As of now, check_args()
knows that it's unable to evaluate such an expression:
boost_tree(mtry = ncol(.x()) / 2, mode = "regression") %>% check_args()
#> Error in `descr_env$.x()`:
#> ! Descriptor context not set
Created on 2024-04-03 with reprex v2.1.0
Calling check_args()
will not be able to give the same output when called inside of new_model_spec()
because it doesn't yet have access to the data--the fact that the behavior might differ between these two contexts, to me, signals that we ought not to check_args()
until we have access to the data at fit()
.
There are definitely improvements to be made to argument checking, as in the loophole that you pointed out in the linked issue. I think that's a good place to direct focus for now. :)
Given the implications for data descriptors, I'm going to go ahead and close. Still very much on board for #1095. :)
This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.
Adding a call to
check_args()
on this line https://github.com/tidymodels/parsnip/blob/main/R/misc.R#L350 produces the following results. The only questionable use-case is the last one, which people might use.Still looking at other places where this would fail