Closed misken closed 11 months ago
In addition, even using parsnip::fit() (i.e. no resampling) within a workflow containing add_formula also leads to the "invalid power in formula" error. But as shown in the reprex above, parsnip::fit(formula = some_nonlinear_formula, data = some_data) will work just fine.
Thanks for the issue, @misken, and apologies for the delay!
The trick here is to use the formula argument of add_model() in workflows. From that package's docs:
formula: An optional formula override to specify the terms of the model. Typically, the terms are extracted from the formula or recipe preprocessing methods. However, some models (like survival and bayesian models) use the formula not to preprocess, but to specify the structure of the model. In those cases, a formula specifying the model structure must be passed unchanged into the model call itself. This argument is used for those purposes.
So, pass the formula that specifies the variables of interest (e.g. y ~ x) as the preprocessor formula, and the other as the "model formula." In your example:
# libraries
library(tidymodels)
# Create synthetic data
x <- seq(1, 12)
epsilon <- rnorm(12)
a <- 5
b <- 1.5
y <- a * x ^ b + epsilon
df <- data.frame(x, y)
#
# Set up resampling
#
set.seed(57)
kfold_number <- 5
kfold_repeats <- 1
splits <- vfold_cv(df, v = kfold_number, repeats = kfold_repeats)
#
# Register new parsnip model based on nls()
#
set_new_model("nonlinear_reg")
set_model_mode(model = "nonlinear_reg", mode = "regression")
set_model_engine(
  "nonlinear_reg",
  mode = "regression",
  eng = "nls"
)
set_dependency("nonlinear_reg", eng = "nls", pkg = "stats")
#
# Create the model function and add fit method and encoding details
#
nonlinear_reg <-
  function(mode = "regression", start = NULL) {
    # Check for correct mode
    if (mode != "regression") {
      rlang::abort("`mode` should be 'regression'")
    }
    # Capture the arguments in quosures
    args <- list(start = rlang::enquo(start))
    # Save some empty slots for future parts of the specification
    new_model_spec(
      "nonlinear_reg",
      args = args,
      mode = mode,
      engine = NULL,
      eng_args = NULL,
      method = NULL
    )
  }
set_fit(
  model = "nonlinear_reg",
  eng = "nls",
  mode = "regression",
  value = list(
    interface = "formula",
    protect = c("formula", "data"),
    func = c(pkg = "stats", fun = "nls"),
    defaults = list()
  )
)
set_encoding(
  model = "nonlinear_reg",
  eng = "nls",
  mode = "regression",
  options = list(
    predictor_indicators = "none",
    compute_intercept = FALSE,
    remove_intercept = FALSE,
    allow_sparse_x = FALSE
  )
)
#
# Add modules for prediction
#
response_info <-
  list(
    pre = NULL,
    post = NULL,
    func = c(fun = "predict"),
    args =
      # These lists should be of the form:
      # {predict.nls argument name} = {values provided from parsnip objects}
      list(
        # We don't want the first two arguments evaluated right now
        # since they don't exist yet.
        object = quote(object$fit),
        newdata = quote(new_data)
      )
  )
set_pred(
  model = "nonlinear_reg",
  eng = "nls",
  mode = "regression",
  type = "numeric",
  value = response_info
)
#
# Show that the new nls based parsnip model works if we call fit with
# entire data set or with some subset generated by vfold_cv()
# Need starting values for nls()
init_nls <- c(b1=1, b2=2)
nls_mod <- nonlinear_reg() %>% set_engine(engine = "nls", start = init_nls)
# Call fit with a specific model formula - a power function
nls_mod_formula <- y ~ b1 * (x ^ b2)
nls_wf <-
  workflow() %>%
  add_formula(y ~ x) %>%
  add_model(nls_mod, formula = nls_mod_formula)
# Pipe the workflow through fit_resamples(); with the model formula
# supplied to add_model(), the "invalid power" error no longer occurs
fit_resamples(nls_wf, resamples = splits)
#> # Resampling results
#> # 5-fold cross-validation
#> # A tibble: 5 × 4
#> splits id .metrics .notes
#> <list> <chr> <list> <list>
#> 1 <split [9/3]> Fold1 <tibble [2 × 4]> <tibble [0 × 3]>
#> 2 <split [9/3]> Fold2 <tibble [2 × 4]> <tibble [0 × 3]>
#> 3 <split [10/2]> Fold3 <tibble [2 × 4]> <tibble [0 × 3]>
#> 4 <split [10/2]> Fold4 <tibble [2 × 4]> <tibble [0 × 3]>
#> 5 <split [10/2]> Fold5 <tibble [2 × 4]> <tibble [0 × 3]>
Created on 2023-10-31 with reprex v2.0.2
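The split between a preprocessor formula and a model formula is not specific to the custom nls model. As a minimal sketch, the same pattern can be tried with a built-in parsnip model. The mtcars data and the names lm_spec, lm_wf and lm_fit are illustrative assumptions, not part of the original reprex, and the sketch assumes that workflows passes the model formula to the engine unchanged, as the quoted docs describe.

```r
library(tidymodels)

# An ordinary linear regression spec using the built-in "lm" engine
lm_spec <- linear_reg() %>% set_engine("lm")

lm_wf <-
  workflow() %>%
  # Preprocessor formula: only names the outcome and the predictor columns
  add_formula(mpg ~ cyl + disp) %>%
  # Model formula: describes the model structure seen by the engine
  add_model(lm_spec, formula = mpg ~ cyl + log(disp))

# Fit once on the full (illustrative) data set
lm_fit <- fit(lm_wf, data = mtcars)

# Inspect the coefficients of the underlying lm object
coef(extract_fit_engine(lm_fit))
```

If the model formula is forwarded as expected, the fitted coefficients should include a term for log(disp) rather than disp.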
Reconcile handling of nonlinear formulas by parsnip::fit() and tune::fit_resamples()
Not sure if this is a feature or bug or neither. Created a custom parsnip model for nonlinear regression using nls as the engine (followed https://www.tidymodels.org/learn/develop/models/). Fit a simple power function $b1 x^{b2}$ with parsnip::fit(). Works. Wanted to use the same model with k-fold cross-validation. However, tune::fit_resamples() gives an error saying the formula contains an invalid power (I imagine because it wants a constant power, not a parameter to be estimated). Tried working around it with recipes, but it also doesn't like the nonlinear formula and gives an error saying that inline functions are not allowed (I'm assuming it's the ^ operator it is objecting to). If there's no way to bypass the formula checking in tune::fit_resamples(), it seems the workaround is to loop manually through the cross-validation split object, using parsnip::fit() to fit the model on each split, and then add code to assess model fit. A reprex is below.

Created on 2022-08-15 by the reprex package (v2.0.1)
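The manual workaround described above might look roughly like the following sketch. To stay self-contained it calls stats::nls() directly instead of the custom parsnip model, and it assumes the rsample package is installed. The helper rmse_one_split() and the choice of RMSE as the assessment metric are illustrative assumptions, not something taken from the original reprex.

```r
library(rsample)

# Synthetic power-function data, as in the issue: y = a * x^b + noise
set.seed(57)
x <- seq(1, 12)
y <- 5 * x^1.5 + rnorm(12)
df <- data.frame(x, y)

# 5-fold cross-validation splits
splits <- vfold_cv(df, v = 5)

# Hypothetical helper: fit nls() on the analysis rows of one split,
# then compute RMSE on the held-out assessment rows
rmse_one_split <- function(split) {
  fit <- nls(
    y ~ b1 * x^b2,
    data = analysis(split),
    start = c(b1 = 1, b2 = 2)
  )
  held_out <- assessment(split)
  preds <- predict(fit, newdata = held_out)
  sqrt(mean((held_out$y - preds)^2))
}

# One RMSE value per fold, then the average across folds
fold_rmse <- vapply(splits$splits, rmse_one_split, numeric(1))
mean(fold_rmse)
```

This reproduces by hand what tune::fit_resamples() would otherwise do, so it is only worth keeping if the formula checking cannot be bypassed.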