`predict()` on a glmnet workflow with a model formula fails

hfrick commented 1 year ago

This might be a workflows issue but I suspect we need to tweak censored to make the glment engine play nicely with workflows.

library(censored)
#> Loading required package: parsnip
#> Loading required package: survival
library(workflows)

lung <- lung %>%
  tidyr::drop_na() %>%
  dplyr::mutate(surv = Surv(time, status), .keep = "unused")

mod <- proportional_hazards(engine = "glmnet", penalty = 0.1)

# this works
wflow_plain <- workflow() %>% 
  add_formula(surv ~ .) %>% 
  add_model(mod)

wf_fit <- fit(wflow_plain, lung)

preds <- predict(wf_fit, new_data = lung)
preds <- predict(wf_fit, new_data = lung, type = "survival", eval_time = c(100, 200))

# with a model formula it does not
wflow_model_formula <- workflow() %>% 
  add_formula(surv ~ .) %>% 
  add_model(mod, formula = surv ~ .)

wf_fit <- fit(wflow_model_formula, lung)

preds <- predict(wf_fit, new_data = lung)
#> Error in eval(predvars, data, env): object '(Intercept)' not found
preds <- predict(wf_fit, new_data = lung, type = "survival", eval_time = c(100, 200))
#> Error in eval(predvars, data, env): object '(Intercept)' not found

^{Created on 2023-11-16 with reprex v2.0.2}

hfrick commented 1 year ago

I tried to understand what should happen when parsnip does the translation from formula to matrix interface so I looked at a glmnet model without censored being involved.

While everything runs, it does add an intercept to the terms which I think isn't right. It does not happen when fitting the same model through parsnip without workflows (or even just pure glmnet). It doesn't error at prediction time (like in the censored reprex) because .convert_form_to_xy() removes only the intercept column it added itself, not the one added by worflows via mold().

So I'm going to park this while investigating further in workflows.

Note for coming back: coxnet_prepare_x() may well still need to mimick parsnip:::prepare_data() more closely.

library(parsnip)
library(workflows)

wflow_fit <- 
  workflow() %>% 
  add_formula(mpg ~ .) %>% 
  add_model(linear_reg(penalty = 0.1, engine = "glmnet"), formula = mpg ~ .) %>% 
  fit(data = mtcars)

# this has a coefficient for an intercept column
rownames(wflow_fit$fit$fit$fit$beta)
#>  [1] "`(Intercept)`" "cyl"           "disp"          "hp"           
#>  [5] "drat"          "wt"            "qsec"          "vs"           
#>  [9] "am"            "gear"          "carb"

# when we fit the same model through parsnip alone, it does not
parsnip_fit <- linear_reg(penalty = 0.1, engine = "glmnet") %>% fit(mpg ~ ., data = mtcars)
rownames(parsnip_fit$fit$beta)
#>  [1] "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"

# nor does the pure glmnet fit
glmnet_fit <- glmnet::glmnet(x = as.matrix(mtcars[, -1]), y = mtcars$mpg)
rownames(glmnet_fit$beta)
#>  [1] "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear" "carb"

^{Created on 2023-11-24 with reprex v2.0.2}

hfrick commented 11 months ago

This gets solved by https://github.com/tidymodels/parsnip/pull/1033, I'll bump the version requirement for parsnip after that one is merged.

hfrick commented 11 months ago

Follow-up re: Do we still need changes to coxnet_prepare_x()? I don't think we need changes right now. parsnip::prepare_data() does 3 things after to potential call to a conversion function:

remove an intercept (coming from workflows) based on the encodings
check if it's an (allowed) sparse matrix. if so, return early.
convert to the engine's data format (mainly data frame or matrix)

Because new_data passes through parsnip::prepare_data() before it gets to censored:::coxnet_prepare_x(), any intercept coming from workflows gets removed in parsnip. Thus no change needed here in censored.

Sparse matrices are their own issue: #276

coxnet_prepare_x() already converts to a matrix for glmnet, no change needed.

An intercept coming from workflows gets removed in parsnip's prepare_data() before we arrive at coxnet_prepare_x()

hfrick commented 11 months ago

https://github.com/tidymodels/parsnip/pull/1033 is merged - I'm going to close this. I decided to not bum the version requirement because this is "simply" a bug in parsnip and doesn't have anything (directly) to do with censored so it would be "off-pattern" to bump censored's version requirement.

github-actions[bot] commented 11 months ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

tidymodels / censored

`predict()` on a glmnet workflow with a model formula fails #272