tidymodels / hardhat

Construct Modeling Packages
https://hardhat.tidymodels.org
Other
100 stars 14 forks source link

Using a division on the left-hand side of a formula throws an "Interaction terms can't be specified on the LHS of `formula`" #241

Open erikvona opened 1 year ago

erikvona commented 1 year ago

As said in the title, using a division on the left-hand side of the formula throws an error.

This is a new bug, causing previously working scripts to break, thus a major pain for me.

It's really common to use divisions on the left-hand side, particularly in survival analysis, where survival time is often expressed in days but for plotting and tabling you often want time in months or years. I encountered this issue with the tidycmprsk package.

Using I(time / 365) to indicate no, this is not an interaction, just a regular division does not help.

library(survival)
library(hardhat)

hardhat::mold(Surv(time / 365, status) ~ ph.ecog, lung, blueprint = hardhat::default_formula_blueprint(indicators = "none"))
#> Error in `mold_formula_default_process_outcomes()`:
#> ! Interaction terms can't be specified on the LHS of `formula`.
#> ℹ The following interaction term was found: `time/365`.
#> Backtrace:
#>      ▆
#>   1. ├─hardhat::mold(Surv(time/365, status) ~ ph.ecog, lung, blueprint = hardhat::default_formula_blueprint(indicators = "none"))
#>   2. └─hardhat:::mold.formula(...)
#>   3.   ├─hardhat::run_mold(blueprint, data = data)
#>   4.   └─hardhat:::run_mold.default_formula_blueprint(blueprint, data = data)
#>   5.     └─hardhat:::mold_formula_default_process(...)
#>   6.       └─hardhat:::mold_formula_default_process_outcomes(blueprint = blueprint, data = data)
#>   7.         └─hardhat:::check_no_interactions(formula)
#>   8.           └─hardhat:::expr_check_no_interactions(expr, error_call = error_call)
#>   9.             └─hardhat:::expr_check_no_interactions(expr[[i]], error_call = error_call)
#>  10.               └─hardhat:::expr_check_no_interactions(expr[[i]], error_call = error_call)
#>  11.                 └─cli::cli_abort(message, call = error_call)
#>  12.                   └─rlang::abort(...)

Created on 2023-05-26 with reprex v2.0.2

erikvona commented 1 year ago

This got introduced at commit ef29fa1. You can revert to a previous version to get old scripts to run again using devtools::install_github("tidymodels/hardhat@887091e56fe0c39f9feeac36656c817239de528c").

Ironic that it's caused by fixing another of my issues 😅

DavisVaughan commented 6 months ago

I'm not really sure how this ever worked. We create two new formulas from LHS ~ RHS, one is ~ LHS and one is ~ RHS, and we evaluate them separately. Turns out x / 365 ~ 1 works pretty differently from ~ x / 365 when you use model.frame(). If x is numeric and run through model.frame(), x / 365 ~ will compute that expression right away and return a double vector. With ~ x / 365, it seems like it just errors.

I'm not 100% sure, but it seems like we might be doing too much with the stuff on the LHS. It is possible we can just evaluate it within the context of data and call it a day. But this will require a careful analysis of what model.frame() / model.matrix() does, and I currently don't have the bandwidth for that, so unfortunately it won't be solved in this upcoming patch release.

df <- data.frame(x = 1:5, y = as.factor(letters[1:5]))

model.frame(x / 5 ~ 1, df)
#>   x/5
#> 1 0.2
#> 2 0.4
#> 3 0.6
#> 4 0.8
#> 5 1.0

model.frame(~ x / 5, df)
#> Error in terms.formula(formula, data = data): invalid model formula in ExtractVars

model.frame(y / 5 ~ 1, df)
#> Warning in Ops.factor(y, 5): '/' not meaningful for factors
#> [1] y/5
#> <0 rows> (or 0-length row.names)

model.frame(~ y / 5, df)
#> Error in terms.formula(formula, data = data): invalid model formula in ExtractVars

Created on 2024-02-01 with reprex v2.0.2