insightsengineering / cardx

R Package to Supplement ARD Functions Found in {cards}
https://insightsengineering.github.io/cardx/
Other
10 stars 2 forks source link

[Feature Request]: Give nice error messages when users mistype a model's arguments #176

Closed thisisnic closed 2 months ago

thisisnic commented 2 months ago

Feature description

From a conversation I was having with @ddsjoberg:

When users mis-specify the model, I would prefer to see messaging something like, "There was an error constructing model glm(am ~ mpg, data = mtcars, fomily = binomial), then print the error message. i would hope this will help them see the typo more easily.

library(cardx)

# this is what happens when the arguments are all correct.
construct_model(mtcars, formula = am ~ mpg, method = "glm", method.args = list(family = binomial())) |>
  broom::tidy()
#> # A tibble: 2 × 5
#>   term        estimate std.error statistic p.value
#>   <chr>          <dbl>     <dbl>     <dbl>   <dbl>
#> 1 (Intercept)   -6.60      2.35      -2.81 0.00498
#> 2 mpg            0.307     0.115      2.67 0.00751
# we could improve error messaging when the user passes bad argument
# in this example, I've mis-spelled 'family'
construct_model(mtcars, formula = am ~ mpg, method = "glm", method.args = list(fomily = binomial()))
#> Error in glm.control(fomily = structure(list(family = "binomial", link = "logit", : unused argument (fomily = list("binomial", "logit", function (mu)
#> .Call(C_logit_link, mu), function (eta)
#> .Call(C_logit_linkinv, eta), function (mu)
#> mu * (1 - mu), function (y, mu, wt)
#> <truncated>

Code of Conduct

Contribution Guidelines

Security Policy

thisisnic commented 2 months ago

@ddsjoberg I was looking into this one, and some of it seems straightforward - I was looking at capturing the possible valid arguments to supply to the model fit function and comparing the supplied ones against that - but what about cases where the method allows users to pass in extra arguments to pass to other methods via ...? This might make it more tricky to guarantee something is a misspelling.

thisisnic commented 2 months ago
# yay, unambiguous
names(formals(survival::Surv))
#> [1] "time"   "time2"  "event"  "type"   "origin"
# boo, ambiguous
names(formals(getS3method("t.test", "default")))
#> [1] "x"           "y"           "alternative" "mu"          "paired"     
#> [6] "var.equal"   "conf.level"  "..."

Created on 2024-06-27 with reprex v2.1.0

thisisnic commented 2 months ago

Or maybe it's not so ambiguous, as if we trace the docs all the way down, there are only so many possible values which can be passed in via ellipses and a lot of the time they're ignored? I'll admit being less familiar with the internals of stats functions in R.

ddsjoberg commented 2 months ago

those are good point @thisisnic ! 🤔🤔🤔

Perhaps something a little simpler where we just capture the error and return it with a bit more context?

# this is what the code currently looks like.
withr::with_namespace(
  package = package,
  call2(.fn = method, formula = formula, data = data, !!!method.args) |>
    eval_tidy(env = env)
)

# maybe it could look something more like this?
call_to_run <- call2(.fn = method, formula = formula, data = data, !!!method.args)

withr::with_namespace(
  package = package,
  tryCatch(
    eval_tidy(call_to_run, env = env),
    error = function(e) {
      cli::cli_abort(
        c("There was an error evaluating model {.code {deparse1(call_to_run)}}. See error message below:",
          x = conditionMessage(e))
      )
    }
  )
)

BUT, there are some cases when deparse1(call_to_run) is going to be U-G-L-Y!

  1. By default the entire data frame is injected into the call. But no one wants to see that. We can probably just replace it with a .? For example, call_to_run$data <- expr(.); deparse1(call_to_run).
  2. The documentation requests that the method be passed as a string. But unquoted is also accepted. But when they pass unquoted methods, the entire function is injected into the call. If we were to deparse that call, it would be SUPER long. I was thinking we could first deparse the call, and if it's less than say 100 characters (just making this up...should that be shorter, longer?), print the nice message. Otherwise, just print c("There was an error evaluating the model. See error message below:", x = conditionMessage(e))