harrelfe / rms

Regression Modeling Strategies
https://hbiostat.org/R/rms

Warning when using summary() for a logistic regression model for long model formulas #148

Open schw4b opened 3 months ago

schw4b commented 3 months ago

I get a warning when calling summary() on an lrm() fit. The warning appears to depend on the number of variables in the model and the length of the variable names. Below is a minimal example with three long variable names that triggers it.

library(rms)
set.seed(2024)
n = 100
d = data.frame(y = rep(c(TRUE, FALSE), n/2),
               ITBILNWTGKRVEFOYESOBXTPAFRFBMT = rnorm(n),
               TFMBQUSRFRCXMIEEYRKVVQBKHOBSBN = rnorm(n),
               SIRTBLUYLQPQIGUJZHXWMOCJYAOKHS = rnorm(n)
)

dd = datadist(d)
options(datadist='dd')

fit.lrm = lrm(y ~ ITBILNWTGKRVEFOYESOBXTPAFRFBMT + TFMBQUSRFRCXMIEEYRKVVQBKHOBSBN + 
                SIRTBLUYLQPQIGUJZHXWMOCJYAOKHS
              , data = d)
summary(fit.lrm)

This produces the following output, followed by a warning:

> summary(fit.lrm)
             Effects              Response : y

 Factor                         Low      High    Diff.  Effect    S.E.
 ITBILNWTGKRVEFOYESOBXTPAFRFBMT -0.71996 0.67199 1.3919 -0.168600 0.27669
  Odds Ratio                    -0.71996 0.67199 1.3919  0.844850      NA
 TFMBQUSRFRCXMIEEYRKVVQBKHOBSBN -0.62695 0.86605 1.4930  0.160820 0.29608
  Odds Ratio                    -0.62695 0.86605 1.4930  1.174500      NA
 SIRTBLUYLQPQIGUJZHXWMOCJYAOKHS -0.52769 0.62263 1.1503 -0.037629 0.26052
  Odds Ratio                    -0.52769 0.62263 1.1503  0.963070      NA
 Lower 0.95 Upper 0.95
 -0.71090   0.37370
  0.49120   1.45310
 -0.41948   0.74113
  0.65739   2.09830
 -0.54824   0.47299
  0.57796   1.60480

Warning message:
In formula.character(object, env = baseenv()) :
  Using formula(x) is deprecated when x is a character vector of length > 1.
  Consider formula(paste(x, collapse = " ")) instead.

In contrast, summary() on the equivalent glm() fit produces no warning:

fit.glm = glm(y ~ ITBILNWTGKRVEFOYESOBXTPAFRFBMT + TFMBQUSRFRCXMIEEYRKVVQBKHOBSBN + 
            SIRTBLUYLQPQIGUJZHXWMOCJYAOKHS, data = d, family = "binomial")
summary(fit.glm)

schw4b commented 3 months ago

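The issue seems to be caused by lines 1399 to 1402 in rmsMisc.s. For a long formula, format(form) returns a character vector with more than one element, and that vector is then passed to as.formula(), which triggers the deprecation warning: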

    form <- format(form)
    which[which == 'offset'] <- '.off.'
    form <- as.formula(gsub('offset(', '.off.(', form, fixed=TRUE))
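
The behaviour can be reproduced in base R without rms. The following is a minimal sketch, assuming the default deparse width; the object names (f, pieces, one, msg) are made up for illustration. It shows that format() splits the long formula into several strings, and that collapsing them gives a single string again. Whether as.formula() warns on the multi-element vector depends on the R version, so the message is captured rather than assumed.

```r
# Base R only; variable names are taken from the example above.
f <- y ~ ITBILNWTGKRVEFOYESOBXTPAFRFBMT + TFMBQUSRFRCXMIEEYRKVVQBKHOBSBN +
  SIRTBLUYLQPQIGUJZHXWMOCJYAOKHS

# format() deparses the formula; a long formula is split into several strings.
pieces <- format(f)
length(pieces)

# Passing the multi-element vector to as.formula() is what raises the warning
# on R versions that deprecate this usage. Capture the message instead of
# letting it propagate (NULL if the running R version stays silent).
msg <- tryCatch(
  { as.formula(pieces); NULL },
  warning = function(w) conditionMessage(w),
  error   = function(e) conditionMessage(e)
)
msg

# Collapsing the pieces first yields exactly one string.
one <- paste(trimws(pieces), collapse = " ")
length(one)
```

With short variable names, format() returns a single string, which would explain why the warning only shows up for long model formulas.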

The fix would be to collapse the formatted formula into a single string before converting it:

    # form <- format(form)
    form <- paste(trimws(format(form)), collapse = " ")
    which[which == 'offset'] <- '.off.'
    form <- as.formula(gsub('offset(', '.off.(', form, fixed=TRUE))
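
To check the proposed change in isolation, here is a small sketch that wraps the same two steps in a stand-alone function. The helper name to_single_formula() is hypothetical and not part of rms; the offset() term in the test formula is added only to exercise the gsub() substitution.

```r
# Hypothetical helper mirroring the patched lines; not an rms function.
to_single_formula <- function(form) {
  # Collapse the (possibly multi-element) deparsed formula into one string.
  txt <- paste(trimws(format(form)), collapse = " ")
  # Same offset renaming as in the original code.
  as.formula(gsub("offset(", ".off.(", txt, fixed = TRUE))
}

long <- y ~ ITBILNWTGKRVEFOYESOBXTPAFRFBMT + TFMBQUSRFRCXMIEEYRKVVQBKHOBSBN +
  offset(SIRTBLUYLQPQIGUJZHXWMOCJYAOKHS)

# Record whether the conversion raises any warning.
warned <- FALSE
res <- withCallingHandlers(
  to_single_formula(long),
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)

warned          # FALSE if the collapsed string converts cleanly
all.vars(res)   # the response and all three predictors are still present
```

Because paste(..., collapse = " ") always returns one string, the as.formula() call never sees a vector of length greater than 1, regardless of how long the formula is.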