statdivlab / rigr

Regression, Inference, and General Data Analysis Tools for R
Other
10 stars 3 forks source link

predictors dropped from large models #152

Open adw96 opened 1 month ago

adw96 commented 1 month ago

Consider the following:

library(rigr)
library(tidyverse)
library(survival)
data(mri)

mri_complete <- mri[mri %>% complete.cases, ]
mri_tte <- Surv(mri_complete$obstime, mri_complete$death)
my_regress <- regress("hazard", mri_tte ~ height + weight + race + sex + age + dsst + atrophy +
                        plt + sbp + fev + dsst + atrophy + chf + chf + alcoh, 
                      data=mri_complete ) ## generates a warning
my_regress ### last three lines cut off
my_regress %>% coef ### still missing

The last predictor shown is atrophy, with the final three dropped. The warning generated is

Warning message:
In formula.character(object, env = baseenv()) :
  Using formula(x) is deprecated when x is a character vector of length > 1.
  Consider formula(paste(x, collapse = " ")) instead.

Dropping of predictors is not good! @svteichman Could you please investigate?

@yiqunchen we would be grateful for you to review a PR when it's ready.

adw96 commented 1 month ago

Just FYI for anyone following this -- this came to our attention because someone was fitting a large "hazard" model and particularly wanted lincom functionality. Shockingly, multcomp::glht won't take a vector of weights, but it can work as follows

### an alternative in the meantime for testing linear combinations
my_coxph <- coxph(mri_tte ~ height + weight + race + sex + age + dsst + atrophy +
                    plt + sbp + fev + dsst + atrophy + chf + chf + alcoh, 
                  data=mri_complete )
glht(my_coxph, linfct = c("7.5*alcoh -5.5*dsst = 0"))
yiqunchen commented 1 month ago

Thanks for the notice @adw96 ! It's been a while since my rigr days so I will have to warm up back into it. but regarding your vector of weights -- glht does take a contrast matrix though? Specifically, the following provides the identical results as the example you gave -- maybe I missed something here?

### an alternative in the meantime for testing linear combinations
my_coxph <- coxph(mri_tte ~ height + weight + race + sex + age + dsst + atrophy +
                    plt + sbp + fev + dsst + atrophy + chf + chf + alcoh, 
                  data=mri_complete )
# Create the contrast matrix
contrast_matrix <- matrix(0, nrow=1, ncol=length(coef(my_coxph)))
colnames(contrast_matrix) <- names(coef(my_coxph))

# Set the coefficients for alcoh and dsst
contrast_matrix[1, "alcoh"] <- 7.5
contrast_matrix[1, "dsst"] <- -5.5

# Perform the general linear hypothesis test
glht_results <- glht(my_coxph, linfct = contrast_matrix)

# Summary of the results
summary(glht_results)
adw96 commented 1 month ago

Doh! I only tried it as a vector, not as a matrix. Sigh. Thanks for this, @yiqunchen !

@yiqunchen -- my proposal is that super-RA @svteichman does the initial fix and PR, and then you would review her code. We will keep you updated -- and I hope this shouldn't take more than 15 mins of your time to review.

svteichman commented 1 month ago

This was a pretty straightforward issue to address and should be an easy PR to review! It turns out the formula object was being passed into the deparse function, which automatically adds line breaks when the input is longer than a certain length, so for large formulas it was splitting the result into multiple pieces, which was creating the warning we were seeing and dropping predictors.