statdivlab / rigr

Regression, Inference, and General Data Analysis Tools for R
Other
10 stars 3 forks source link

bugs in regress regarding survival analysis #148

Closed susannemay closed 4 months ago

susannemay commented 1 year ago

Dear all - When working with the survival analysis functions of rigr, my TAs discovered some issues. Here is a description: "There seems to be a bug in the rigr​ regress​ function for Cox PH regression ("hazard" functional) regarding missing values. Basically, if there is missingness in the covariate it doesn't run and throws a not-so-helpful error message about the vector of weights. You can pretty easily get around this by manually creating a data frame that excludes observations that have missing values for your covariate of interest." In addition, the relevel() function does not seem to work.

Also, below is some code they generated to illustrate the issue regarding missing values and our current workaround.

Susanne (Susanne May)

Cox PH ("hazards") rigr bug involving missing data

rigr version 1.0.5

April 10, 2023

load rigr package

library(rigr)

load survival package

library(survival)

load mri dataset

mri <- read.table("http://www.emersonstatistics.com/datasets/mri.txt", header = T)

create some new variables (following Discussion section slides)

mri$obstime_yrs <- mri$obstime/365.25 mri$ldlcat <- cut(mri$ldl, breaks=c(0, 70, 100, 130, 160, 190, 250), right=FALSE) mri$surv <- Surv(mri$obstime_yrs, mri$death)

won't run

m2 <- regress("hazard", surv~factor(ldlcat), data = mri)

will run

mri_rmna <- mri[!is.na(mri$ldlcat), ] m2 <- regress("hazard", surv~factor(ldlcat), data = mri_rmna)

adw96 commented 1 year ago

Thanks so much for letting us know about this, @susannemay.

I think this might be a hard issue to fix as I'm not very familiar with the internals of rigr's "hazard". Are you ok to work with with your temporary fix of subsetting the data manually for now? via

mri_rmna <- mri[!is.na(mri$ldlcat), ]
m2 <- regress("hazard", surv~factor(ldlcat), data = mri_rmna)

I think it will be a few weeks at least until I time to fix this. Are any of your TAs motivated to debug it and submit a pull request?

If you let me know how urgent this is I will see what I can do.

Notes for myself for later

While rigr generally handles NA's in predictor just fine, it returns an opaque error when fnctl = "hazard". This is not an issue shared by other functionals. For example, the following work fine. Note that mri$ldl has 10 NA's.

library(rigr)
library(tidyverse)
library(survival)
data(mri)
mri$obstime_yrs <- mri$obstime/365.25
mri$ldlcat <- cut(mri$ldl, breaks=c(0, 70, 100, 130, 160, 190, 250), right=FALSE)
mri$surv <- Surv(mri$obstime_yrs, mri$death)

### no problems with any of the below
regress("mean", obstime_yrs~ldl, data = mri_rmna)
regress("mean", obstime_yrs~factor(ldlcat), data = mri_rmna)
regress("odds", death~ldl, data = mri_rmna)
regress("odds", death~factor(ldlcat), data = mri_rmna)
regress("hazard", surv~ldl, data = mri[!is.na(mri$ldlcat), ])
regress("hazard", surv~factor(ldlcat), data = mri[!is.na(mri$ldlcat), ])

In contrast, all of the following return the same error message "Response variable and weights must be of same length".

regress("hazard", surv~ldl, data = mri)
regress("hazard", surv~ldlcat, data = mri)
regress("hazard", surv~factor(ldlcat), data = mri)

surv1 <- Surv(mri$obstime_yrs, mri$death)
regress("hazard", surv1~factor(ldlcat), data = mri)
adw96 commented 4 months ago

Huge thanks to @AmeerD for edits resulting in v1.0.6 that address this issue!

If anyone believes this was closed in error, feel free to reopen.