Running fit.mult.impute() with cph() as the fitter fails with an error when at least one of the predictors is a logical variable (FALSE/TRUE). Here’s a reprex:
library(rms)
n = 100
d = data.frame(
time = rexp(n),
status = rbinom(n, 1, .7),
age = rnorm(n, 10), # n draws with mean 10 (was rnorm(50, 10), which drew 50 values and recycled them)
male = sample(c(FALSE, TRUE, NA), n, replace = TRUE)
)
# Fitting the model works fine
l = cph(Surv(time, status) ~ age + male, data=d)
coef(l)
#> age male
#> 0.24057808 0.08193785
# Imputing the data works fine
imp = aregImpute(~time+status+age+male, data=d)
# But fitting the model on the *imputed* data results in an error
l_imp = fit.mult.impute(formula(l), cph, imp, data = d)
#> Error in X[, mmcolnames, drop = FALSE]: subscript out of bounds
The bug occurs at the line X <- X[, mmcolnames, drop = FALSE] in cph(). For this example, the column names of X when that line is run are c("(Intercept)", "age", "maleTRUE"), while the mmcolnames variable contains c("age", "male"), i.e. male instead of maleTRUE.
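The naming mismatch itself can be reproduced with base R alone. This is a minimal sketch, assuming the design matrix inside cph() is built via the usual model.matrix() machinery (I have not traced every step inside rms). The small data frame is made up for illustration, and mmcolnames is simply copied from the values observed above:

```r
# Base R only; no rms functions are called here.
d_demo <- data.frame(
  age  = c(10.2, 9.8, 11.1, 10.5),
  male = c(FALSE, TRUE, TRUE, FALSE)  # logical predictor
)

# model.matrix() names the column for a logical predictor "<name>TRUE"
X <- model.matrix(~ age + male, data = d_demo)
colnames(X)
#> [1] "(Intercept)" "age"         "maleTRUE"

# Column names as observed in the failing call (copied from the report)
mmcolnames <- c("age", "male")
setdiff(mmcolnames, colnames(X))
#> [1] "male"

# Indexing by a column name that does not exist gives the same error message
res <- tryCatch(
  X[, mmcolnames, drop = FALSE],
  error = function(e) conditionMessage(e)
)
res
#> [1] "subscript out of bounds"
```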
If one converts the male variable to a factor before running the imputation and model fitting, everything works fine:
# If the logical value is converted to a factor,
# everything works fine
d$male = factor(d$male) # or as.numeric(d$male)
imp = aregImpute(~time+status+age+male, data=d)
l_imp = fit.mult.impute(formula(l), cph, imp, data = d)
#>
#> Variance Inflation Factors Due to Imputation:
#>
#> age male=TRUE
#> 1.00 1.15
#>
#> Rate of Missing Information:
#>
#> age male=TRUE
#> 0.00 0.13
#>
#> d.f. for t-distribution for Tests of Single Coefficients:
#>
#> age male=TRUE
#> 1.491268e+09 2.376500e+02
#>
#> The following fit components were averaged over the 5 model fits:
#>
#> linear.predictors means stats center
coef(l_imp)
#> age male=TRUE
#> 0.14837184 -0.01095707
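For data with several logical columns, the same conversion can be applied in one go before imputing. This is a rough sketch using base R only; logical_to_factor() is a hypothetical helper name of my own, not something rms provides, and the demo data frame is made up:

```r
# Hypothetical helper (not part of rms): convert every logical column
# of a data frame to a factor with levels "FALSE" and "TRUE".
logical_to_factor <- function(df) {
  is_lgl <- vapply(df, is.logical, logical(1))
  if (any(is_lgl)) {
    df[is_lgl] <- lapply(df[is_lgl], factor, levels = c(FALSE, TRUE))
  }
  df
}

d_demo2 <- data.frame(
  age  = c(10.2, 9.8, 11.1),
  male = c(TRUE, FALSE, NA)
)
d_fixed <- logical_to_factor(d_demo2)

levels(d_fixed$male)
#> [1] "FALSE" "TRUE"
is.na(d_fixed$male)  # missing values stay missing
#> [1] FALSE FALSE  TRUE
```

With the reprex above, this would be called as d = logical_to_factor(d) before aregImpute() and fit.mult.impute().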
But since cph() works fine with logical variables, I think fit.mult.impute() with a cph() fitter should work fine too.