mclements / rstpm2

An R package for generalised survival models

Use of tvc and smooth.formula with factor variables #21

Closed by mikesweeting 1 month ago

mikesweeting commented 2 years ago

Hi Mark, I think there may be a bug when using stpm2 with factor variables. I get an error when I try to change the degrees of freedom for a time-varying factor variable, but the error goes away if I use dummy variables instead. See the example below.

library(rstpm2)
data(brcancer)
# Convert x4 to a 3-level factor variable
brcancer$x4 <- factor(brcancer$x4)

# This works...
summary(stpm2(Surv(rectime, censrec == 1) ~ x4, data = brcancer, df = 3, tvc = list(x4 = 3)))
# ...but the following do not fit
summary(stpm2(Surv(rectime, censrec == 1) ~ x4, data = brcancer, df = 3, tvc = list(x4 = 2)))
summary(stpm2(Surv(rectime, censrec == 1) ~ x4, data = brcancer, df = 3, tvc = list(x4 = 2), control = list(robust_initial = TRUE)))
summary(stpm2(Surv(rectime, censrec == 1) ~ x4, data = brcancer, df = 3, tvc = list(x4 = 1)))

# We can avoid the error by using dummy variables instead
library(fastDummies)
brcancer <- dummy_cols(brcancer, select_columns = "x4", remove_first_dummy = TRUE)
summary(stpm2(Surv(rectime, censrec == 1) ~ x4_2 + x4_3, data = brcancer, df = 3, tvc = list(x4_2 = 3, x4_3 = 3)))
summary(stpm2(Surv(rectime, censrec == 1) ~ x4_2 + x4_3, data = brcancer, df = 3, tvc = list(x4_2 = 2, x4_3 = 2)))

mclements commented 1 month ago

Mike: my apologies for this very late reply.

There is a long-standing issue with interactions between factor variables and splines in rstpm2: the design matrix is often not what you want. The simplest approach is to use indicator variables.

Sincerely, Mark.
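
The design-matrix issue Mark describes can be illustrated with base R alone. The sketch below is not rstpm2's internal code: it assumes a time-varying term enters the model as a factor-by-spline interaction with its own degrees of freedom, uses `splines::ns` as a stand-in for the package's spline basis, and runs on a made-up data frame `d` with hypothetical variables `g` and `t`. It compares the interaction columns R builds when the interaction spline matches the baseline spline, when it does not, and when indicator variables are used.

```r
library(splines)

set.seed(1)
# Hypothetical toy data: a 3-level factor and a positive time variable
d <- data.frame(
  g = factor(sample(1:3, 200, replace = TRUE)),
  t = rexp(200) + 0.1
)

# Same spline df in baseline and interaction: the spline main effect is a
# margin of the interaction, so the factor is contrast-coded (2 x 3 columns)
X_same <- model.matrix(~ g + ns(log(t), df = 3) + g:ns(log(t), df = 3), data = d)

# Different df: the df = 2 spline has no main-effect term of its own, so R
# codes the factor with a full set of indicators (3 x 2 columns)
X_diff <- model.matrix(~ g + ns(log(t), df = 3) + g:ns(log(t), df = 2), data = d)

# Indicator variables make the intended columns explicit (2 x 2 columns)
d$g2 <- as.numeric(d$g == "2")
d$g3 <- as.numeric(d$g == "3")
X_ind <- model.matrix(~ g2 + g3 + ns(log(t), df = 3) +
                        g2:ns(log(t), df = 2) + g3:ns(log(t), df = 2), data = d)

# Count the interaction columns in each design matrix
n_int <- function(X) sum(grepl(":", colnames(X), fixed = TRUE))
print(c(same = n_int(X_same), diff = n_int(X_diff), indicators = n_int(X_ind)))

# Show which factor levels appear in the mismatched-df interaction
print(grep(":", colnames(X_diff), fixed = TRUE, value = TRUE))
```

In the mismatched case the reference level `g1` gets its own spline columns alongside the baseline spline, which is unlikely to be the intended model. With indicator variables, only the non-reference levels interact with the spline, mirroring the `x4_2` and `x4_3` workaround in the original report.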

mikesweeting commented 3 weeks ago

Thank you Mark for informing me about issues with using factor variables in rstpm2. I'll bear this in mind and will use dummy variables when needed.

Best wishes, Mike