Closed myriamziou closed 6 months ago
You are incorrect. The lm, glm, coxph, and many other routines use the core model.matrix routine to create the X matrix for regression. That routine tries its best to predict whether some columns of X will be redundant, which would create a singular X matrix, and then removes selected ones to avoid that. Which ones it chooses to remove depends on the contrasts.
It does pretty well for models with + and *, but often removes too few columns when the formula has a : (colon), which is what happened here. In that case an NA results for any column which is redundant with those to its left in the resulting X matrix.
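A minimal sketch of this behaviour, using only base R. The data frame `d` and the variable names `y` and `tgroup` are made up for illustration and are not the data from the report; the point is only to compare what model.matrix builds for `*` and for `:`.

```r
# Hypothetical toy design: a 3-level factor crossed with a 2-level interval indicator
d <- expand.grid(y = factor(1:3), tgroup = factor(1:2))

# With '*', the main effects are in the model, so each factor is coded by
# contrasts and the resulting X has full column rank.
X_star <- model.matrix(~ y * tgroup, data = d)

# With ':' alone, neither main effect is present, so both factors get full
# dummy coding; together with the intercept, one column is redundant.
X_colon <- model.matrix(~ y:tgroup, data = d)

colnames(X_star)
colnames(X_colon)

# Compare the number of columns with the numerical rank of each matrix
c(columns = ncol(X_star),  rank = qr(X_star)$rank)
c(columns = ncol(X_colon), rank = qr(X_colon)$rank)
```

In the second matrix the rank is smaller than the number of columns, so a fitting routine has to report NA for a column that is redundant with those to its left. That column then acts as the reference level, whatever the order of `levels(y)`.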
When defining a step function for Beta(t) for a categorical variable using survSplit() prior to coxph() (as indicated in Section 4.1 of the "Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model" vignette), the reference group does not seem to respect the order defined in the levels of the factor variable. Here is an example with a small simulated dataset:
The summary of the model produces this output:
where the reference category seems to be y=3 instead of y=1, which is what the levels of y specify. Thank you.