therneau / survival

Survival package for R
Feature request: support for non-syntactic names in `survfit` #232

Open mattsecrest opened 1 year ago

mattsecrest commented 1 year ago

I wonder whether non-syntactic names could be supported consistently. It can be confusing that they work on the LHS of the formula (inside Surv()) but not on the RHS. The example below uses survival 3.5.7.

library(survival)
library(tibble)

df <- tibble(
  os_months = abs(rnorm(100, 12, .5)),
  os_event = rbinom(100, 1, .5),
  `OS event non-syntactic` = os_event,
  group = sample(c("group 1", "group 2"), 100, replace = TRUE),
  `group non-syntactic` = group
)

# This works
survfit(
  Surv(os_months, os_event) ~ group,
  data = df
)

# This also works
survfit(
  Surv(os_months, `OS event non-syntactic`) ~ group,
  data = df
)

# This does not work
survfit(
  Surv(os_months, os_event) ~ `group non-syntactic`,
  data = df
)

Alternatively, a clearer message to the user when non-syntactic names are used could be helpful as well:

Error in `[.data.frame`(mf, ll) : undefined columns selected
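
Until non-syntactic names are handled on the RHS, one possible workaround is to rename the columns before fitting. The sketch below uses only base R; the toy data frame and the name `df_syntactic` are made up for illustration and are not part of the original report.

```r
# Toy data frame with one non-syntactic column name
# (check.names = FALSE keeps the name as written).
df <- data.frame(
  os_months = c(11.2, 12.5, 12.9, 11.8),
  os_event = c(1, 0, 1, 1),
  `group non-syntactic` = c("group 1", "group 2", "group 1", "group 2"),
  check.names = FALSE
)

# make.names() rewrites each name into a syntactic one;
# unique = TRUE guards against two names collapsing into the same result.
df_syntactic <- df
names(df_syntactic) <- make.names(names(df_syntactic), unique = TRUE)

names(df_syntactic)
```

With the renamed column, a call such as `survfit(Surv(os_months, os_event) ~ group.non.syntactic, data = df_syntactic)` should then take the same path as the working `group` example above.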

therneau commented 11 months ago

First of all, I have very little sympathy for non-syntactic names. It's along the lines of my argument that "A_very_long_file_name_is_not_a_substitute_for_documentation". Second, and more importantly, I do all the formula processing via calls to the standard model.frame() function within R: if those fail, I'm not about to fix it. Third, I have a lot of other things for survival in the queue, a couple of which are actual bugs (they give a wrong answer).
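
For reference, a small base R sketch of what model.frame() does with such a name. It does not use the survival package, and it assumes that the `ll` in the reported error holds the formula's term labels. The toy data and variable names are illustrative only.

```r
# Toy data with a non-syntactic column name.
df <- data.frame(y = c(1, 2, 3, 4), check.names = FALSE)
df[["group non-syntactic"]] <- c("a", "b", "a", "b")

f <- y ~ `group non-syntactic`

# model.frame() itself accepts the backtick-quoted name.
mf <- model.frame(f, data = df)
mf_names <- names(mf)                     # column stored without backticks

# The term labels, however, keep the backticks.
labels <- attr(terms(f), "term.labels")

# So the label does not match any column name of the model frame ...
labels %in% mf_names

# ... and indexing the frame by label fails.
msg <- tryCatch(
  { mf[labels]; "no error" },
  error = function(e) conditionMessage(e)
)
msg
```

If that assumption holds, the model frame is built correctly and the failure comes from looking up its columns by term label afterwards.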

In this case a traceback shows that it is the strata() function which fails. Perhaps you would like to figure it out and submit a patch?
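
As a starting point for such a patch, here is a minimal sketch of one way the lookup could be made robust. `unquote_label()` is a hypothetical helper and not part of survival. The sketch assumes that the failing step indexes the model frame by term label, as the `mf[ll]` in the error message suggests.

```r
# Hypothetical helper (not part of survival): unwrap a label that consists of
# a single backtick-quoted name; leave everything else, such as
# "factor(`a b`)", untouched.
unquote_label <- function(labels) sub("^`([^`]*)`$", "\\1", labels)

# Toy data with a non-syntactic column name.
df <- data.frame(y = c(1, 2, 3, 4), check.names = FALSE)
df[["group non-syntactic"]] <- c("a", "b", "a", "b")

f <- y ~ `group non-syntactic`
mf <- model.frame(f, data = df)
ll <- attr(terms(f), "term.labels")

# Index by the unquoted label instead of the raw term label.
picked <- mf[unquote_label(ll)]
names(picked)
```

Labels that are calls are left alone on purpose, since model.frame() keeps the backticks inside a call when it names the column. Whether this is the right place for a fix in the package would need to be checked against the actual source.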