therneau / survival

Survival package for R
390 stars 106 forks source link

`partial match of 'std' to 'std.err'` #184

Closed bersbersbers closed 2 years ago

bersbersbers commented 2 years ago

I have not located the line in question, but I can reproduce this warning using

options(
  warnPartialMatchArgs = TRUE,
  warnPartialMatchAttr = TRUE,
  warnPartialMatchDollar = TRUE
)

followed by

survival::survfit(survival::Surv(Days, Status) ~ Group, data = df)
therneau commented 2 years ago

I have no idea what you are talking about. Some context please.

bersbersbers commented 2 years ago

Sorry. The problem is shown in this reprex:

C:\>Rscript --vanilla -e "options(warnPartialMatchDollar=T);survival::survfit(survival::Surv(1)~1)"
Call: survfit(formula = survival::Surv(1) ~ 1)

     n events median 0.95LCL 0.95UCL
[1,] 1      1      1      NA      NA
Warning messages:
1: partial match of 'std' to 'std.err'
2: partial match of 'std' to 'std.err'

I guess these warnings are not expected.

therneau commented 2 years ago

Partial matching after a $ sign has been legal S code since 1985 or so, and legal R code since R started. The survival package goes back to 1985 as well. You added an option to change this long standing default, and then you are surprised when my code actually used this feature? Why?

By the way using survival::Surv inside a survfit call is bad coding: within survfit you are in the survival name space, so it is unneeded. More importantly, there are cases where this breaks the language, e.g., use survival::strata in a coxph call is one example, and there are others.

bersbersbers commented 2 years ago

Partial matching after a $ sign has been legal S code since 1985 or so, and legal R code since R started. [...] you are surprised when my code actually used this feature? Why?

I am not surprised at all, as I have reported instances of this problem in a number of repositories (all are now fixed, by the way). But I do consider partial dollar name matching bad practice.

I am not going to argue if, generally, all legal code is good code. Personally, I have enabled this warning because partial dollar matching has once cost me several hours of debugging a silent problem that in any other language would have been a warning or, most likely, an error. (The example being a data frame with x and x_slightly_different columns (which are close, but not identical), and the data source at one point deciding to remove the x column. The code using df$x fell back to using x_slightly_different without telling me. And I am not alone with being surprised by that behavior: https://stackoverflow.com/q/48525972, https://stackoverflow.com/q/58513997)

In addition, in this example here, I would argue equating std and std.err (which to me are very different concepts - one being the standard deviation and the other the standard error), hurts readability of the code.

By the way using survival::Surv inside a survfit call is bad coding: within survfit you are in the survival name space, so it is unneeded.

Thanks, I'll try that. Do note that in my short reprex, Surv is required as survival::survfit(1~1) yields

Error in survfit.formula(1 ~ 1) : Response must be a survival object

therneau commented 2 years ago

Yes, partial matching in code is normally not optimal, I've been burned by it too. When adding multi-state capabilities to coxph, I added two elements "cmap" and "stratum_map" to the object; I purposefully did not use "stratamap", even though it was the obvious extension, knowing that confusion between it and "strata" woud occur.

For the code you "caught" the object in question is a short internal list, invisible outside the function. Nevertheless I will change it. I got a bit riled by the short and imperative messages, as I have too many other projects currently going on, and am behind on several.

In addition, in this example here, I would argue equating std and std.err (which to me are very different concepts - one being > the standard deviation and the other the standard error), hurts readability of the code. I would argue that the standard deviation of an estimate and the standard deviation of some data are exactly the same concept, deciding to give one of them a separate label is one of the less fortunate decisions made by our statistical ancestors. It has caused a lot of unnecessary confusion to have two labels for the same concept. We don't do this with 'mean' or 'variance' after all, and every does fine with it.

Yes, survfit(Surv(y) ~ 1) is legal, but survfit (y ~ 1) is not; this is on purpose.

And survival::survfit( Surv(y) ~ 1) is fine, though it quickly gets long winded (shall we also have base::mean(x)) , but survfit( survival::Surv(y) ~1) is making some unwarranted assumptions about formula processing. There are several pattern recognition steps that occur behind the scenes using the parse tree, before any function is called, and such substitutions can goof them up. One crazy example would be the user who has fully embraced the 'qualifiy everything' mantra and is surprised when lm (y ~ x1 base::'' x2 ,data = mine) does not give the same answer as x1x2. The strata() and tt() modifiers in a coxph call fall prey to this error as well.

bersbersbers commented 2 years ago

I got a bit riled by the short and imperative messages

Sorry for that! I was short on time, too, but that should not be an excuse.

Yes, survfit(Surv(y) ~ 1) is legal, but survfit (y ~ 1) is not; this is on purpose.

Ah, now I get you. You did not recommend getting rid of survival::Surv, but only survival::. Understood!