Closed topepo closed 3 years ago
Why would it? Survfit returns a survival curve, not a per-subject value.
From: Max Kuhn
Date: Tuesday, March 9, 2021 at 9:02 PM
Subject: [therneau/survival] survfit.coxph and missing values in newdata (#137)
It looks like na.exclude() doesn't pad the results of survfit.coxph() with NA values (in the resulting matrix). This happens with or without strata.
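library(survival)
# Fit a Cox model, requesting na.exclude at fit time
mod <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung,
             na.action = na.exclude)
# Use the first 15 rows of lung as new data
new_x <- lung[1:15, c("ph.ecog", "age")]
# Number of per-subject predictions
length(predict(mod, new_x, na.action = na.exclude))
# Survival curves for the same new data
surv_estimates <- survfit(mod, newdata = new_x,
                          na.action = na.exclude)
# Dimensions of the matrix of survival estimates (times x curves)
dim(surv_estimates$surv)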
Created on 2021-03-09 by the reprex package (https://reprex.tidyverse.org) (v1.0.0.9000)
It's very difficult to pad the values so that the expected dimensions are correct (regardless of the output type/format).
I think that it is reasonable to expect results (of any kind) for 15 samples when that na.action is used. At the minimum, the current behavior is unexpected given what na.exclude does (in spirit or literally).
Would you be open to a PR that implements this behavior?
I still have no idea what you are talking about. For operations that return a value per subject, na.action plays a role. But that is not what a survival curve is. You need to create a concrete example of what you desire to do.
In the example above, I'd like na.exclude() to pad the results with missing values as it does in the other situations. In other words, the results for surv_estimates would be 185x15 with a column of NA values for the row of newdata that had the missing value.
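For illustration only, the requested padding can be approximated on the user side without changing the package. The sketch below assumes that every column of newdata is a model predictor and that at least one row is complete. The helper name pad_surv is hypothetical and is not part of survival.

```r
library(survival)

# Same model and new data as in the example above
mod <- coxph(Surv(time, status) ~ age + ph.ecog, data = lung,
             na.action = na.exclude)
new_x <- lung[1:15, c("ph.ecog", "age")]

# Hypothetical helper (not part of survival): compute curves for the
# complete rows only, then place them in a matrix that has one column
# per row of newdata, leaving NA columns for the incomplete rows.
pad_surv <- function(fit, newdata) {
  ok <- complete.cases(newdata)
  curves <- survfit(fit, newdata = newdata[ok, , drop = FALSE])
  padded <- matrix(NA_real_, nrow = length(curves$time),
                   ncol = nrow(newdata))
  # as.matrix() covers the case where only one row is complete
  padded[, ok] <- as.matrix(curves$surv)
  padded
}

padded <- pad_surv(mod, new_x)
dim(padded)
```

The result is a plain numeric matrix rather than a survfit object, so it cannot be passed to functions that expect survival curves.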
When the result of a prediction is a double, we can pad it with NA since R has an appropriate missing value for double. You can tuck it into the vector of doubles and it is still a vector of doubles. Every downstream function that accepts doubles needs to be aware of this and deal with it.
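The padding described here can be seen with base R alone. The sketch below uses lm on a small made-up data frame (the values are illustrative, not taken from this thread) to compare na.omit and na.exclude for a per-observation result.

```r
# Illustrative data with one missing predictor value (row 3)
d <- data.frame(x = c(1, 2, NA, 4, 5),
                y = c(2.1, 3.9, 6.2, 8.1, 9.8))

fit_omit <- lm(y ~ x, data = d, na.action = na.omit)
fit_excl <- lm(y ~ x, data = d, na.action = na.exclude)

# na.omit drops the incomplete row from per-observation results
length(fitted(fit_omit))   # 4

# na.exclude pads the result back to one value per input row
length(fitted(fit_excl))   # 5
is.na(fitted(fit_excl))    # TRUE only for row 3

# The padded result is still an ordinary vector of doubles
is.double(fitted(fit_excl))
```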
There is no "missing" of type survfit. There never has been one. When you call survfit on a coxph fit with a newdata argument, it returns a set of survival curves. To add an NA to that list involves creating a new NA type and, much more to the point, updating every routine in my package that accepts a survival curve as input, and other packages, to properly deal with this new feature. Example: ggsurvplot.
What is the compelling use case that would justify months of work? You would need to start by finding all the downstream routines that would be affected, map out what they should do, then decide whether they now need an na.omit option, document it, and test it. The use case would need to be very strong.