Closed RaphaelS1 closed 4 years ago
One final point on this:
As PredictionSurv
inherits from Prediction
, it is possible to overload the score
method to allow for two default measures, e.g.
score <- function(measures = NULL, task = NULL, learner = NULL) {
if(is.null(measures)) {
if(crank %in% self$predict_type)
measure <- msr("surv.harrellsc")
else
measure <- msr("surv.graf")
}
super$score(measures, task, learner)
}
As long as there are no problems from the core mlr team with this, then we can only include crank
if a ranking or lp
are explicitly returned, and leave it for composition otherwise.
I'd agree to the suggestion, with minor change:
If a model returns a clear ranking (e.g. survival.svm) then this is crank If a model does not return a ranking but does return an lp, then crank = lp If a model returns neither crank nor lp but distr then crank = distr$median() The user can overwrite these with PipeOpCrankCompositor
(predicted median survival seems more standard as a prognostic indicator than predicted mean survival)
On a side note, I'm unsure whether I agree with "There is no precedent for crank
".
By using the C-index, it tends to be more standard to evaluate survival models with respect to their ability to produce a concordant prognostic ranking, rather than with respect to their ability to predict the full conditional hazard.
This holds for common return types as well as for contemporary workflow practice.
If you meant, a precedent in terms of mlr3 architecture: that's probably true, because workflows, settings, and conventions for supervised learning are much less all over the place than for survival models (considering contemporary practice and state-of-art literature, being translated to mlr3).
Regarding overloading the scorer: I feel this is going to appear in another core mlr relevant task as well: unsupervised learning. Usually, you would prefer to evaluate these by the log-loss or another probabilistic loss, but that's only possible if the model is fully probabilistic. For models that aren't, you can do a few other things that tend to be model class specific or sub-case specific.
(predicted median survival seems more standard as a prognostic indicator than predicted mean survival)
Sure
On a side note, I'm unsure whether I agree with "There is no precedent for crank". By using the C-index, it tends to be more standard to evaluate survival models with respect to their ability to produce a concordant prognostic ranking, rather than with respect to their ability to predict the full conditional hazard.
In current R toolboxes, crank == lp
and is not separately defined. By no precedent I mean that this is not a used return type for implemented survival models and hence no definition in R. Which is why we now have the freedom to create one.
Regarding overloading the scorer: I feel this is going to appear in another core mlr relevant task as well: unsupervised learning.
Sure that makes sense. But then in that case maybe we should not be creating our own crank!
Although, problem with the median is that some empirical estimators (i.e. kaplans) may never reach S(t) = 0.5, so a mean may make more sense
in that case, isn't it just the (weighted) mean of the two adjacent observations? Every distribution has a unique median defined by this slightly hacky convention to make it unique.
I mean to say that the minimum value of S(t) (i.e. taken at the last point in the support) may be greater than 0.5
Yes, survival package in that case for example returns NA for median
Exactly, so mean
might be a better choice then right?
My suggested compromise is as follows:
1. If a model returns a clear ranking (e.g. `survival.svm`) then this is `crank` 2. If a model does not return a ranking but does return an `lp`, then `crank = lp` 3. If a model returns neither `crank` nor `lp` but `distr` then `crank = distr$mean()` 4. The user can overwrite these with `PipeOpCrankCompositor`
In this way, at a minimum
crank
will always be returned.
Sounds good to me.
As long as there are no problems from the core mlr team with this, then we can only include
crank
if a ranking orlp
are explicitly returned, and leave it for composition otherwise.
I'm wondering if the return type would be more clear if we would have a virtual return type "composed_crank"?
Although, problem with the median is that some empirical estimators (i.e. kaplans) may never reach S(t) = 0.5, so a mean may make more sense
I also don't know a definition for median survival if >50% of the observations are censored.
@mllg, what's a "virtual return type"? (reference?)
I also don't know a definition for median survival if >50% of the observations are censored.
The return would be median predicted survival, so censoring isn't an issue (here).
@mllg, what's a "virtual return type"? (reference?)
Basically more like a convention we could agree on: if the user sets the predict type to "crank" he either gets crank or an error. If he sets the predict type to "composed_crank" we try to construct something reasonable, like suggested by Raphael.
@mllg , I see - would this be a general suggestion?
Another more common situation where the issue occurs, btw, is in classification: often, the classification learners are natively probabilistic, e.g., GLM including logistic regression. You can make a deterministic one by thresholding probabilistic class predictions (these are pmf), but the threshold is arbitrary (thus, cleanly, a parameter of a reduction step).
If we were to translate your suggestion to existing handling of probabilistic vs deterministic classification in mlr3: would it clash or would it be concordant?
Basically more like a convention we could agree on: if the user sets the predict type to "crank" he either gets crank or an error. If he sets the predict type to "composed_crank" we try to construct something reasonable, like suggested by Raphael.
But as we are defining crank
surely this is implied anyway? Having multiple similar sounding return types might just add confusion, especially as we already have a compositor that they can use to change the results
The conversation offline earlier concluded that if
crank
is not returned by the learner natively then it should not be returned by the model. But here are three arguments against this:mlr3
(but someone who knows more, @mllg , correct me if wrong), a default measure has to be specified for a given task type but this does not depend on the predict type because in the other tasks there is always a guaranteed minimum return, i.e.regr
will returnresponse
orresponse
ANDse
,classif
will returnresponse
orresponse
ANDprob
, hence their defaults can always be used. But this is not the case inmlr3proba
unless we defaultcrank
when missing to be expectation of survival. If we do not do this then autotests crash (unless we add a conditional for thescore
which we can do)crank
. The 'standard' return types in survival models are: a) an estimate of survival time (often expectation of distribution), b) survival probabilities (distr), c) linear predictor (lp). Hence as we are creatingcrank
, there is also an argument we can define it however we wantsurvival.svm
returns something likecrank
that doesn't fit into thedistr
orlp
category, no other pure ranking methods, nor non-linear models, are implemented that don't return one ofdistr
orlp
, hence if we stuck to this strict principle then onlysurvival.svm
would have acrank
.My suggested compromise is as follows:
survival.svm
) then this iscrank
lp
, thencrank = lp
crank
norlp
butdistr
thencrank = distr$mean()
PipeOpCrankCompositor
In this way, at a minimum
crank
will always be returned.