mlr-org / mlr3proba

Probabilistic Learning for mlr3
https://mlr3proba.mlr-org.com
GNU Lesser General Public License v3.0
126 stars 20 forks source link

Are we *sure* crank shouldn't always be returned? #42

Closed RaphaelS1 closed 4 years ago

RaphaelS1 commented 4 years ago

The conversation offline earlier concluded that if crank is not returned by the learner natively then it should not be returned by the model. But here are three arguments against this:

  1. In the current set-up of mlr3 (but someone who knows more, @mllg , correct me if wrong), a default measure has to be specified for a given task type but this does not depend on the predict type because in the other tasks there is always a guaranteed minimum return, i.e. regr will return response or response AND se, classif will return response or response AND prob, hence their defaults can always be used. But this is not the case in mlr3proba unless we default crank when missing to be expectation of survival. If we do not do this then autotests crash (unless we add a conditional for the score which we can do)
  2. There is no precedent for crank. The 'standard' return types in survival models are: a) an estimate of survival time (often expectation of distribution), b) survival probabilities (distr), c) linear predictor (lp). Hence as we are creating crank, there is also an argument we can define it however we want
  3. Only survival.svm returns something like crank that doesn't fit into the distr or lp category, no other pure ranking methods, nor non-linear models, are implemented that don't return one of distr or lp, hence if we stuck to this strict principle then only survival.svm would have a crank.

My suggested compromise is as follows:

  1. If a model returns a clear ranking (e.g. survival.svm) then this is crank
  2. If a model does not return a ranking but does return an lp, then crank = lp
  3. If a model returns neither crank nor lp but distr then crank = distr$mean()
  4. The user can overwrite these with PipeOpCrankCompositor

In this way, at a minimum crank will always be returned.

RaphaelS1 commented 4 years ago

One final point on this:

As PredictionSurv inherits from Prediction, it is possible to overload the score method to allow for two default measures, e.g.

score <- function(measures = NULL, task = NULL, learner = NULL) {
  if(is.null(measures)) {
    if(crank %in% self$predict_type)
      measure <- msr("surv.harrellsc")
    else
      measure <- msr("surv.graf")
  }
  super$score(measures, task, learner)
}

As long as there are no problems from the core mlr team with this, then we can only include crank if a ranking or lp are explicitly returned, and leave it for composition otherwise.

fkiraly commented 4 years ago

I'd agree to the suggestion, with minor change:

If a model returns a clear ranking (e.g. survival.svm) then this is crank If a model does not return a ranking but does return an lp, then crank = lp If a model returns neither crank nor lp but distr then crank = distr$median() The user can overwrite these with PipeOpCrankCompositor

(predicted median survival seems more standard as a prognostic indicator than predicted mean survival)

fkiraly commented 4 years ago

On a side note, I'm unsure whether I agree with "There is no precedent for crank". By using the C-index, it tends to be more standard to evaluate survival models with respect to their ability to produce a concordant prognostic ranking, rather than with respect to their ability to predict the full conditional hazard. This holds for common return types as well as for contemporary workflow practice.

If you meant, a precedent in terms of mlr3 architecture: that's probably true, because workflows, settings, and conventions for supervised learning are much less all over the place than for survival models (considering contemporary practice and state-of-art literature, being translated to mlr3).

fkiraly commented 4 years ago

Regarding overloading the scorer: I feel this is going to appear in another core mlr relevant task as well: unsupervised learning. Usually, you would prefer to evaluate these by the log-loss or another probabilistic loss, but that's only possible if the model is fully probabilistic. For models that aren't, you can do a few other things that tend to be model class specific or sub-case specific.

RaphaelS1 commented 4 years ago

(predicted median survival seems more standard as a prognostic indicator than predicted mean survival)

Sure

On a side note, I'm unsure whether I agree with "There is no precedent for crank". By using the C-index, it tends to be more standard to evaluate survival models with respect to their ability to produce a concordant prognostic ranking, rather than with respect to their ability to predict the full conditional hazard.

In current R toolboxes, crank == lp and is not separately defined. By no precedent I mean that this is not a used return type for implemented survival models and hence no definition in R. Which is why we now have the freedom to create one.

Regarding overloading the scorer: I feel this is going to appear in another core mlr relevant task as well: unsupervised learning.

Sure that makes sense. But then in that case maybe we should not be creating our own crank!

RaphaelS1 commented 4 years ago

Although, problem with the median is that some empirical estimators (i.e. kaplans) may never reach S(t) = 0.5, so a mean may make more sense

fkiraly commented 4 years ago

in that case, isn't it just the (weighted) mean of the two adjacent observations? Every distribution has a unique median defined by this slightly hacky convention to make it unique.

RaphaelS1 commented 4 years ago

I mean to say that the minimum value of S(t) (i.e. taken at the last point in the support) may be greater than 0.5

adibender commented 4 years ago

Yes, survival package in that case for example returns NA for median

RaphaelS1 commented 4 years ago

Exactly, so mean might be a better choice then right?

mllg commented 4 years ago

My suggested compromise is as follows:

1. If a model returns a clear ranking (e.g. `survival.svm`) then this is `crank`

2. If a model does not return a ranking but does return an `lp`, then `crank = lp`

3. If a model returns neither `crank` nor `lp` but `distr` then `crank = distr$mean()`

4. The user can overwrite these with `PipeOpCrankCompositor`

In this way, at a minimum crank will always be returned.

Sounds good to me.

As long as there are no problems from the core mlr team with this, then we can only include crank if a ranking or lp are explicitly returned, and leave it for composition otherwise.

I'm wondering if the return type would be more clear if we would have a virtual return type "composed_crank"?

Although, problem with the median is that some empirical estimators (i.e. kaplans) may never reach S(t) = 0.5, so a mean may make more sense

I also don't know a definition for median survival if >50% of the observations are censored.

fkiraly commented 4 years ago

@mllg, what's a "virtual return type"? (reference?)

I also don't know a definition for median survival if >50% of the observations are censored.

The return would be median predicted survival, so censoring isn't an issue (here).

mllg commented 4 years ago

@mllg, what's a "virtual return type"? (reference?)

Basically more like a convention we could agree on: if the user sets the predict type to "crank" he either gets crank or an error. If he sets the predict type to "composed_crank" we try to construct something reasonable, like suggested by Raphael.

fkiraly commented 4 years ago

@mllg , I see - would this be a general suggestion?

Another more common situation where the issue occurs, btw, is in classification: often, the classification learners are natively probabilistic, e.g., GLM including logistic regression. You can make a deterministic one by thresholding probabilistic class predictions (these are pmf), but the threshold is arbitrary (thus, cleanly, a parameter of a reduction step).

If we were to translate your suggestion to existing handling of probabilistic vs deterministic classification in mlr3: would it clash or would it be concordant?

RaphaelS1 commented 4 years ago

Basically more like a convention we could agree on: if the user sets the predict type to "crank" he either gets crank or an error. If he sets the predict type to "composed_crank" we try to construct something reasonable, like suggested by Raphael.

But as we are defining crank surely this is implied anyway? Having multiple similar sounding return types might just add confusion, especially as we already have a compositor that they can use to change the results