Predict type: linear predictor, risk, or prognostic index

RaphaelS1 commented 5 years ago

These three terms are often conflated, both in the documentation of survival packages and in the theoretical papers behind them. In mlr(2) the term risk is used for all of them. This is sensible, as (nearly) all survival models return relative risks, but these are not comparable between models. In the case of the Cox model, the PI is the linear predictor and equals log(risk). However, other models such as survivalsvm predict a prognostic index without making any assumptions about the model form (e.g. PH, AFT, log-odds), so returning this as the risk can be misleading if users always assume that risk = exp(PI).

I think perhaps the best option is to keep using risk and just make it explicitly clear in the documentation pages that risk should not be compared between models (even when trained/predicted on the same data), whereas distributions can be.

RaphaelS1 commented 5 years ago

This is closely related to #9. I have currently done the following:

Included three predict types: risk, distr, lp (linear predictor)

These are defined as follows:

risk is the relative risk of the event. It is relative only to individuals predicted by the same learner and therefore cannot be compared between learners. However, it can be used in measures of discrimination, and the results of these measures can be compared directly between learners. In some cases there is a clear definition for risk: when the linear predictor exists, risk = exp(lp); otherwise the approximation mean(CHF) can be used (which nearly always exists).
distr is a distr6 object which, at a minimum, contains a pdf and cdf (and therefore all survival representations defined from these). These may be specific parametric distributions, weighted discrete distributions or fully custom distributions.
lp is another well-defined concept: lp = XB, where X is the matrix of covariates and B is the vector of coefficients. It will not exist for all methods.
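To make the lp and risk definitions concrete, here is a minimal plain-Python sketch (not mlr3proba code, which is R). The covariate matrix X and coefficients beta are made-up illustrative values, and the function names are hypothetical.

```python
import math

def linear_predictor(X, beta):
    """lp = XB: one dot product of covariates and coefficients per individual (row)."""
    return [sum(x_j * b_j for x_j, b_j in zip(row, beta)) for row in X]

def risk_from_lp(lp):
    """When lp exists, risk = exp(lp)."""
    return [math.exp(v) for v in lp]

# Hypothetical data: 3 individuals (rows) x 2 covariates (columns).
X = [[1.0, 0.0],
     [0.5, 2.0],
     [0.0, 1.0]]
beta = [0.8, -0.3]  # hypothetical coefficients B

lp = linear_predictor(X, beta)   # approximately [0.8, -0.2, -0.3]
risk = risk_from_lp(lp)          # approximately [2.2255, 0.8187, 0.7408]
```

Because exp is strictly increasing, risk and lp order individuals identically. That is why risk can feed a discrimination measure even though its scale is specific to the learner.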

How these relate:

If lp exists then risk = exp(lp)
Further, if lp exists, then survival probabilities can be derived naturally by making assumptions about the baseline (PH/AFT/odds)
If lp does not exist but the cdf does, then risk = mean(-log(1 - cdf)) for each individual, i.e. the mean cumulative hazard, since -log(1 - cdf) = -log(S) is the CHF
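The last relation can be checked numerically. The sketch below is plain Python and assumes, purely for illustration, a proportional-hazards model with an exponential baseline (rate 0.1). It evaluates each individual's cdf on a small hypothetical time grid and computes risk = mean(-log(1 - cdf)). The grid, rate and function names are assumptions, and the cdf must stay below 1 on the grid.

```python
import math

def risk_from_cdf(cdf_values):
    """risk = mean(-log(1 - cdf)) over a time grid; -log(1 - F(t)) = -log S(t) = H(t),
    so this is the mean cumulative hazard. Requires cdf < 1 at every grid point."""
    return sum(-math.log(1.0 - F) for F in cdf_values) / len(cdf_values)

def ph_cdf(t, lp, rate=0.1):
    """Hypothetical PH model with exponential baseline S0(t) = exp(-rate * t):
    F(t | x) = 1 - S0(t) ** exp(lp)."""
    s0 = math.exp(-rate * t)
    return 1.0 - s0 ** math.exp(lp)

times = [1.0, 2.0, 5.0, 10.0]   # hypothetical evaluation grid
lps = [0.8, -0.2, -0.3]

risks = [risk_from_cdf([ph_cdf(t, lp) for t in times]) for lp in lps]
# Under PH, H_i(t) = exp(lp_i) * 0.1 * t, so risks[i] = exp(lp_i) * 0.45
# (0.45 is the mean of 0.1 * t over the grid).
```

Under PH, this recovers exp(lp) up to a common positive factor. The ranking therefore matches the risk = exp(lp) definition whenever both exist.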

Composition documentation:

Deriving survival probabilities may require several assumptions; where possible, these will be implemented as hyper-parameters. E.g. for parametric models we include the parameter type with levels PH, AFT and odds, which determine the assumptions used to derive the distribution.
In other cases there may be no clear 'correct' way, and then we will use the current gold standard where sensible. For example, we use pec to derive survival probabilities from rpart and document the method used.
The most important takeaway is that our implemented models are nearly all compositions of implemented methods. Where these choices can be determined through hyper-parameters, users get maximum flexibility. In other cases the decisions have to be made internally, so the documentation should be absolutely precise about what the implemented model predicts and how mlr3proba composes this into a distribution.
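Here is a plain-Python illustration of how a type hyper-parameter could select the assumption used to turn a baseline survival function and lp into a distribution. The formulas and sign conventions are assumptions of this sketch and are not taken from mlr3proba: textbook PH, AFT with log T = lp + error, and proportional odds of the event. The Weibull baseline and the function names are hypothetical.

```python
import math

def compose_survival(s0, t, lp, model_type):
    """Survival S(t | x) from a baseline survival function s0 and a linear predictor lp.
    Conventions assumed in this sketch:
      PH:   S(t|x) = S0(t) ** exp(lp)              (hazard scaled by exp(lp))
      AFT:  S(t|x) = S0(t * exp(-lp))              (log T = lp + error)
      odds: event odds F/S scaled by exp(lp)  =>  S = S0 / (S0 + exp(lp) * (1 - S0))
    """
    if model_type == "PH":
        return s0(t) ** math.exp(lp)
    if model_type == "AFT":
        return s0(t * math.exp(-lp))
    if model_type == "odds":
        base = s0(t)
        return base / (base + math.exp(lp) * (1.0 - base))
    raise ValueError(f"unknown model_type: {model_type!r}")

def weibull_s0(t, scale=5.0, shape=1.5):
    """Hypothetical Weibull baseline survival S0(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))

for model_type in ("PH", "AFT", "odds"):
    print(model_type, [round(compose_survival(weibull_s0, t, 0.5, model_type), 3)
                       for t in (1.0, 5.0, 10.0)])
```

Note the sign conventions. With these forms, a positive lp shortens survival under PH and odds but lengthens it under AFT. This is another reason lp-based risks should not be compared across model types.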

Finally, we note another conflated problem in current survival implementations. In the simplest case, a baseline estimator of survival from survfit(Surv(...) ~ 1) has multiple returns, including $surv and $chf. The naive implementation would be to include both in our distr6 object, but these are not equivalent after transformation: the former uses the Kaplan-Meier estimator and the latter the Nelson-Aalen estimator. Hence we add estimator as a hyper-parameter so that users can choose which method they prefer.
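The Kaplan-Meier vs Nelson-Aalen mismatch is easy to see on a toy sample. This plain-Python sketch uses hypothetical data and computes both estimators at each event time. Transforming the Nelson-Aalen CHF via S = exp(-H), sometimes called the Fleming-Harrington estimator, gives a different curve from Kaplan-Meier. The function name is illustrative only.

```python
import math

def km_and_na(times, events):
    """Kaplan-Meier survival and Nelson-Aalen cumulative hazard at each distinct
    event time. times: observed times; events: 1 = event, 0 = censored."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    surv, chf = 1.0, 0.0
    km, na = [], []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)   # censored at t still count as at risk
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1.0 - deaths / at_risk              # Kaplan-Meier product-limit step
        chf += deaths / at_risk                     # Nelson-Aalen increment
        km.append(surv)
        na.append(chf)
    return event_times, km, na

# Hypothetical right-censored sample.
times = [2, 3, 3, 5, 7, 8, 10, 12]
events = [1, 1, 0, 1, 1, 0, 1, 0]

event_times, km, na = km_and_na(times, events)
for t, s, h in zip(event_times, km, na):
    print(t, "KM:", round(s, 4), "exp(-NA):", round(math.exp(-h), 4))
```

Since exp(-x) >= 1 - x, exp(-H_NA) lies above the Kaplan-Meier curve at every event time. Storing both $surv and $chf in one distribution object would therefore describe two slightly different distributions.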

fkiraly commented 5 years ago

I've had some thoughts about the above. I think the main problem with leaving things as they are is that the risk return type is ill-defined, inconsistent, and tied into a number of implementation mistakes in some of the methods.

As long as it is ill-defined (what should it be?), I propose to remove it.

It could later be replaced by the following, well-defined things:

(i) When the prediction is interpreted as inducing a ranking, e.g., to be put into a C-index, the induced ranking is usually the same as the one induced by the mean or median of the predictive distribution. That is simply deterministic regression, and "take mean" and "take median" are the default and second-choice options for the reduction [deterministic regression] -> [probabilistic regression].
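Option (i) can be sketched as follows. Reduce each predictive distribution to a point prediction (mean or median), negate it so that shorter predicted survival means higher risk, and pass it to Harrell's C-index. The exponential predictive distributions, rates and data are hypothetical. For this family, mean = 1/rate and median = log(2)/rate, so both choices induce the same ranking.

```python
import math

def harrell_c(times, events, risk_scores):
    """Harrell's C-index. A pair (i, j) is comparable when i had an observed event
    (events[i] == 1) strictly before j's observed time; it is concordant when i also
    has the higher risk score. Tied scores count as 1/2."""
    concordant, comparable = 0.0, 0
    for i in range(len(times)):
        if events[i] != 1:
            continue
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical observed data (events: 1 = event, 0 = censored) and
# hypothetical exponential predictive distributions, one rate per individual.
times = [2.0, 5.0, 3.0, 8.0, 6.0]
events = [1, 1, 0, 1, 1]
rates = [0.5, 0.2, 0.4, 0.1, 0.08]

means = [1.0 / r for r in rates]              # mean of Exp(rate)
medians = [math.log(2.0) / r for r in rates]  # median of Exp(rate)

# Shorter predicted survival = higher risk, so negate the point prediction.
c_mean = harrell_c(times, events, [-m for m in means])      # 6/7
c_median = harrell_c(times, events, [-m for m in medians])  # 6/7, same ranking
```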

(ii) When the desired prediction is interpreted directly as a ranking, the task is ranking, or more precisely continuous ranking. This belongs to the family of choice, preference and ranking tasks and can be evaluated by measures of ranking quality. There are some standard ways to perform the reduction [survival modelling] -> [continuous ranking], or [probabilistic regression] -> [continuous ranking]. These are implicitly the ones used in the various proportional-hazards-based models (multiply the hazard of the unconditional baseline predictive distribution by the risk).

(iii) Finally, perhaps the most exotic task would be "prediction of an elicited statistic", which means trying to predict some collection of non-standard/exotic statistics of the true predictive distribution. This could be some arcane formula based on the hazard or cumulative hazard, such as some models currently use. I don't consider any of the more exotic formulae preferable in the absence of an argument, so one should be able to select the elicited statistic via the interface.

No pun intended in the context of (ii), but my preference would be for (i) in the short term, plus (ii) in the long term (once mlr-ranking or similar exists). (iii) is perhaps too exotic to even consider, though it might be useful when functionality for multivariate or structured output prediction is added.

RaphaelS1 commented 5 years ago

I have had similar thoughts and agree with (i). I have run some basic simulations to confirm empirically that, for linear survival models implemented by multiplying a baseline hazard by the native risk output, the ranking is indeed preserved when compared to the expectation of the survival distribution. Hence in the short term I think we should return:

risk = the mean of the survival distribution when distr is returned. When distr is not returned (e.g. currently in SVM), the native risk is returned with a warning
distr = distr6 object representing survival distribution
lp = linear predictor for classical GLM survival models, well-understood and well-defined
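For risk = the mean of the survival distribution, a discrete (step) survival curve needs an integral, since E[T] = integral of S(t) dt. The plain-Python sketch below makes two assumptions. It treats the curve as a right-continuous step function with S = 1 before the first time point, and it stops at the last time point, so the result is a restricted mean whenever the curve does not reach 0. The curves and the function name are hypothetical.

```python
def mean_from_step_survival(times, surv):
    """Area under a right-continuous step survival curve, with S(t) = 1 before
    times[0] and the integral stopped at times[-1]. This is the mean survival time
    if surv[-1] == 0, and a restricted mean otherwise."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        area += prev_s * (t - prev_t)   # S is constant (= prev_s) on [prev_t, t)
        prev_t, prev_s = t, s
    return area

# Hypothetical step curves for two individuals on the same time grid.
grid = [2.0, 3.0, 5.0, 7.0, 10.0]
surv_a = [0.875, 0.75, 0.6, 0.45, 0.225]
surv_b = [0.95, 0.9, 0.8, 0.7, 0.5]   # higher survival everywhere

mean_a = mean_from_step_survival(grid, surv_a)   # approximately 6.925 (restricted at t = 10)
mean_b = mean_from_step_survival(grid, surv_b)
```

A larger mean survival time corresponds to a lower risk. When this mean is used as a ranking score, its sign has to be flipped, or the measure has to be told which direction is "risky".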

fkiraly commented 5 years ago

that the ranking is indeed preserved when compared to the expectation of the survival distribution

well, in my opinion, this is an exact, mathematical fact (i.e., a provably correct statement), so any empirical simulation would have to confirm this, except in cases where there is a mistake in the code.
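One way to see why this holds, sketched for the PH case. It assumes that the baseline survival S_0 satisfies 0 < S_0(t) < 1 on a set of positive measure and that the expectations are finite:

```latex
\begin{aligned}
S(t \mid x) &= S_0(t)^{r(x)}, \qquad r(x) = \exp(\mathrm{lp}(x)), \\
S_0(t)^{r} &= \exp\bigl(r \log S_0(t)\bigr) \text{ is strictly decreasing in } r \text{ wherever } 0 < S_0(t) < 1, \\
r(x_1) < r(x_2) &\implies S(t \mid x_1) \ge S(t \mid x_2) \text{ for all } t, \text{ strictly on a set of positive measure}, \\
\mathbb{E}[T \mid x] &= \int_0^\infty S(t \mid x)\,dt \implies \mathbb{E}[T \mid x_1] > \mathbb{E}[T \mid x_2].
\end{aligned}
```

So ordering individuals by risk is exactly the reverse of ordering them by expected survival time.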

Regarding your suggestion: the problem with this is that "risk", as usually interpreted in the context of PH, is the multiplicative factor, i.e., the proportionality constant in a proportional hazard. While the latter makes sense only for models under the PH assumption, taking the expected survival as "risk" would disagree with that common terminology in the case where it applies (PH models).

As stated, the ranking induced will agree however.

In addition, the return type distr would disagree with usage in classification, or regression, where that would be proba, no?

RaphaelS1 commented 5 years ago

In addition, the return type distr would disagree with usage in classification, or regression, where that would be proba, no?

This is a deliberate choice. As discussed, the classif task returns prob, which is a probability and not a distribution. Hence in mlr3proba, probabilistic regression will be the regr task with predict type distr, to make clear that a distribution is returned rather than a probability as in classification.

As stated, the ranking induced will agree however.

Hm, so we could just call the return rank instead of risk then. The new term may even be beneficial, as it would prevent confusion with other return types.

fkiraly commented 5 years ago

distr vs prob

So, would classif also get a return type distr at some point, which gives back a discrete distribution object rather than that strange matrix of probabilities?

rank vs risk

Well, it's slightly more complicated since it's a continuous rank, i.e., not an integer but any real number. It allows for comparison (bigger/smaller) but doesn't give an absolute positioning in comparison to a field of competitor instances. One could call it crank or contrank? Or something related to "prognostic index"?

In terms of abstract math nonsense, the return type would be a specific subtype of partial order, a well-ordering across instances, encoded by a real number.
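To illustrate what such a continuous rank does and does not encode, here is a plain-Python sketch with hypothetical crank values. All it carries is the pairwise comparison between instances. Any strictly increasing transformation (here exp(3c) + 7, chosen arbitrarily) therefore carries exactly the same information. By contrast, an integer position depends on which competitors happen to be present. The helpers pairwise_order and positions are hypothetical.

```python
import math

def pairwise_order(scores):
    """All the information a continuous rank carries: for each pair (i, j),
    1 if i is ranked above j, -1 if below, 0 if tied."""
    n = len(scores)
    return [[(scores[i] > scores[j]) - (scores[i] < scores[j]) for j in range(n)]
            for i in range(n)]

def positions(scores):
    """Integer position (0 = lowest) of each instance within this particular set."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    pos = [0] * len(scores)
    for p, i in enumerate(order):
        pos[i] = p
    return pos

crank = [0.3, -1.2, 2.5, 0.9]                       # hypothetical crank values
transformed = [math.exp(3 * c) + 7 for c in crank]  # arbitrary strictly increasing map

same_information = pairwise_order(crank) == pairwise_order(transformed)  # True
before = positions(crank)[0]          # position of instance 0 among 4 instances: 1
after = positions(crank + [0.0])[0]   # same crank value, one more competitor: 2
```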

RaphaelS1 commented 5 years ago

So, would classif also get a return type distr at some point, which gives back a discrete distribution object rather than that strange matrix of probabilities?

This would be a decision that the core mlr team can make, depending on how user-friendly mlr3proba turns out to be.

One could call it crank or contrank? Or something related to "prognostic index"?

In theory I agree that these sound sensible but I think the majority of users who are familiar with survival implementations will intuitively understand rank (or risk for that matter, or relrisk for relative risk).

fkiraly commented 5 years ago

In theory I agree that these sound sensible but I think the majority of users who are familiar with survival implementations will intuitively understand rank

Yes, but that's part of the problem: many people will have some arbitrary definition of "rank" in mind which is not what is returned, and it won't even be the same definition across people.

RaphaelS1 commented 5 years ago

What about relrisk for relative risk? Users will understand this, as the term "relative risk" is used in the documentation of many survival functions, and it signals that the return is relative (and therefore equivalent to a continuous rank).

fkiraly commented 5 years ago

Relative risk is usually the ratio, or log-difference, of the risk factors in a PH model. That's more specific than a number encoding a relative ranking. It usually comes with the implicit understanding that it approximates the relative probability of suffering the event in any given time interval, between groups (or along a continuous transition) defined by the covariate.
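In a PH model this relative risk is the hazard ratio between two individuals. The plain-Python sketch below uses a hypothetical Weibull baseline hazard and two made-up linear predictors. It shows that the ratio is constant over time and that its log is the difference of the linear predictors. A bare ranking score lacks this extra structure.

```python
import math

def weibull_baseline_hazard(t, scale=5.0, shape=1.5):
    """Hypothetical baseline hazard h0(t) = (k / lam) * (t / lam) ** (k - 1)."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def ph_hazard(t, lp):
    """Proportional hazards: h(t | x) = h0(t) * exp(lp)."""
    return weibull_baseline_hazard(t) * math.exp(lp)

lp_a, lp_b = 0.8, -0.3   # hypothetical linear predictors of two individuals
ratios = [ph_hazard(t, lp_a) / ph_hazard(t, lp_b) for t in (0.5, 2.0, 10.0)]
# Every ratio equals exp(lp_a - lp_b): the relative risk does not depend on t,
# and its log is the difference of the linear predictors.
```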

RaphaelS1 commented 5 years ago

Ah, I see. In that case I'm happy with crank, assuming no objections from @mllg or @berndbischl?

fkiraly commented 5 years ago

Hm, crank might be weird though, since it is a word (https://www.lexico.com/en/definition/crank), but that might not matter too much?

mllg commented 5 years ago

Off the top of my head, what about "mortality"?

But crank is fine for me, too.

RaphaelS1 commented 5 years ago

I think mortality might be read as the predicted time until death. I've gone with crank for now.

fkiraly commented 5 years ago

yes, mortality has a different definition: it's a rate or probability of the event (depending on whether you talk about a sample or a population), and that event is always death.

mlr-org / mlr3proba

Predict type: linear predictor, risk, or prognostic index #12