mlr-org / mlr3proba

Probabilistic Learning for mlr3
https://mlr3proba.mlr-org.com/
GNU Lesser General Public License v3.0

Predict type: linear predictor, risk, or prognostic index #12

Closed RaphaelS1 closed 5 years ago

RaphaelS1 commented 5 years ago

These three terms are often conflated, both in the documentation of survival packages and in the theoretical papers behind them. In mlr(2) the term risk is used for all of them, which is a sensible option since (nearly) all survival models return relative risks, but these are not comparable between models. In the case of the Cox model, the PI is the linear predictor, and both are equal to log(risk). However, other models such as survivalsvm make a prediction about the prognostic index without making any assumptions about the model form (e.g. PH, AFT, log-odds, etc.), so returning this as the risk can be misleading if users always think that risk = exp(PI).

I think perhaps the best option is to keep using risk and just make it explicitly clear in the documentation pages that risk should not be compared between models (even when trained/predicted on the same data), whereas distributions can be.

RaphaelS1 commented 5 years ago

This is closely related to #9. I have currently done the following:

These are defined as follows:

How these relate:

Composition documentation:

Finally, we note another conflation problem in current implementations of survival. In the simplest case, if we take a baseline estimator of survival using survfit(Surv(...) ~ 1), then this has multiple returns, including $surv and $chf. The naive implementation would be to include both of these in our distr6 object, but they are not equivalent after transformation, as the former uses the Kaplan-Meier estimator and the latter the Nelson-Aalen estimator. Hence we add estimator as a hyper-parameter so users can choose which method they prefer.
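To make the mismatch concrete, here is a minimal sketch in plain Python (not the survival or distr6 code) that computes both estimators on a small made-up sample. The function name `km_and_na` and the data are hypothetical; the point is only that exp(-H) from the Nelson-Aalen cumulative hazard H does not reproduce the Kaplan-Meier survival curve.

```python
import math

def km_and_na(times, events):
    """Return (event_times, Kaplan-Meier survival, Nelson-Aalen cumulative hazard)."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, cumhaz = 1.0, 0.0
    out_t, out_s, out_h = [], [], []
    i = 0
    while i < len(data):
        t = data[i][0]
        tied = [e for (u, e) in data if u == t]  # all observations at time t
        d = sum(tied)                            # number of events at time t
        if d > 0:
            surv *= 1.0 - d / n_at_risk          # Kaplan-Meier product step
            cumhaz += d / n_at_risk              # Nelson-Aalen sum step
            out_t.append(t)
            out_s.append(surv)
            out_h.append(cumhaz)
        n_at_risk -= len(tied)
        i += len(tied)
    return out_t, out_s, out_h

# Made-up sample: event = 1 is an observed event, 0 is censored.
times = [1, 2, 2, 3, 4, 5, 6, 7]
events = [1, 1, 0, 1, 0, 1, 1, 0]

t, s_km, h_na = km_and_na(times, events)
s_from_na = [math.exp(-h) for h in h_na]  # survival implied by Nelson-Aalen

for row in zip(t, s_km, s_from_na):
    print("t=%s  KM=%.4f  exp(-NA)=%.4f" % row)
```

Because 1 - x < exp(-x) for every x > 0, the survival curve implied by Nelson-Aalen sits strictly above the Kaplan-Meier curve at each event time, so the two returns cannot both be stored in one distribution object without choosing one estimator.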

fkiraly commented 5 years ago

I've had some thoughts about the above. I think the main problem with leaving things as they are is that the risk return type is ill-defined, inconsistent, and tied to a number of implementation mistakes in some of the methods.

As long as it is ill-defined (what should that be?), I propose to remove it.

It could later be replaced by the following, well-defined things:

(i) when prediction is interpreted as inducing a ranking, e.g., to put into C-index, the ranking induced is usually the same as is induced by the mean or median of the predictive distribution. That is simply deterministic regression, and "take mean" or "take median" are the default and second-choice options for the reduction [deterministic regression] -> [probabilistic regression].

(ii) when the desired prediction is interpreted directly as ranking, the task is ranking, or more precisely, continuous ranking, which sits in the family of choice, preference and ranking tasks, subject to and evaluable by measures of ranking goodness. There are some standard ways for the reduction [survival modelling] -> [continuous ranking], or [probabilistic regression] -> [continuous ranking], which are implicitly the ones used in the various proportional hazards based models (multiply unconditional predictive baseline distribution with risk).

(iii) finally, perhaps the most exotic of all tasks would be "prediction of elicited statistic", which means you try to predict some collection of non-standard/exotic statistics of the true predictive distribution. This could be some arcane formula based on hazard or cumulative hazard, as some models currently use. I don't consider any of the more exotic formulas preferable in the absence of an argument, so one should be able to select the elicited statistic via the interface.

No pun intended in the context of (ii), but my preference would be for (i) in the short term, plus (ii) in the long term (once mlr-ranking or so exists). (iii) is perhaps too exotic to even consider, though it might be useful when functionality on multivariate or structured output prediction is added.

RaphaelS1 commented 5 years ago

I have had similar thoughts and agree with (i). I have run some basic simulations to confirm empirically, for linear survival models (implemented with the native risk outputs multiplied by a baseline hazard), that the ranking is indeed preserved when compared to the expectation of the survival distribution. Hence in the short-term I think we should return:

fkiraly commented 5 years ago

that the ranking is indeed preserved when compared to the expectation of the survival distribution

well, in my opinion, this is an exact, mathematical fact (i.e., a provably correct statement), so any empirical simulation would have to confirm this, except in cases where there is a mistake in the code.
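A minimal numerical sketch of this fact, assuming a proportional hazards model with a Weibull baseline cumulative hazard H0(t) = (t/scale)^shape, so that S(t | r) = exp(-r * H0(t)). The baseline, its parameters, and the function name `expected_survival_time` are illustrative assumptions, not anything taken from mlr3proba. A larger risk factor r gives a pointwise smaller survival function and hence a smaller expected survival time, so sorting by decreasing risk and sorting by increasing mean give the same order.

```python
import math

def expected_survival_time(risk, shape=1.5, scale=10.0, t_max=200.0, steps=20000):
    """Mean survival time as the integral of S(t | risk) dt (trapezoidal rule)."""
    def surv(t):
        # PH survival function: baseline cumulative hazard scaled by the risk factor.
        return math.exp(-risk * (t / scale) ** shape)
    h = t_max / steps
    total = 0.5 * (surv(0.0) + surv(t_max))
    total += sum(surv(k * h) for k in range(1, steps))
    return total * h

# Hypothetical multiplicative risk factors for five instances.
risks = [0.4, 2.5, 1.0, 0.7, 1.8]
means = [expected_survival_time(r) for r in risks]

# Highest risk first versus shortest expected survival first.
order_by_risk = sorted(range(len(risks)), key=lambda i: -risks[i])
order_by_mean = sorted(range(len(risks)), key=lambda i: means[i])

print(order_by_risk)
print(order_by_mean)
```

The two printed orderings coincide; a simulation that showed otherwise for a PH model would point to a bug rather than to a counterexample.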

Regarding your suggestion: the problem with this is that "risk", as usually interpreted in the context of PH, is the multiplicative factor, i.e., the proportionality constant in a proportional hazards model. While the latter makes sense only for models under the PH assumption, taking the expected survival as "risk" would disagree with that common terminology in the case where it applies (PH models).

As stated, the ranking induced will agree however.

In addition, the return type distr would disagree with usage in classification, or regression, where that would be proba, no?

RaphaelS1 commented 5 years ago

In addition, the return type distr would disagree with usage in classification, or regression, where that would be proba, no?

This is a deliberate choice. As discussed, the classif task returns prob as a probability and not a distribution; hence in mlr3proba probabilistic regression will be referred to as the regr task with predict type distr, to show clearly that a distribution is returned and not a probability as in classification.

As stated, the ranking induced will agree however.

Hm, so we could just call the return rank instead of risk then. The new term may even be beneficial, as it will prevent confusion with other return types.

fkiraly commented 5 years ago

distr vs prob

So, would classif also get a return type distr at some point, which gives back a discrete distribution object rather than that strange matrix of probabilities?

rank vs risk

Well, it's slightly more complicated since it's a continuous rank, i.e., not an integer but any real number. It allows for comparison (bigger/smaller) but doesn't give an absolute positioning in comparison to a field of competitor instances. One could call it crank or contrank? Or something related to "prognostic index"?

In terms of abstract math nonsense, the return type would be a specific subtype of partial order, a well-ordering across instances, encoded by a real number.
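As a sketch of what "only the ordering matters" means in practice, the following plain-Python concordance index (a simplified Harrell-style C-index written for this illustration, not the mlr3proba measure) gives the same value for a score and for any strictly increasing transformation of it. The sign convention, that a higher value means an earlier event, and the data are assumptions.

```python
import math

def concordance(times, events, crank):
    """Simplified C-index; a higher crank is taken to mean an earlier event."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i has an observed event before j's time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if crank[i] > crank[j]:
                    concordant += 1.0
                elif crank[i] == crank[j]:
                    concordant += 0.5  # ties in the score count as half
    return concordant / comparable

# Made-up data: event = 1 is an observed event, 0 is censored.
times = [5, 8, 3, 12, 7, 10]
events = [1, 0, 1, 1, 1, 0]
lp = [0.2, -0.5, 1.1, -1.3, 0.4, -0.1]  # hypothetical continuous ranks

c_lp = concordance(times, events, lp)
c_exp = concordance(times, events, [math.exp(v) for v in lp])
c_shift = concordance(times, events, [3.0 * v + 7.0 for v in lp])

print(c_lp, c_exp, c_shift)
```

All three values are equal, which is why the absolute scale of such a prediction carries no meaning and why values from different models should not be compared directly.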

RaphaelS1 commented 5 years ago

So, would classif also get a return type distr at some point, which gives back a discrete distribution object rather than that strange matrix of probabilities?

This would be a decision that the core mlr team can make depending on how user-friendly mlr3proba turns out to be.

One could call it crank or contrank? Or something related to "prognostic index"?

In theory I agree that these sound sensible but I think the majority of users who are familiar with survival implementations will intuitively understand rank (or risk for that matter, or relrisk for relative risk).

fkiraly commented 5 years ago

In theory I agree that these sound sensible but I think the majority of users who are familiar with survival implementations will intuitively understand rank

Yes, but that's part of the problem: many people will have some arbitrary definition of "rank" in mind which is not what is returned, and across people it won't even be the same definition.

RaphaelS1 commented 5 years ago

What about relrisk, for relative risk? Users will understand this, as the term "relative risk" is used in the documentation of many survival functions, and it implies that the return is relative (and therefore equivalent to a continuous rank).

fkiraly commented 5 years ago

Relative risk is usually the fraction, or log-difference of the risk factors in a PH model. That's more specific than a number encoding a relative ranking, since it usually goes together with the implicit understanding that it approximates the relative probability of suffering the event in any given time interval, between groups (or a continuous transition) defined by the covariate.
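For reference, a small sketch of the PH notion of relative risk described here, assuming hazards of the form h(t | x) = h0(t) * exp(lp). The baseline `h0` and the linear predictor values are made up for illustration. The ratio of two hazards does not depend on t, and its logarithm is the difference of the linear predictors.

```python
import math

def h0(t):
    """Hypothetical baseline hazard, chosen only for illustration."""
    return 0.1 * t

def hazard(t, lp):
    """Proportional hazards form: baseline hazard times exp(linear predictor)."""
    return h0(t) * math.exp(lp)

lp_a, lp_b = 0.9, 0.2  # made-up linear predictors for two instances

# The baseline cancels in the ratio, so it is the same at every time point.
ratios = [hazard(t, lp_a) / hazard(t, lp_b) for t in (1.0, 5.0, 20.0)]
hr = math.exp(lp_a - lp_b)
log_hr = math.log(hr)

print(ratios, hr, log_hr)
```

A bare continuous rank makes no such claim: it fixes only who is ahead of whom, not by what factor.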

RaphaelS1 commented 5 years ago

Ah, I see, okay. In which case I'm happy with crank, assuming no objections from @mllg or @berndbischl?

fkiraly commented 5 years ago

Hm, crank might be weird though, since it is a word: https://www.lexico.com/en/definition/crank but that might not matter too much?

mllg commented 5 years ago

Off the top of my head, what about "mortality"?

But crank is fine for me, too.

RaphaelS1 commented 5 years ago

I think mortality might indicate the predicted time until death. I've gone with crank for now.

fkiraly commented 5 years ago

Yes, mortality has a different definition: it's a rate or probability of event (depending on whether you talk about sample or population), and that event is always death.