Closed RaphaelS1 closed 5 years ago
This is closely related to #9. I currently have the following: `risk`, `distr`, and `lp` (linear predictor). These are defined as follows:
- `risk` is the relative risk of the event. It is relative to individuals within the same learner only and therefore cannot be compared between learners. However, it can be used for measures of discrimination, and the results of these can be compared directly between learners. In some cases there is a clear definition of `risk`: when the linear predictor exists, `risk = exp(lp)`; otherwise the approximation `mean(CHF)` can be used (which nearly always exists).
- `distr` is a `distr6` object, which as a minimum contains a `pdf` and `cdf` (and therefore all survival representations defined from these). These may be specific parametric distributions, weighted discrete distributions, or fully custom distributions.
- `lp` is another well-defined concept: `lp = XB`, where X is the matrix of covariates and B is the vector of coefficients. This will not exist for all methods.

How these relate:
- If `lp` exists, then `risk = exp(lp)`.
- If `lp` exists, then by making assumptions about the baseline (PH/AFT/odds), probabilities can be derived naturally.
- If `lp` does not exist but `cdf` does, then `risk = mean(-log(1-cdf))` for each individual.

Composition documentation:

- `type`, with levels PH, AFT, odds: these determine the assumptions used to derive the distribution.
- `pec` for deriving survival probabilities from `rpart`, documenting the method used.
- `mlr3proba` composes this into a distribution.

Finally, we note another conflated problem in current implementations of survival. In the simplest case, if we take a baseline estimator of survival using `survfit(Surv(...) ~ 1)`, then this has multiple returns including `$surv` and `$chf`. The naive implementation would be to include both of these in our `distr6` object, but they are not equivalent after transformation, as the former uses the Kaplan-Meier estimator and the latter the Nelson-Aalen estimator. Hence we add `estimator` as a hyper-parameter so users can choose which method they prefer.
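The gap between the two estimators is easy to see on a toy sample. The following is a minimal sketch in plain Python, not the `survival` or `distr6` API; the function name `km_and_na` and the data are made up for illustration. It computes the Kaplan-Meier survival estimate next to the survival implied by the Nelson-Aalen cumulative hazard, exp(-H):

```python
import math

def km_and_na(times, events):
    """Return (time, KM survival, exp(-Nelson-Aalen CHF)) at each distinct event time.

    Hypothetical helper for illustration only.
    """
    rows = []
    surv_km = 1.0   # Kaplan-Meier product-limit estimate
    chf_na = 0.0    # Nelson-Aalen cumulative hazard estimate
    event_times = sorted({t for t, e in zip(times, events) if e})
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        surv_km *= 1.0 - deaths / at_risk
        chf_na += deaths / at_risk
        rows.append((t, surv_km, math.exp(-chf_na)))
    return rows

# Toy data: five individuals, the third one is censored (event = 0).
times = [1, 2, 3, 4, 5]
events = [1, 1, 0, 1, 1]
rows = km_and_na(times, events)

for t, km_surv, na_surv in rows:
    print(f"t={t}  KM={km_surv:.4f}  exp(-NA)={na_surv:.4f}")
```

Because 1 - x < exp(-x) for x > 0, the survival curve derived from the Nelson-Aalen hazard sits above the Kaplan-Meier curve at every event time in this sketch, which is why the two cannot both be stored in one distribution object as if they were transformations of each other.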
I've had some thoughts about the above. I think the main problem with leaving things as they are is that the `risk` return type is ill-defined, inconsistent, and tied to a number of implementation mistakes in some of the methods.
As long as it is ill-defined (what should it be?), I propose removing it.
It could later be replaced by the following well-defined things:
(i) When the prediction is interpreted as inducing a ranking, e.g., to put into the C-index, the ranking induced is usually the same as that induced by the mean or median of the predictive distribution. That is simply deterministic regression, and "take mean" or "take median" are the default and second-choice options for the reduction [deterministic regression] -> [probabilistic regression].
(ii) When the desired prediction is interpreted directly as a ranking, the task is ranking, or more precisely continuous ranking, which sits in the family of choice, preference, and ranking tasks, subject to and evaluable by measures of ranking goodness. There are some standard ways of doing the reduction [survival modelling] -> [continuous ranking], or [probabilistic regression] -> [continuous ranking], which are implicitly the ones used in the various proportional hazards based models (multiply the unconditional predictive baseline distribution by the risk).
(iii) Finally, perhaps the most exotic of all tasks would be "prediction of elicited statistic", which means you try to predict some collection of non-standard/exotic statistics of the true predictive distribution. This could be some arcane formula based on the hazard or cumulative hazard, such as some models currently use. I don't consider any of the more exotic formulae preferable in the absence of an argument, so one should be able to select the elicited statistic via the interface.
No pun intended in the context of (ii), but my preference would be for (i) in the short term, plus (ii) in the long term (once mlr-ranking or so exists). (iii) is perhaps too exotic to even consider, though it might be useful when functionality for multivariate or structured output prediction is added.
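As a rough illustration of option (i), the sketch below feeds a ranking derived from predicted mean survival times into a concordance index. It is a simplified version of Harrell's C written in plain Python, not the measure implemented in any mlr package; the function name, the convention that a larger value means an earlier expected event, and the toy data are all assumptions made for the example. Pairs with tied observed times are simply skipped.

```python
from itertools import combinations

def concordance_index(times, events, ranking):
    """Simplified Harrell's C; a larger ranking value should mean an earlier event."""
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let a be the individual with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b] or not events[a]:
            continue  # tied times, or the shorter time is censored: not comparable
        comparable += 1
        if ranking[a] > ranking[b]:
            concordant += 1.0
        elif ranking[a] == ranking[b]:
            concordant += 0.5
    return concordant / comparable if comparable else float("nan")

times = [2.0, 5.0, 3.0, 8.0]
events = [1, 1, 0, 1]                      # the third individual is censored
mean_survival = [2.5, 7.0, 6.0, 6.5]       # hypothetical predicted means
ranking = [-m for m in mean_survival]      # shorter predicted survival ranks higher

c_index = concordance_index(times, events, ranking)
print(c_index)
```

Replacing the mean by the median, or by any other summary that orders the individuals the same way, would leave the result unchanged, since only pairwise comparisons enter the measure.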
I have had similar thoughts and agree with (i). I have run some basic simulations to confirm empirically that, for linear survival models whose native risk outputs are multiplied by a baseline hazard, the ranking is indeed preserved when compared to the expectation of the survival distribution. Hence in the short term I think we should return:

- `risk` = mean of the survival distribution when `distr` is returned. When `distr` is not returned (e.g. currently in SVM), the native `risk` is returned with a warning.
- `distr` = `distr6` object representing the survival distribution.
- `lp` = linear predictor for classical GLM survival models, well-understood and well-defined.

> that the ranking is indeed preserved when compared to the expectation of the survival distribution
Well, in my opinion, this is an exact mathematical fact (i.e., a provably correct statement), so any empirical simulation would have to confirm it, except in cases where there is a mistake in the code.
Regarding your suggestion: the problem with this is that "risk", as usually interpreted in the context of PH, is the multiplicative factor, i.e., the proportionality constant in a proportional hazards model. While the latter makes sense only for models under the PH assumption, taking the expected survival as "risk" would disagree with that common terminology in the case where it applies (PH models).
As stated, the ranking induced will agree however.
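For reference, the agreement of the two rankings can be written out under the usual proportional hazards form. This is a sketch under assumptions not spelled out in the thread: a common baseline survival function $S_0$ with $0 < S_0(t) < 1$ on a set of positive measure, non-negative survival time $T$, and finite means.

```latex
S(t \mid x) = S_0(t)^{\exp(lp)}, \qquad
\mathbb{E}[T \mid x] = \int_0^\infty S(t \mid x)\,dt
                     = \int_0^\infty S_0(t)^{\exp(lp)}\,dt .

% For fixed t with 0 < S_0(t) < 1, the map r \mapsto S_0(t)^r is strictly decreasing, hence
lp_i > lp_j
\;\Longrightarrow\; S_0(t)^{\exp(lp_i)} \le S_0(t)^{\exp(lp_j)} \ \text{for all } t
\;\Longrightarrow\; \mathbb{E}[T \mid x_i] < \mathbb{E}[T \mid x_j] .
```

So the two quantities order individuals identically up to direction: a larger multiplicative risk corresponds to a smaller expected survival time.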
In addition, the return type `distr` would disagree with usage in classification or regression, where that would be `proba`, no?
> In addition, the return type `distr` would disagree with usage in classification, or regression, where that would be `proba`, no?

This is a deliberate choice. As discussed, the `classif` task returns `prob` as a probability and not a distribution; hence in mlr3proba, probabilistic regression will be referred to as the `regr` task with predict type `distr`, to show clearly that a distribution is returned and not a probability as in classification.
> As stated, the ranking induced will agree however.

Hm, so we could just call the return `rank` instead of `risk` then. The new term may even be beneficial, as it will prevent confusion with other return types.
`distr` vs `prob`

So, would `classif` also get a return type `distr` at some point, which gives back a discrete distribution object rather than that strange matrix of probabilities?
`rank` vs `risk`

Well, it's slightly more complicated, since it's a continuous rank, i.e., not an integer but any real number. It allows for comparison (bigger/smaller) but doesn't give an absolute position relative to a field of competitor instances.
One could call it `crank` or `contrank`?
Or something related to "prognostic index"?
In terms of abstract math nonsense, the return type would be a specific subtype of partial order, a well-ordering across instances, encoded by a real number.
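To make the "continuous rank" idea concrete, here is a small sketch in plain Python. The helper name `induced_order` and the numbers are invented for the example. It shows that only the ordering carried by the real numbers matters: any strictly increasing transformation of the values induces the same order over instances.

```python
import math

def induced_order(values):
    """Indices of the instances, sorted from smallest to largest value."""
    return sorted(range(len(values)), key=lambda i: values[i])

crank = [0.3, 2.1, -1.4, 0.9]                          # hypothetical continuous ranks
rescaled = [math.exp(2.0 * c) + 5.0 for c in crank]   # strictly increasing transform

order_a = induced_order(crank)
order_b = induced_order(rescaled)
print(order_a, order_b)
```

The individual numbers carry no absolute meaning here; an instance's position is only defined relative to whichever other instances it is compared with.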
> So, would `classif` also get a return type `distr` at some point, which gives back a discrete distribution object rather than that strange matrix of probabilities?

This would be a decision that the core mlr team can make, depending on how user-friendly mlr3proba turns out to be.
> One could call it `crank` or `contrank`? Or something related to "prognostic index"?

In theory I agree that these sound sensible, but I think the majority of users who are familiar with survival implementations will intuitively understand `rank` (or `risk` for that matter, or `relrisk` for relative).
> In theory I agree that these sound sensible but I think the majority of users who are familiar with survival implementations will intuitively understand `rank`

Yes, but that's part of the problem: many people will have some arbitrary definition of "rank" in mind which is not what is returned, and across people it won't even be the same definition.
What about `relrisk` for relative risk? Users will understand this, as the term "relative risk" is used in the documentation of many survival functions, and it implies that the return is relative (and therefore equivalent to a continuous rank).
Relative risk is usually the fraction, or log-difference of the risk factors in a PH model. That's more specific than a number encoding a relative ranking, since it usually goes together with the implicit understanding that it approximates the relative probability of suffering the event in any given time interval, between groups (or a continuous transition) defined by the covariate.
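Written out for a PH model, with $h_0$ the baseline hazard and $\beta$ the coefficient vector (notation assumed here, not taken from the thread), the "fraction" and "log-difference" readings are:

```latex
h(t \mid x) = h_0(t)\,\exp(x^\top \beta), \qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)} = \exp\!\big((x_1 - x_2)^\top \beta\big), \qquad
\log \frac{h(t \mid x_1)}{h(t \mid x_2)} = lp_1 - lp_2 .
```

The ratio does not depend on $t$, which is what gives it the interpretation as a relative rate of suffering the event in any given time interval.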
Ah I see, okay, in which case I'm happy with `crank`.
Assuming no objections from @mllg or @berndbischl?
Hm, `crank` might be weird though, since it is a word:
https://www.lexico.com/en/definition/crank
but that might not matter too much?
Off the top of my head, what about "mortality"?
But `crank` is fine for me, too.
I think `mortality` might indicate the predicted time until survival. I've gone with `crank` for now.
Yes, mortality has a different definition: it's a rate or probability of event (depending on whether you talk about sample or population), and that event is always death.
These three terms are often conflated, both in the documentation of survival packages and in the theoretical papers behind them. In mlr(2) the term `risk` is used for all of these, which is a sensible option as (nearly) all survival models return relative risks, but these are not comparable. In the case of the Cox model, the PI is the linear predictor and these are equal to `log(risk)`. However, other models such as `survivalsvm` make a prediction about the prognostic index without making any assumptions about the model form (e.g. PH, AFT, log-odds, etc.), and therefore returning this as the `risk` can be misleading if users always think that `risk = exp(PI)`.

I think perhaps the best option is to keep using `risk` and just make it explicitly clear in the documentation pages that `risk` should not be compared between models (even when trained/predicted on the same data), whereas distributions can be.
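A small numerical sketch of why raw `risk` values are not comparable even when the rankings are. It uses plain Python and a made-up PH model with baseline cumulative hazard H0(t) = 0.1 t; the function names, time grid, and linear predictors are all assumptions for the example. The same individuals are scored once with `exp(lp)` and once with `mean(-log(1 - cdf))` over the grid:

```python
import math

def risk_from_lp(lp):
    """Risk as exp(lp), available when a linear predictor exists."""
    return math.exp(lp)

def risk_from_cdf(cdf_values):
    """Risk as the mean cumulative hazard over a time grid; needs cdf < 1."""
    return sum(-math.log(1.0 - p) for p in cdf_values) / len(cdf_values)

def ph_cdf(lp, t):
    """CDF of a hypothetical PH model with baseline cumulative hazard 0.1 * t."""
    return 1.0 - math.exp(-0.1 * t * math.exp(lp))

grid = [1.0, 2.0, 3.0, 4.0]     # assumed evaluation times
lps = [-0.5, 0.0, 0.8]          # assumed linear predictors for three individuals

risks_lp = [risk_from_lp(lp) for lp in lps]
risks_cdf = [risk_from_cdf([ph_cdf(lp, t) for t in grid]) for lp in lps]

def order(values):
    return sorted(range(len(values)), key=lambda i: values[i])

same_ranking = order(risks_lp) == order(risks_cdf)
print(risks_lp, risks_cdf, same_ranking)
```

In this sketch the second set of values is the first multiplied by the mean baseline cumulative hazard over the grid (0.25), so the numbers differ while the induced ranking, and therefore any discrimination measure computed from it, is the same.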