soda-inria / hazardous

Competing Risks and Survival Analysis
https://soda-inria.github.io/hazardous/
MIT License

Update the competing risk incidence loss to remove a bias #17

Closed juAlberge closed 10 months ago

juAlberge commented 10 months ago

The example still shows a bias, so there might be a bug; we need to investigate.

EDIT (Olivier): the bias goes away when the number of trees (n_iter) is increased along with n_samples.

This PR actually implements an estimator of the cause-specific cumulative distribution function (CDF) of each event type. It estimates what the cumulative incidence would be if the competing event types could be removed, which is rarely possible in practice (e.g. removing all other causes of death to study cancer incidence). I don't think this is what we want to estimate in practice. See https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5557056/ for instance.

ogrisel commented 10 months ago

@juAlberge I just pushed 44df4cf to implement the weighting scheme of your draft manuscript more directly, but the example is still completely off.

ogrisel commented 10 months ago

I broke some tests. Let me fix them, although I don't think this will fix the real problem.

ogrisel commented 10 months ago

I pushed a commit to restore the implementation of IPCWEstimator from @juAlberge's original push. Note, however, that the example does not work because this estimator is no longer a competing risks cumulative incidence estimator (as the Aalen-Johansen (AJ) estimator is, for instance), but instead an estimator of the cumulative distribution function (CDF) of each event's time distribution.

The two are not the same when t goes to infinity: the cumulative incidence of each event stays strictly below 1.0, whereas the CDF goes to 1.0 (assuming strictly positive hazards at infinity).
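To make the difference concrete, here is a minimal NumPy sketch, assuming two competing events with Weibull cause-specific hazards. The shape and scale values are hypothetical and are not taken from this PR or from the example. It compares the cause-specific CDFs, 1 - exp(-Λ_k(t)), with the cumulative incidences obtained by numerically integrating S(u) dΛ_k(u), where S is the any-event survival function.

```python
import numpy as np

# Hypothetical Weibull parameters for two competing events (not from the PR).
shapes = np.array([0.7, 1.5])
scales = np.array([5.0, 8.0])


def cumulative_hazards(t):
    """Cause-specific cumulative hazards, shape (n_events, n_times)."""
    return (t[None, :] / scales[:, None]) ** shapes[:, None]


def cause_specific_cdfs(t):
    """CDF of each event time if the other event could not occur."""
    return 1.0 - np.exp(-cumulative_hazards(t))


def cumulative_incidence(t_max, n_steps=200_000):
    """CIF_k(t_max): integral of S(u) dLambda_k(u) on a fine grid."""
    grid = np.linspace(0.0, t_max, n_steps + 1)
    cum_haz = cumulative_hazards(grid)
    # Any-event survival: exp of minus the sum of the cumulative hazards.
    overall_survival = np.exp(-cum_haz.sum(axis=0))
    # Average of S over each interval times the cumulative hazard increment.
    s_mid = 0.5 * (overall_survival[1:] + overall_survival[:-1])
    return (s_mid[None, :] * np.diff(cum_haz, axis=1)).sum(axis=1)


horizon = 200.0  # large enough to stand in for "t goes to infinity" here
cdf_at_horizon = cause_specific_cdfs(np.array([horizon]))[:, 0]
cif_at_horizon = cumulative_incidence(horizon)
print("cause-specific CDFs:  ", cdf_at_horizon)
print("cumulative incidences:", cif_at_horizon, "sum:", cif_at_horizon.sum())
```

Under these assumptions, each CDF should be close to 1.0 at the horizon, while the cumulative incidences should each stay below 1.0 and only add up to about 1.0 together.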

This means that this PR will not pass the new tests introduced in #18 as such.

I am not sure yet if the estimator introduced in this PR makes sense for practitioners.

ogrisel commented 10 months ago

I have merged main into this branch to trigger a doc build preview with the code of this branch. Here is the behavior of the marginal estimation example:

https://pull-request-17--hazardous-doc.netlify.app/auto_examples/plot_marginal_cumulative_incidence_estimation#sphx-glr-auto-examples-plot-marginal-cumulative-incidence-estimation-py

Here are the important plots (both without and with censoring):

[image: plot without censoring]

[image: plot with censoring]

As explained in the analysis above, the cause-specific CDFs estimated by this branch go to 1.0 (with some estimation noise) when t goes to infinity, whereas the competing risks CIFs obtained by numerical integration of the true hazard functions or by the Aalen-Johansen estimator converge to the respective fraction of individuals who experience each event type.
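For reference, the limiting behavior of the Aalen-Johansen estimator can be checked with a small NumPy sketch. This is an illustrative implementation written for this comment, not the code used by hazardous or by the example, and the simulated Weibull parameters are hypothetical. Event code 0 denotes censoring and 1..K the competing event types.

```python
import numpy as np


def aalen_johansen(durations, events):
    """Return (times, cif) where cif[k - 1, j] estimates CIF_k(times[j]).

    `events` uses 0 for censoring and 1..K for the competing event types.
    """
    durations = np.asarray(durations, dtype=float)
    events = np.asarray(events, dtype=int)
    times = np.unique(durations)
    n_event_types = int(events.max())
    # Number of individuals still at risk just before each distinct time.
    at_risk = np.array([(durations >= t).sum() for t in times])
    # d[k - 1, j]: number of events of type k observed exactly at times[j].
    d = np.array([
        [((durations == t) & (events == k)).sum() for t in times]
        for k in range(1, n_event_types + 1)
    ])
    # Kaplan-Meier any-event survival, shifted to get S(t-) at each time.
    km = np.cumprod(1.0 - d.sum(axis=0) / at_risk)
    surv_before = np.concatenate([[1.0], km[:-1]])
    cif = np.cumsum(surv_before * d / at_risk, axis=1)
    return times, cif


# Simulated uncensored data with two hypothetical Weibull event times.
rng = np.random.default_rng(0)
n_samples = 1_000
latent = np.stack([
    5.0 * rng.weibull(0.7, size=n_samples),
    8.0 * rng.weibull(1.5, size=n_samples),
])
durations = latent.min(axis=0)
events = latent.argmin(axis=0) + 1  # the first event to occur is observed

times, cif = aalen_johansen(durations, events)
fractions = np.array([(events == k).mean() for k in (1, 2)])
print("AJ estimate at the last time:", cif[:, -1])
print("observed event fractions:    ", fractions)
```

Without censoring, the last Aalen-Johansen value for each event type coincides with the observed fraction of that event type, which is why these curves plateau below 1.0.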

ogrisel commented 10 months ago

Looking at the plots again, I have the feeling that the estimated CDF might still be off, for event 1 in particular: its non-concave shape seems wrong for a Weibull distribution with a shape parameter below 1.
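The concavity claim can be checked numerically. The sketch below assumes the standard Weibull CDF, 1 - exp(-(t / scale)^shape), and tests the sign of its second finite differences on a uniform grid. The shape values 0.7 and 2.0 are illustrative and are not the parameters of the example.

```python
import numpy as np


def weibull_cdf(t, shape, scale=1.0):
    """Standard Weibull CDF evaluated at the non-negative times t."""
    return 1.0 - np.exp(-((t / scale) ** shape))


def is_concave_on_grid(values, tol=1e-12):
    """True when all second finite differences are non-positive."""
    return bool(np.all(np.diff(values, n=2) <= tol))


t = np.linspace(0.0, 10.0, 1001)
concave_below_one = is_concave_on_grid(weibull_cdf(t, shape=0.7))
concave_above_one = is_concave_on_grid(weibull_cdf(t, shape=2.0))
print("shape 0.7 concave on grid:", concave_below_one)
print("shape 2.0 concave on grid:", concave_above_one)
```

With a shape parameter below 1 the density is decreasing, so the CDF is concave everywhere. With a shape parameter above 1 the CDF starts out convex, so the check fails.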

ogrisel commented 10 months ago

I think we can close this PR @juAlberge. I don't think we want to ever estimate the CDFs in hazardous.