hirsch-lab / roc-utils

Python package to compute and visualize a ROC analysis

MIT License


Option: different confidence interval #2

Open Wieser145 opened 1 year ago

Wieser145 commented 1 year ago

Currently, you cannot specify a different confidence level for the mean ROC curve; 95% is hard-coded. It would be great to have an option to change it in the future!

In the function plot_mean_roc():

    if show_ci:
        # 95% confidence interval
        tpr_std = np.std(ret_mean.tpr_all, axis=0, ddof=1)
        tpr_lower = ret_mean.tpr - 1.96 * tpr_std / np.sqrt(n_samples)
        tpr_upper = ret_mean.tpr + 1.96 * tpr_std / np.sqrt(n_samples)
        label_ci = "95% CI of mean curve" if show_details else None
        ax.fill_between(ret_mean.fpr, tpr_lower, tpr_upper,
                        color=color, alpha=.3,
                        label=label_ci,
                        zorder=zorder)
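Written out, this snippet computes a pointwise normal-approximation interval for the mean TPR at each FPR value f. The notation is chosen here for illustration and does not come from the package: TPR-bar(f) stands for ret_mean.tpr, s(f) for tpr_std (sample standard deviation with ddof=1 across the curves in ret_mean.tpr_all), n for n_samples, and Phi^{-1} for the standard normal quantile function. The hard-coded 1.96 is the 97.5% quantile, which is what fixes the level at 95%:

$$
\mathrm{TPR}_{\mathrm{lower}}(f) = \overline{\mathrm{TPR}}(f) - z\,\frac{s(f)}{\sqrt{n}}, \qquad
\mathrm{TPR}_{\mathrm{upper}}(f) = \overline{\mathrm{TPR}}(f) + z\,\frac{s(f)}{\sqrt{n}}, \qquad
z = \Phi^{-1}(0.975) \approx 1.96
$$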

normanius commented 1 year ago

Good point! This definitely should be a function parameter to plot_mean_roc().

Wieser145 commented 1 year ago

If you haven't done it yet, just add a parameter level (for example 0.9) and change:

    tpr_lower = ret_mean.tpr - 1.96 * tpr_std / np.sqrt(n_samples)

into:

    tpr_lower = ret_mean.tpr - (norm.ppf(1 - (1 - level) / 2) * tpr_std / np.sqrt(n_samples))

(with norm from scipy.stats), and likewise for tpr_upper.
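A minimal standalone sketch of that suggestion, assuming tpr_all has the same layout as ret_mean.tpr_all (one interpolated TPR curve per row, on a shared FPR grid) and that ret_mean.tpr is the row-wise mean of it. The helper name mean_tpr_ci is hypothetical and not part of roc-utils; it only shows how the level parameter would replace the fixed 1.96, keeping the current normal-approximation formula unchanged.

```python
import numpy as np
from scipy.stats import norm


def mean_tpr_ci(tpr_all, level=0.95):
    """Normal-approximation CI of the mean TPR curve (hypothetical helper).

    tpr_all: array of shape (n_samples, n_points), one TPR curve per row
             (assumed to match the layout of ret_mean.tpr_all).
    level:   confidence level, e.g. 0.9 or 0.95.
    """
    tpr_all = np.asarray(tpr_all, dtype=float)
    n_samples = tpr_all.shape[0]
    tpr_mean = tpr_all.mean(axis=0)  # assumed to equal ret_mean.tpr
    tpr_std = np.std(tpr_all, axis=0, ddof=1)
    # Two-sided critical value: ~1.96 for level=0.95, ~1.645 for level=0.9
    z = norm.ppf(1 - (1 - level) / 2)
    half_width = z * tpr_std / np.sqrt(n_samples)
    return tpr_mean - half_width, tpr_mean + half_width


# Usage on synthetic curves (not real ROC data)
rng = np.random.default_rng(0)
tpr_all = rng.normal(0.7, 0.05, size=(200, 5))
level = 0.9
tpr_lower, tpr_upper = mean_tpr_ci(tpr_all, level=level)
label_ci = f"{level:.0%} CI of mean curve"  # replaces the fixed "95% ..." label
print(label_ci, tpr_lower, tpr_upper)
```

The label is derived from level as well, so the legend stays consistent with whatever level the caller passes.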

What I'm not sure about is whether this calculation of the confidence interval is correct, because in R, rocit.ciRoc uses a completely different approach to compute the standard deviation of the true positive rates. I'm also not sure that, with bootstrapping, we can expect them to be normally distributed; rather, they follow an empirical (nearly binomial) distribution.
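One alternative that avoids the normality assumption is the percentile bootstrap. This is a swapped-in technique, sketched here for discussion; it is not what roc-utils or the R code does. Assume each row of tpr_all is the TPR curve of one bootstrap replicate on a shared FPR grid (an assumed layout). The interval is then the empirical alpha/2 and 1 - alpha/2 quantiles at each grid point. The sketch also compares it with the z * s / sqrt(n_samples) band from the current snippet. If the rows really are bootstrap replicates of a single ROC curve, s already estimates the standard error, so dividing by sqrt(n_samples) makes the band shrink as more replicates are drawn. If the rows are instead curves from independent folds or datasets, the sqrt(n_samples) version answers a different question: the uncertainty of their mean. Helper names are hypothetical and the data is synthetic.

```python
import numpy as np


def percentile_tpr_ci(tpr_all, level=0.95):
    """Pointwise percentile interval along the FPR grid (hypothetical helper).

    Assumes each row of tpr_all is the TPR of one bootstrap replicate,
    interpolated onto a common FPR grid. No normality assumption:
    the bounds are empirical quantiles of the replicates.
    """
    alpha = 1.0 - level
    lower = np.percentile(tpr_all, 100 * alpha / 2, axis=0)
    upper = np.percentile(tpr_all, 100 * (1 - alpha / 2), axis=0)
    return lower, upper


def normal_mean_ci_halfwidth(tpr_all, z=1.96):
    """Half-width used by the current snippet: z * std / sqrt(n_samples)."""
    n_samples = tpr_all.shape[0]
    return z * np.std(tpr_all, axis=0, ddof=1) / np.sqrt(n_samples)


# Synthetic stand-in for bootstrap TPR curves at 3 FPR grid points
rng = np.random.default_rng(42)
for n_boot in (100, 10_000):
    tpr_all = np.clip(rng.normal([0.3, 0.7, 0.9], 0.05, size=(n_boot, 3)), 0, 1)
    lo, hi = percentile_tpr_ci(tpr_all, level=0.95)
    # Percentile width stays roughly constant; the sqrt(n) band keeps shrinking
    print(n_boot, hi - lo, 2 * normal_mean_ci_halfwidth(tpr_all))
```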

In R, they use the following code:

    # looks like sqrt(pq/n), but I am not sure what is meant by pos_count
    var_term1 <- TPR * (1 - TPR) / pos_count
    SE_TPR <- sqrt(var_term1)
    multiplier <- qnorm((1 + level) / 2)
    upper <- TPR + multiplier * SE_TPR
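A rough Python port of that R snippet, for side-by-side comparison. It assumes pos_count is the number of positive samples, which is our reading and is not confirmed by the R documentation quoted here; binomial_tpr_ci is a hypothetical name. var_term1 = p(1 - p)/n is the binomial variance of a proportion, so SE_TPR = sqrt(pq/n) is its standard error (a Wald-type interval). Note that qnorm((1 + level) / 2) is the same quantile as norm.ppf(1 - (1 - level) / 2) from the earlier suggestion, since (1 + level)/2 = 1 - (1 - level)/2. What differs is the standard error, not the multiplier.

```python
import numpy as np
from scipy.stats import norm


def binomial_tpr_ci(tpr, pos_count, level=0.95):
    """Wald-type interval for TPR values, ported from the R snippet (sketch).

    tpr:       true positive rate(s), array-like in [0, 1]
    pos_count: assumed to be the number of positive samples
    """
    tpr = np.asarray(tpr, dtype=float)
    var_term1 = tpr * (1 - tpr) / pos_count  # binomial variance p(1-p)/n
    se_tpr = np.sqrt(var_term1)              # standard error sqrt(pq/n)
    multiplier = norm.ppf((1 + level) / 2)   # equivalent of qnorm((1+level)/2)
    lower = tpr - multiplier * se_tpr        # no clipping, as in the R snippet
    upper = tpr + multiplier * se_tpr
    return lower, upper


# Example: TPR = 0.8 estimated from 100 positives
lower, upper = binomial_tpr_ci(0.8, pos_count=100, level=0.95)
print(lower, upper)
```

Like the R snippet, this does not clip, so the bounds can leave [0, 1] when TPR is close to 0 or 1.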

Any idea?

normanius commented 1 year ago

Thanks for investigating this. I need to sit down and check this myself. Unfortunately, I'm quite busy right now. I may find time next week or the week after.

I definitely made very loose assumptions for the bootstrapping approach; it's possible I was overly optimistic. I certainly followed guidance from some reference implementation, but it's been too long for me to remember exactly which reference I used when I was working on this. It's also possible that the reference used bootstrapping in a context other than ROC analysis.

Happy to receive more suggestions, especially if you find that I implemented things completely wrong :)

Wieser145 commented 1 year ago

Thank you, and I'm happy to help optimize it!