p-lambda / verified_calibration

Calibration library and code for the paper: Verified Uncertainty Calibration. Ananya Kumar, Percy Liang, Tengyu Ma. NeurIPS 2019 (Spotlight).
MIT License

Question: Reliability diagrams #3

Closed mpitropov closed 3 years ago

mpitropov commented 3 years ago

Within the Verified Uncertainty Calibration paper I noticed there are no reliability diagrams. Would it be possible to return the bin accuracies in order for users to create the reliability diagram? If the bin sizes are different, I think that would also have to be returned.

For example, this is how I currently create a reliability diagram for ECE in my codebase.

I have these variables:

acc = [0, 0, 0.00167434, 0.00271739, 0.007, 0.00495663, 0.00269906, 0.01893491, 0.04973357, 0.65488513]
ece = 0.41769305566809833

I pass them to my plotting function:

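import matplotlib.pyplot as plt
import numpy as np

# Plot a reliability diagram from per-bin accuracies (equal-width bins assumed)
def plot_reliability(acc, ece, save_path):
    interval = 1 / len(acc)
    # Bin centers; computed from integer indices so the count never depends on arange's float stop
    x = (np.arange(len(acc)) + 0.5) * interval

    plt.figure(figsize=(3,3))
    # Bars slightly narrower than a bin (0.08 wide when there are 10 bins)
    plt.bar(x, acc, width=0.8*interval, edgecolor='k')
    plt.xlabel('Confidence')
    plt.ylabel('Accuracy')
    plt.xlim([0,1])
    plt.ylim([0,1])
    plt.text(0, 1.01, 'ECE={:.3f}'.format(ece))

    # Dashed diagonal marks perfect calibration
    plt.plot([0,1], [0,1], 'k--')
    plt.tight_layout()
    plt.savefig(save_path)
    plt.show()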

This creates a diagram like the following: [image]

edwardchaos commented 3 years ago

great idea

AnanyaKumar commented 3 years ago

Sorry for the super late response. This is a great question!

Here's how you can do it using the functions in utils.py. First, get the bins by calling get_equal_bins or get_equal_prob_bins (standard reliability diagrams typically use get_equal_prob_bins). Then call bin(data, bins) to bin the data. The returned value is of type BinnedData: a list of length num_bins in which each element is a list of the (prob, label) pairs that fall in that bin. Averaging the probs and the labels within each bin gives the average probability and the average accuracy for that bin, which is what you want in a reliability diagram.
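The steps above could be sketched roughly as follows. This is a plain NumPy illustration that does not call the library: equal_width_bin_edges, bin_pairs, and bin_stats are hypothetical stand-ins, not the functions in utils.py, and it assumes each label is 1 when the prediction was correct and 0 otherwise, so the mean label in a bin is that bin's accuracy.

```python
import numpy as np

def equal_width_bin_edges(num_bins=10):
    """Upper edges of num_bins equal-width bins on [0, 1] (hypothetical helper)."""
    return [(i + 1) / num_bins for i in range(num_bins)]

def bin_pairs(data, upper_edges):
    """Put each (prob, label) pair in the first bin whose upper edge is >= prob."""
    binned = [[] for _ in upper_edges]
    for prob, label in data:
        idx = int(np.searchsorted(upper_edges, prob))
        idx = min(idx, len(upper_edges) - 1)  # guard against prob slightly above 1.0
        binned[idx].append((prob, label))
    return binned

def bin_stats(binned):
    """Per-bin mean probability, mean label (accuracy), and count; NaN for empty bins."""
    conf, acc, counts = [], [], []
    for pairs in binned:
        counts.append(len(pairs))
        if pairs:
            probs, labels = zip(*pairs)
            conf.append(float(np.mean(probs)))
            acc.append(float(np.mean(labels)))
        else:
            conf.append(float("nan"))
            acc.append(float("nan"))
    return conf, acc, counts

# Toy data: (predicted probability, 1 if the prediction was correct else 0)
data = [(0.05, 0), (0.15, 0), (0.18, 1), (0.92, 1), (0.95, 1), (0.97, 0)]
binned = bin_pairs(data, equal_width_bin_edges(10))
conf, acc, counts = bin_stats(binned)
```

Here acc holds the bar heights for the diagram, and counts gives the bin sizes, which matter when the bins do not contain equal numbers of points. Empty bins come back as NaN, so a plotting function may want to replace them with 0 or skip them.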

We could also create a wrapper script for this. If you want to submit a PR for it, with a couple of simple unit tests to check that it produces the right thing, that'd be amazing! Let me know if you have any other questions.

AnanyaKumar commented 3 years ago

Closing for now, but feel free to reopen if you have follow-up questions!