I had an absolute value in the wrong place, and then realized that multiplying the "area difference" between perfect calibration and a model's calibration curve by two makes the score live on a 0-to-1 scale. I also thought I'd give it a proper name: "Average Calibration Score" (inspired by Mean Average Precision, since both share the idea of aggregating over multiple thresholds).
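A minimal sketch of the idea, assuming the "calibration curve" is the standard reliability diagram (mean predicted probability vs. observed frequency per confidence bin) and that the score is twice the absolute area between that curve and the perfect-calibration diagonal. The function name, binning scheme, and bin count are illustrative choices, not the author's actual implementation:

```python
import numpy as np

def average_calibration_score(y_true, y_prob, n_bins=10):
    """Hypothetical sketch: 2 * area between the perfect-calibration
    diagonal and the model's reliability curve (0 = perfectly calibrated,
    larger = worse, following the note's "multiply by two" convention)."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Assign each prediction to an equal-width confidence bin.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)

    conf, acc = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            conf.append(y_prob[mask].mean())  # mean predicted probability
            acc.append(y_true[mask].mean())   # observed positive frequency
    conf, acc = np.array(conf), np.array(acc)

    # On the diagonal, acc == conf, so |acc - conf| is the gap to perfect
    # calibration. Integrate it with the trapezoid rule, then double it.
    order = np.argsort(conf)
    gap = np.abs(acc - conf)[order]
    x = conf[order]
    area = float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(x)))
    return 2.0 * area
```

A perfectly calibrated model (e.g. predicting 0.5 on a 50/50 outcome) scores 0, and the score grows as predicted probabilities drift from observed frequencies.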