haruishi43 closed this issue 3 weeks ago.
Hello @haruishi43,
I just checked the paper again; thanks for pointing this out. The statement in Section IV-B about using the maximum probability as the confidence for TS when calculating pECE was made for a fair comparison, since TS aims to calibrate the maximum probability rather than the uncertainty. Thus, for TS we use the maximum probability as the confidence when computing the uECE component of pECE (results in Table 1).
What we wanted to show in Table 2 is the correlation between three different metrics (ECE, uECE, pECE). However, we omitted to mention there that, for the uECE values, we used entropy for uncertainty quantification in the case of the non-evidential baselines (including TS). We did this to quantitatively evaluate both the uncertainty and the probability predictions. Table 2 shows that TS achieves a better ECE than uECE, which verifies that TS gives a better probability estimate than uncertainty estimate. It is therefore fair to use the probability rather than the entropy for TS when calculating pECE. The same holds for Table 4. Thank you for pointing this out; we will try to add the missing information in an updated arXiv version.
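To make the Table 2 distinction concrete, here is a minimal plain-Python sketch of a uECE computation that takes per-sample uncertainties. It assumes uECE bins samples by an uncertainty in [0, 1] and compares each bin's mean uncertainty with its error rate; the function names `normalized_entropy` and `uece`, the 10 equal-width bins, and normalizing entropy by log C are illustrative assumptions, not the authors' code. Feeding it normalized entropy versus 1 - max probability generally gives different values, which is consistent with the TS rows of Table 2 showing ECE and uECE differing.

```python
import math
from typing import List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy divided by log(C) (C >= 2 classes), so it lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0.0)
    return h / math.log(len(probs))


def uece(uncertainties: Sequence[float], correct: Sequence[bool], n_bins: int = 10) -> float:
    """Hypothetical uECE: bin by uncertainty, compare mean uncertainty to error rate."""
    n = len(uncertainties)
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    for i, u in enumerate(uncertainties):
        # Equal-width bins over [0, 1]; u == 1.0 goes into the last bin.
        bins[min(int(u * n_bins), n_bins - 1)].append(i)
    total = 0.0
    for idx in bins:
        if not idx:
            continue
        mean_u = sum(uncertainties[i] for i in idx) / len(idx)
        err_rate = sum(1 for i in idx if not correct[i]) / len(idx)
        # Weight each bin's gap by the fraction of samples it holds.
        total += len(idx) / n * abs(err_rate - mean_u)
    return total


if __name__ == "__main__":
    # Toy softmax outputs and labels, made up for illustration only.
    probs = [[0.9, 0.05, 0.05], [0.6, 0.3, 0.1], [0.4, 0.35, 0.25], [0.8, 0.1, 0.1]]
    labels = [0, 1, 0, 0]
    correct = [max(range(len(p)), key=p.__getitem__) == y for p, y in zip(probs, labels)]
    u_entropy = [normalized_entropy(p) for p in probs]
    u_maxprob = [1.0 - max(p) for p in probs]
    print("uECE (entropy):     ", uece(u_entropy, correct))
    print("uECE (1 - max prob):", uece(u_maxprob, correct))
```

Only the uncertainty passed in changes between the two calls; the binning and weighting stay the same.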
Regarding the release of the metrics, some work is in progress in the background that may set up a universal and proper evaluation, but I cannot give an exact date for it.
However, feel free to ask any questions you have about the metric implementation. Our metrics are based directly on the standard ECE implementation.
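For reference, a minimal sketch of the standard ECE in plain Python, assuming equal-width bins (10 by default) and the maximum class probability as the confidence. The function name `ece` and its signature are illustrative only, not the repository's API.

```python
from typing import List, Sequence


def ece(probs: Sequence[Sequence[float]], labels: Sequence[int], n_bins: int = 10) -> float:
    """Expected calibration error with equal-width confidence bins.

    Confidence is the maximum class probability; a prediction is correct
    when its arg-max class equals the label.
    """
    n = len(probs)
    bins: List[List[int]] = [[] for _ in range(n_bins)]
    conf: List[float] = []
    correct: List[bool] = []
    for i, p in enumerate(probs):
        pred = max(range(len(p)), key=p.__getitem__)
        conf.append(p[pred])
        correct.append(pred == labels[i])
        # Confidence 1.0 is placed in the last bin.
        bins[min(int(p[pred] * n_bins), n_bins - 1)].append(i)
    total = 0.0
    for idx in bins:
        if not idx:
            continue
        acc = sum(1 for i in idx if correct[i]) / len(idx)
        mean_conf = sum(conf[i] for i in idx) / len(idx)
        # Weight each bin's |accuracy - confidence| gap by its share of samples.
        total += len(idx) / n * abs(acc - mean_conf)
    return total
```

For example, two samples that both predict class 0 with confidence 0.5, where only one is correct, give an ECE of 0.0.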
@kshitij3112 thanks for the quick reply. Using entropy for TS in Table 2 makes sense.
Thank you for your work.
In the paper, it is stated that TS methods use the "maximum probability" as the confidence. Since the ECE metric already uses the maximum probability as its confidence, I would expect ECE and uECE to have the same score for TS methods. However, I noticed in Table 2 that uECE seems to differ from ECE for TS methods. Can you explain (or show) how the uECE metric is calculated for methods using TS? It would be nice if you could release the ECE, uECE, and pECE metrics so that readers can understand your paper better.
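The reasoning in the question can be written out, assuming uECE has the same bin-weighted form as ECE but bins samples by an uncertainty $u_i \in [0,1]$ and compares each bin's error rate with its mean uncertainty. With $N$ samples, $M$ equal-width bins, confidence $\hat p_i = \max_k p_{ik}$, and the choice $u_i = 1 - \hat p_i$:

```latex
\begin{aligned}
\mathrm{ECE}  &= \sum_{m=1}^{M} \frac{|B_m|}{N}\,\bigl|\mathrm{acc}(B_m) - \mathrm{conf}(B_m)\bigr|,
  \qquad B_m = \{\, i : \hat p_i \in (\tfrac{m-1}{M}, \tfrac{m}{M}] \,\},\\
\mathrm{uECE} &= \sum_{m=1}^{M} \frac{|B'_m|}{N}\,\bigl|\mathrm{err}(B'_m) - \mathrm{unc}(B'_m)\bigr|,
  \qquad B'_m = \{\, i : u_i \in (\tfrac{m-1}{M}, \tfrac{m}{M}] \,\},\\
u_i = 1 - \hat p_i
  &\;\Rightarrow\; \mathrm{err}(B) = 1 - \mathrm{acc}(B), \quad \mathrm{unc}(B) = 1 - \mathrm{conf}(B),\\
  &\;\Rightarrow\; \bigl|\mathrm{err}(B) - \mathrm{unc}(B)\bigr| = \bigl|\mathrm{acc}(B) - \mathrm{conf}(B)\bigr|,\\
  &\;\Rightarrow\; B'_m = B_{M+1-m} \ \text{(up to samples exactly on a bin edge)}
  \;\Rightarrow\; \mathrm{uECE} = \mathrm{ECE}.
\end{aligned}
```

If $u_i$ is instead the normalized entropy of the predicted distribution, neither identity holds in general, so uECE and ECE can differ even for the same TS probabilities.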