Closed giangnguyen2412 closed 2 years ago
you are using the model's confidence scores on two copies of the same ImageNet-O dataset
We are not. Notice this line: https://github.com/hendrycks/natural-adv-examples/blob/07770705658c3a1c8acce31fd9dbd68f06e297c3/eval_many_models.py#L58
We are comparing ImageNet-O examples to ImageNet val examples.
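In other words, the detector's confidence scores on ImageNet val (in-distribution) and on ImageNet-O (OOD) are pooled and ranked against each other. A minimal sketch of that comparison, using a hypothetical scikit-learn-based helper rather than the repo's own scoring code, with `in_scores`/`out_scores` standing in for per-example confidences:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_auroc(in_scores, out_scores):
    """AUROC for separating in-distribution examples (e.g. ImageNet val)
    from OOD examples (e.g. ImageNet-O).

    Higher confidence should indicate in-distribution, so ID examples
    get label 1 and OOD examples get label 0.
    (Hypothetical helper for illustration, not from the repo.)
    """
    scores = np.concatenate([in_scores, out_scores])
    labels = np.concatenate([np.ones(len(in_scores)),
                             np.zeros(len(out_scores))])
    return roc_auc_score(labels, scores)

# Perfectly separated score distributions give AUROC = 1.0
print(ood_auroc(np.array([0.9, 0.8]), np.array([0.2, 0.1])))  # -> 1.0
```

If the two score lists came from the same dataset, the labels would carry no signal and the AUROC would hover around 0.5, which is why the comparison must be ID-vs-OOD.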
Hi @hendrycks ,
I got this when I ran the code:
FPR95: 100.00
AUROC: 50.97
AUPR: 57.36
What does this imply? A false positive rate of 100% at 95% recall sounds strange to me. Is this FPR95 value meaningful, or, when using ImageNet-O from your paper, should we only care about AUPR as you reported in Figure 2?
We could also report FPR95 or AUROC, but for simplicity we showed just one metric; AUPR and AUROC are more common than FPR95. The model might simply have a very hard time detecting these images, hence the low performance.
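For intuition on why FPR95 can hit 100%: the threshold is set so that 95% of in-distribution examples are kept, and FPR95 is the fraction of OOD examples that also pass that threshold. A short sketch (a hypothetical helper for illustration, not the repo's own implementation):

```python
import numpy as np

def fpr_at_95_tpr(in_scores, out_scores):
    """FPR when the confidence threshold is chosen so that 95% of
    in-distribution examples are (correctly) accepted.

    An FPR95 of 100% means every OOD example also clears that
    threshold, i.e. the two score distributions overlap almost
    entirely. (Hypothetical helper for illustration.)
    """
    threshold = np.percentile(in_scores, 5)  # accept the top 95% of ID scores
    return float(np.mean(out_scores >= threshold))

# Well-separated scores: no OOD example passes the threshold
print(fpr_at_95_tpr(np.array([0.9, 0.8, 0.7, 0.6]),
                    np.array([0.1, 0.2])))  # -> 0.0
```

So FPR95 = 100% with AUROC near 50 is self-consistent: both say the model's confidences barely distinguish ImageNet-O from ImageNet val.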
Greetings,
I am having a similar issue. I am computing the ImageNet-O metrics and get the following values for a ResNet-50: FPR95: 80.83, AUROC: 41.78, AUPR: 61.34.
The AUPR reported in the paper (Table 1 of the supplementary) is 16.20%. I have tested the code multiple times with minimal changes (added stable_cumsum and adjusted the paths on L22-24).
Do you know why this is happening? Are you sure the code produces the correct results for a ResNet-50? My guess is that something goes wrong when creating the symlinks to ImageNet (lines 54-60).
Thanks! Stefano
Hello @hendrycks ,
In your code, you are using the model's confidence scores on two copies of the same ImageNet-O dataset. Can you explain why you do this? How do you compute the AUPR from two lists of confidence scores? I am trying to improve on the AUPR from your paper but cannot grasp how the AUPR is obtained here. I did a quick check with two lists of 2000 random floats, and the result is given below. What should I expect when I run my program to improve OOD performance?
Output:
Thanks a lot!
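The random-floats sanity check described above can be reproduced with a short sketch (a hypothetical stand-in using scikit-learn, not the repo's own scoring code). With no real separation between the two score lists, AUROC lands near 50 and AUPR near the positive base rate, here 50% since the two lists are the same size:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)
in_scores = rng.random(2000)   # stand-in for ImageNet val confidences
out_scores = rng.random(2000)  # stand-in for ImageNet-O confidences

# In-distribution examples are the positive class
labels = np.concatenate([np.ones(2000), np.zeros(2000)])
scores = np.concatenate([in_scores, out_scores])

print(f"AUROC: {100 * roc_auc_score(labels, scores):.2f}")            # close to 50 for pure noise
print(f"AUPR:  {100 * average_precision_score(labels, scores):.2f}")  # close to the 50% base rate
```

Any genuine OOD detector should push both numbers well above these chance levels; two random lists can never do better than the base rates.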