hendrycks / outlier-exposure

Deep Anomaly Detection with Outlier Exposure (ICLR 2019)
Apache License 2.0

CIFAR-10/100: reproducing results on PyTorch 1.4 #10

Closed wetliu closed 4 years ago

wetliu commented 4 years ago

Hello. If I use your pretrained oe_tune models, the results match perfectly. However, I am having a hard time reproducing the OE fine-tuning results from the paper: some OOD datasets are noticeably off, even though the average is similar. I have tried both PyTorch 1.4 and 0.4. The table below lists the OOD datasets that differ after OE tuning for cifar100_wrn_oe_tune (left: your pretrained OE model; right: our reproduced results). Would you mind taking a look and sharing your thoughts and package versions? Thank you so much!

| OOD dataset | Metric | Pretrained OE model | Our reproduction |
| --- | --- | --- | --- |
| Gaussian Noise | FPR95 | 12.13 | 2.71 |
| Gaussian Noise | AUROC | 95.70 | 99.25 |
| Gaussian Noise | AUPR | 71.05 | 94.48 |
| Rademacher Noise | FPR95 | 17.12 | 0.10 |
| Rademacher Noise | AUROC | 93.00 | 99.97 |
| Rademacher Noise | AUPR | 56.86 | 99.80 |
| Blob | FPR95 | 12.07 | 8.78 |
| Blob | AUROC | 97.15 | 98.39 |
| Blob | AUPR | 86.16 | 93.71 |
| SVHN | FPR95 | 42.90 | 51.13 |
| SVHN | AUROC | 86.86 | 83.88 |
| SVHN | AUPR | 52.93 | 49.18 |
| Places365 | FPR95 | 49.76 | 57.46 |
| Places365 | AUROC | 86.50 | 83.18 |
| Places365 | AUPR | 57.92 | 52.25 |
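
For reference, the FPR95, AUROC, and AUPR values reported above can be computed from per-example anomaly scores roughly as in the sketch below. This is a minimal scikit-learn version rather than the repository's exact evaluation utility; the function name `ood_metrics` and the convention that higher scores mean "more anomalous" are assumptions made here for illustration.

```python
import numpy as np
import sklearn.metrics as sk

def ood_metrics(in_scores, out_scores, recall_level=0.95):
    """Compute FPR at 95% TPR, AUROC, and AUPR from anomaly scores.

    Higher scores are assumed to indicate more anomalous examples,
    and OOD examples are treated as the positive class.
    (Sketch only; not the repository's exact evaluation code.)
    """
    scores = np.concatenate([out_scores, in_scores])
    labels = np.concatenate([np.ones_like(out_scores),   # 1 = OOD (positive)
                             np.zeros_like(in_scores)])  # 0 = in-distribution

    auroc = sk.roc_auc_score(labels, scores)
    aupr = sk.average_precision_score(labels, scores)

    # FPR at the first threshold where at least 95% of OOD examples are detected
    fpr, tpr, _ = sk.roc_curve(labels, scores)
    fpr95 = float(fpr[np.argmax(tpr >= recall_level)])

    return fpr95, auroc, aupr
```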
hendrycks commented 4 years ago

Detection performance on datasets such as Gaussian noise varies between different model initializations. Perhaps the results in the paper did not use this specific model, but a different model with the same architecture. Consequently, the variability you observe is expected, and it goes to show that it is important to average across many distributions. Hopefully this answers your question.
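
To illustrate the averaging point, here is a minimal sketch of evaluating several checkpoints trained from different random seeds on several OOD datasets and reporting the mean metrics, using the negative maximum softmax probability as the anomaly score. The names `get_scores` and `mean_ood_metrics` are hypothetical, and `ood_metrics` refers to the earlier sketch; none of these are the repository's own functions.

```python
import numpy as np
import torch
import torch.nn.functional as F

def get_scores(model, loader, device="cuda"):
    """Anomaly score per example: negative maximum softmax probability.
    (Hypothetical helper; assumes `loader` yields (input, target) batches.)"""
    model.eval()
    scores = []
    with torch.no_grad():
        for x, _ in loader:
            probs = F.softmax(model(x.to(device)), dim=1)
            scores.append(-probs.max(dim=1).values.cpu().numpy())
    return np.concatenate(scores)

def mean_ood_metrics(models, in_loader, ood_loaders):
    """Average FPR95/AUROC/AUPR over checkpoints (seeds) and OOD datasets.

    `models` is a list of checkpoints trained from different random seeds and
    `ood_loaders` maps OOD dataset names to DataLoaders. `ood_metrics` is the
    sketch above; this is not the repository's actual evaluation script.
    """
    results = []
    for model in models:                              # different initializations
        in_scores = get_scores(model, in_loader)
        for name, loader in ood_loaders.items():      # many OOD distributions
            out_scores = get_scores(model, loader)
            results.append(ood_metrics(in_scores, out_scores))
    return tuple(np.mean(np.array(results), axis=0))  # (FPR95, AUROC, AUPR)
```

Averaging over both seeds and OOD distributions smooths out the per-dataset swings seen in the table (e.g., on Gaussian and Rademacher noise), which is why the overall averages match even when individual datasets do not.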