hendrycks / ss-ood

Self-Supervised Learning for OOD Detection (NeurIPS 2019)
MIT License

Large deviation in AUROC scores for self-implemented MSP baseline (multi-class OoD detection) #13

kuehn-ma closed this issue 4 years ago

kuehn-ma commented 4 years ago

Hi, really interesting publication! I tried to reproduce the results of your multi-class OoD detector with a rotation head and compare them with the vanilla MSP baseline. The AUROC scores of my self-trained rotation network were quite similar to yours: Gaussian OoD 99.38%, CIFAR-100 OoD 90.65%.

My issue is with the vanilla MSP baseline: my AUROC scores deviate from your baseline by more than 30% (Gaussian OoD: 65.41%, CIFAR-100 OoD: 52.38%).
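For readers comparing numbers: the MSP baseline scores each input by its maximum softmax probability, and AUROC measures how well that score separates in-distribution test inputs from OoD ones. An AUROC of 50% corresponds to chance-level separation. The following is a minimal NumPy sketch under stated assumptions, not the evaluation code used in ss-ood: the helper names `softmax`, `msp_score`, and `auroc` are hypothetical, the toy logits are made up, OoD examples are treated as positives with anomaly score -MSP (a common convention; flipping both the labels and the sign of the score gives the same AUROC), and ties count as one half.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def msp_score(logits: np.ndarray) -> np.ndarray:
    # Maximum softmax probability per example; higher means "looks more in-distribution".
    return softmax(logits).max(axis=1)

def auroc(pos_scores, neg_scores) -> float:
    # Probability that a random positive outscores a random negative (Mann-Whitney U),
    # counting ties as 1/2. Uses quadratic memory, so it is meant for small sanity checks.
    pos = np.asarray(pos_scores, dtype=float)[:, None]
    neg = np.asarray(neg_scores, dtype=float)[None, :]
    wins = (pos > neg).sum() + 0.5 * (pos == neg).sum()
    return float(wins / (pos.size * neg.size))

# Toy logits, made up for illustration (not taken from the paper or the repo).
in_logits = np.array([[4.0, 0.0, 0.0], [3.0, 0.5, 0.0]])
ood_logits = np.array([[1.0, 0.9, 1.1], [0.0, 0.0, 0.0]])

# Assumed convention: OoD is the positive class and the anomaly score is -MSP.
print(auroc(-msp_score(ood_logits), -msp_score(in_logits)))  # 1.0
```

Because the pairwise comparison grows with the product of the two set sizes, `sklearn.metrics.roc_auc_score` is the more practical choice for full test sets; the sketch only makes the scoring convention explicit so two implementations can be checked against each other.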

I am now trying to figure out what is wrong with my implementation and would like to ask you for more details about the (training) setup of your vanilla MSP baseline. Specifically: what is the exact model architecture, what training data (including perturbations) and loss function did you use, and which hyperparameters did you choose?

Best regards, and thank you in advance! Marc Alexander

hendrycks commented 4 years ago

I recall there being large deviations in multi-class settings, especially for noise, so I doubt the issue is the hyperparameters. I have many weights available in https://github.com/hendrycks/outlier-exposure