Largely deviating AUROC scores in self-implemented MSP baseline (Multi-Class OoD Detection)

Hi, really interesting publication! I tried to reproduce the results of your Multi-Class OoD Detector with rotation head and compare them against the vanilla MSP baseline. With my self-trained implementation, the AUROC scores of the rotation network were quite similar to yours: Gaussian OoD 99.38%, CIFAR-100 OoD 90.65%.

My issue is with the vanilla MSP baseline: my AUROC scores deviate from your reported baseline by more than 30 percentage points (Gaussian OoD: 65.41%, CIFAR-100 OoD: 52.38%).
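
To make the metric concrete, here is a minimal sketch of how an MSP score and the resulting AUROC can be computed. It is only an illustration under assumptions, not the evaluation code from the repository or the paper: the function names (`softmax`, `msp_score`, `auroc`) and the toy logits are hypothetical, and it uses only the Python standard library.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def msp_score(logits):
    """Maximum softmax probability; higher means 'more in-distribution'."""
    return max(softmax(logits))


def auroc(pos_scores, neg_scores):
    """AUROC as the probability that a random positive outscores a
    random negative, counting ties as 0.5 (Mann-Whitney formulation)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical classifier logits, for illustration only.
in_logits = [[4.0, 0.5, 0.1], [3.0, 0.2, 0.1]]    # confident predictions
ood_logits = [[1.0, 0.9, 0.8], [0.5, 0.4, 0.6]]   # near-uniform predictions

in_scores = [msp_score(z) for z in in_logits]
ood_scores = [msp_score(z) for z in ood_logits]

# In-distribution is treated as the positive class here. Treating OoD as
# positive and scoring with -MSP gives the same AUROC.
print(auroc(in_scores, ood_scores))  # 1.0
```

An AUROC close to 50% means the score barely separates the two sets, so the score's sign convention and the choice of positive class are worth checking when numbers differ this much.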

I am trying to figure out what is wrong with my implementation, so could you provide some more details about the (training) setup of your vanilla MSP baseline? Specifically: what exactly is the model architecture, what training data (incl. perturbations) and loss function do you use, and which hyperparameters did you choose?

Thank you in advance and best regards, Marc Alexander

hendrycks / ss-ood
