Reproducing the test results

DwD0123 commented 1 year ago

Hello, we are trying to reproduce the CIFAR10 test results from your paper but have run into some problems. We used the code in this repository with the same parameters (T=1000, noise=0.0014 for ODIN, T=0.1 for Energy), but the results are not exactly the same as in the paper. The average values for CE loss / LogitNorm loss + MSP / ODIN are the same, but there are some differences for individual OoD datasets when we compare them separately. For CE loss + Energy score, we got 35.42% FPR95 on average, which is higher than the 26.82% in the paper. For LogitNorm loss + Energy score (T=0.1), we got wrong results (FPR95 > 90%) using the weights provided in this repository. We got valid results after retraining the WRN-40-2 model with LogitNorm loss using the training scripts, yet they still differ from those in the paper. I put our testing results in a table here in comparison with the testing results from your paper:

It seems that the testing results are influenced by the model weights. Could you provide the model weights trained with CE loss, so that we can reproduce your CE loss results? Could you also verify again that the weights in this repository work with the energy score function (T=0.1)? Thank you!
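For readers following along, here is a minimal sketch of the two quantities discussed above: the energy-based score with temperature T and the FPR95 metric. It is not the repository's evaluation code. The function names `neg_energy` and `fpr_at_95_tpr` are hypothetical, and the sketch assumes the usual convention that a higher score means "more in-distribution" and that FPR95 is the fraction of OoD samples accepted at the threshold that retains 95% of in-distribution samples.

```python
import math

def neg_energy(logits, T=1.0):
    """Negative free energy, T * logsumexp(logits / T).

    Assumed convention: higher values indicate in-distribution inputs.
    The max is subtracted first for numerical stability.
    """
    m = max(logits)
    return m + T * math.log(sum(math.exp((z - m) / T) for z in logits))

def fpr_at_95_tpr(id_scores, ood_scores):
    """Fraction of OoD scores at or above the threshold that keeps 95% of ID scores."""
    ranked = sorted(id_scores, reverse=True)
    n = len(ranked)
    k = (95 * n + 99) // 100      # ceil(0.95 * n) using integer arithmetic
    threshold = ranked[k - 1]     # lowest score among the retained ID samples
    accepted = sum(1 for s in ood_scores if s >= threshold)
    return accepted / len(ood_scores)

# Toy usage with made-up logits (not real model outputs).
id_scores = [neg_energy(l, T=0.1) for l in ([5.0, 0.1, 0.2], [4.0, 0.3, 0.1])]
ood_scores = [neg_energy(l, T=0.1) for l in ([1.0, 0.9, 1.1], [0.5, 0.4, 0.6])]
print(fpr_at_95_tpr(id_scores, ood_scores))
```

Note that with a small temperature such as T=0.1 the score is dominated by the largest logit, so it behaves much like a max-logit score.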

rubbish001 commented 8 months ago

Does this loss function actually work? It seems a bit like nonsense.

hongxin001 commented 8 months ago

I think this result is reasonable. We can see that the performance is also worse than the reported results when using the energy score with CE training. The important thing is that our method does improve OOD detection performance.

rubbish001 commented 8 months ago

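Does it really work? Does that mean that normalizing the features of false positives can lower the confidence of those false positives?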
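For context on the normalization being asked about, below is a rough sketch of logit normalization as it is commonly described: the logit vector is divided by its L2 norm and by a temperature before the usual softmax cross-entropy. This is an illustrative pure-Python version, not the repository's implementation. The helper names and the value `tau=0.04` are assumptions made for the example.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def logit_norm(logits, tau=0.04, eps=1e-7):
    """Divide logits by their L2 norm and by a temperature tau (assumed value)."""
    norm = math.sqrt(sum(z * z for z in logits)) + eps
    return [z / (norm * tau) for z in logits]

def logitnorm_loss(logits, label, tau=0.04):
    """Cross-entropy computed on the normalized logits."""
    return -math.log(softmax(logit_norm(logits, tau))[label])

small = [1.0, 2.0, 3.0]
large = [10.0, 20.0, 30.0]   # same direction, 10x the magnitude

# Plain softmax confidence grows with logit magnitude...
print(max(softmax(small)), max(softmax(large)))
# ...while the normalized logits depend (almost) only on direction.
print(max(softmax(logit_norm(small))), max(softmax(logit_norm(large))))
```

The sketch only illustrates that, after normalization, the loss can no longer be reduced simply by scaling up the logit magnitude. Whether this lowers confidence on OoD inputs in practice is an empirical question that the code above does not settle.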

hongxin001 / logitnorm_ood

Reproducing the test results #6