max-andr / relu_networks_overconfident

Why ReLU networks yield high-confidence predictions far away from the training data and how to mitigate the problem [CVPR 2019, oral]
https://arxiv.org/abs/1812.05720

Question about loss #3

Closed eric-tc closed 4 years ago

eric-tc commented 4 years ago

Hi, thanks for sharing your work. I have a question about the loss function proposed in the paper.

If I have understood correctly, you add the p_out loss term to the loss function in order to train the neural network to output a uniform label distribution when the input data lies outside p_in.

So in the end you are penalizing confident predictions on OOD data coming from p_out as opposed to p_in. To classify data as OOD, can one use the ACET value directly?

Is my understanding correct? Thanks

j-cb commented 4 years ago

Hi Eric,

we train the network to output a uniform label distribution for input data from p_out, which in the case of ACET consists of adversarially enhanced random images. Simultaneously, the network is trained to remain accurate on the classification task.
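As a rough illustration, here is a minimal PyTorch-style sketch of such a combined objective (the names `combined_loss`, `x_in`, `x_out` and `lam` are made up for this example, and it omits ACET's inner adversarial maximization over the noise images; the repository's actual code may differ):

```python
import torch
import torch.nn.functional as F

def combined_loss(model, x_in, y_in, x_out, lam=1.0):
    """Hypothetical sketch: standard cross-entropy on in-distribution
    data plus cross-entropy to the uniform distribution on p_out
    samples, weighted by an illustrative coefficient lam."""
    loss_in = F.cross_entropy(model(x_in), y_in)

    log_probs_out = F.log_softmax(model(x_out), dim=1)
    # Cross-entropy with the uniform target u_k = 1/K reduces to the
    # mean negative log-probability over all K classes.
    loss_out = -log_probs_out.mean()

    return loss_in + lam * loss_out
```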

ACET doesn't give us a direct value for an input being OOD, but rather a label distribution over the class labels (like an ordinary classifier does). The maximum confidence of this label distribution tends to be lower for many OOD inputs (this already works for an ordinary classifier, but ACET strengthens the effect). Therefore, low maximum confidence is indicative of OOD data and helps to detect it.
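For instance (a hypothetical snippet; `model`, `tau` and the helper name are illustrative), one could threshold the maximum softmax probability to flag OOD inputs:

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def max_confidence(model, x):
    """OOD score: the maximum softmax probability per input.
    Lower values indicate likely OOD inputs."""
    probs = F.softmax(model(x), dim=1)
    return probs.max(dim=1).values

# Illustrative usage: flag inputs whose confidence falls below a
# threshold tau, chosen e.g. on a validation set.
# is_ood = max_confidence(model, x_batch) < tau
```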

eric-tc commented 4 years ago

Ok clear.

Is there an explanation for why you use the uniform distribution? Or could one use any kind of distribution, as long as it reflects that the points do not belong to p_in?

Thanks

j-cb commented 4 years ago

The idea is that the uniform label distribution has the lowest possible maximum confidence of 1/#classes. Also, it represents complete uncertainty over the class, which we want the network to express for inputs that do not belong to any class.

Minimizing the maximum confidence directly led to very similar experimental results as minimizing the cross-entropy with the uniform label distribution. In both cases, the uniform output has the minimal possible loss. In our code, you can choose between the two methods by setting max_conf_flag.
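To illustrate the two options (a sketch assuming a standard softmax classifier; the exact signature and semantics of max_conf_flag in the repository may differ):

```python
import torch
import torch.nn.functional as F

def out_distribution_loss(logits_out, max_conf_flag=False):
    """Sketch of the two interchangeable p_out objectives: directly
    penalizing the maximum (log-)confidence, or the cross-entropy
    between the output and the uniform label distribution. Both are
    minimized exactly by a uniform output."""
    log_probs = F.log_softmax(logits_out, dim=1)
    if max_conf_flag:
        # mean over the batch of the largest log-probability per sample
        return log_probs.max(dim=1).values.mean()
    # cross-entropy with the uniform target: mean of -log p_k over all K classes
    return -log_probs.mean()
```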

eric-tc commented 4 years ago

Ok clear. Thanks