gpleiss / temperature_scaling

A simple way to calibrate your neural network.
MIT License

how to use temperature scaling parameter #17

Closed SophieChang66 closed 4 years ago

SophieChang66 commented 4 years ago

@gpleiss hello, thank you for sharing your code! I trained my binary classification network with softmax cross-entropy loss and computed the probabilities with softmax. I then deleted the softmax layer, so the last layer of my network is a fully connected layer (logits). I calibrated the confidence with your code and obtained the temperature scaling parameter. How should I use the scale: 1. sigmoid(logits / T) or 2. softmax(logits / T)? As far as I know, the paper says sigmoid(logits / T).

SophieChang66 commented 4 years ago

The README says to use softmax(logits / T), and the code computes the NLL loss with nll_loss(log_softmax).

gpleiss commented 4 years ago

sigmoid(logits / T). The network is trained without the temperature.
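These two answers are consistent for a binary network: softmax over two temperature-scaled logits is algebraically identical to a sigmoid applied to the scaled logit difference. A quick check with made-up numbers (pure Python, not the repo's code):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(zs, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(z / T) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical two-class logits and a learned temperature.
z0, z1, T = 0.4, 2.1, 1.7

p_softmax = softmax([z0, z1], T)[1]  # softmax(logits / T), class 1
p_sigmoid = sigmoid((z1 - z0) / T)   # sigmoid of the scaled logit difference
assert abs(p_softmax - p_sigmoid) < 1e-12
```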

SophieChang66 commented 4 years ago

@gpleiss However, there is this sentence in the README.md of your repo:

Temperature scaling divides the logits (inputs to the softmax function) by a learned scalar parameter. I.e.

softmax = e^(z/T) / sum_i e^(z_i/T)

and the code computes nll_loss based on log_softmax. Does this mean that how the learned scale is applied depends on how the original network computes its probabilities?

gpleiss commented 4 years ago

and the code computes nll_loss based on log_softmax.

Which part of the code are you referring to? The training code does not apply temperature scaling, since temperature scaling is only done as part of post processing.

SophieChang66 commented 4 years ago

@gpleiss In the file temperature_scaling.py, the set_temperature function that tunes the temperature scale optimizes it with the NLL loss, which is actually the cross-entropy loss (negative log_softmax).

gpleiss commented 4 years ago

This is the methodology we describe in our paper (see Section 4.1, Platt Scaling). The temperature is optimized to minimize the NLL on the validation set. This is the natural choice: we want a probabilistic output, and minimizing the NLL ensures that the output is probabilistic.

Does this mean that the way using the learned scale depends on the way of computing prob of original network ?

Are you asking if the original network also has to be trained with NLL? I'm not sure if our method would apply otherwise. If you were learning the network with a different loss (e.g. hinge) then your model will likely not be probabilistic.
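For a concrete picture of this post-processing step: set_temperature in the repo fits T with LBFGS, but the objective is just the validation-set NLL of softmax(logits / T). A dependency-free sketch of the same objective, using a coarse grid search in place of LBFGS and made-up validation logits:

```python
import math

def nll(logits, labels, T):
    # Average negative log-likelihood of softmax(logits / T).
    total = 0.0
    for zs, y in zip(logits, labels):
        scaled = [z / T for z in zs]
        m = max(scaled)  # log-sum-exp trick for numerical stability
        log_sum = m + math.log(sum(math.exp(z - m) for z in scaled))
        total += log_sum - scaled[y]
    return total / len(logits)

# Made-up validation logits/labels; the repo uses a held-out set's real outputs.
val_logits = [[2.0, 0.1], [0.3, 1.5], [3.0, 0.2], [2.5, 0.3]]
val_labels = [0, 1, 0, 1]

# Coarse grid search over T > 0, standing in for the repo's LBFGS optimizer.
best_T = min((t / 100 for t in range(10, 500)),
             key=lambda T: nll(val_logits, val_labels, T))
```

With these toy numbers the last sample is confidently wrong, so the fitted T ends up greater than 1 and softens all the probabilities.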

SophieChang66 commented 4 years ago

Thanks for your reply! In deployment, my original network is a logits layer followed by a softmax layer, and it was also trained with NLL. After I obtain the temperature on the validation set, how should I use it in post-processing?
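In post-processing, the learned temperature simply divides the logits before the final softmax (equivalently, a sigmoid of the scaled logit difference in the binary case). A minimal sketch with hypothetical values:

```python
import math

def calibrated_probs(logits, T):
    # Post-processing: divide logits by the learned temperature, then softmax.
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical deploy-time logits with a learned temperature of 1.7.
probs = calibrated_probs([0.4, 2.1], T=1.7)
# Dividing by T > 1 softens the confidence but never changes the argmax.
```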

SophieChang66 commented 4 years ago

@gpleiss https://geoffpleiss.com/nn_calibration — I found the answer here!