gpleiss / temperature_scaling

A simple way to calibrate your neural network.
MIT License
1.09k stars, 159 forks

Temperature scaling on segmentation tasks #23

Open Karol-G opened 3 years ago

Karol-G commented 3 years ago

Hi,

I wanted to apply temperature scaling to a segmentation task with one class (so each pixel either belongs to this class or not). Instead of CrossEntropyLoss I am using BCEWithLogitsLoss, and I had to disable _ECELoss due to some bugs I could not fix. However, the actual evaluation score is much worse after temperature scaling than before. Furthermore, I noticed that the temperature becomes very high during optimization. What are your thoughts on this? It does not seem correct that the temperature becomes this high. Is temperature scaling simply not suited for segmentation? Did you run tests on any segmentation tasks?

-----
Skipped about 10 steps here
-----
self.temperature:  Parameter containing:
tensor([3.4501], requires_grad=True)
self.temperature:  Parameter containing:
tensor([7.5331], requires_grad=True)
self.temperature:  Parameter containing:
tensor([15.8106], requires_grad=True)
self.temperature:  Parameter containing:
tensor([49.7303], requires_grad=True)
self.temperature:  Parameter containing:
tensor([975.0599], requires_grad=True)
self.temperature:  Parameter containing:
tensor([8138381.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([16275787.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([24413192.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([32550598.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([40688004.], requires_grad=True)
self.temperature:  Parameter containing:
tensor([40688004.], requires_grad=True)
Optimal temperature: 40688004.000
After temperature - NLL: 0.693

Best, Karol
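
A note on the final value in the log above: 0.693 is ln 2, which is the binary cross-entropy obtained when every prediction is exactly 0.5. Dividing logits by a very large temperature pushes them all towards zero, so the sigmoid outputs collapse to 0.5 whatever the labels are. The sketch below illustrates this with plain Python. The logits and labels are made up for illustration and are not taken from the issue, and `bce_with_logits` and `mean_nll` are hypothetical helpers, not functions from this repository.

```python
import math


def bce_with_logits(logit: float, target: float) -> float:
    """Numerically stable binary cross-entropy for one logit/label pair."""
    # Equivalent to softplus(logit) - target * logit, written to avoid overflow.
    return max(logit, 0.0) - target * logit + math.log1p(math.exp(-abs(logit)))


def mean_nll(logits, targets, temperature: float) -> float:
    """Mean binary NLL after dividing every logit by the temperature."""
    scaled = [z / temperature for z in logits]
    return sum(bce_with_logits(z, y) for z, y in zip(scaled, targets)) / len(scaled)


# Hypothetical per-pixel logits and binary labels (assumed, for illustration only).
logits = [4.2, -3.1, 0.7, -5.0, 2.3, -0.4]
targets = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0]

# Temperatures chosen to echo the magnitudes seen in the log.
for t in (1.0, 15.8106, 975.0599, 40688004.0):
    print(f"T={t:>14.4f}  NLL={mean_nll(logits, targets, t):.3f}")

print(f"ln 2 = {math.log(2):.3f}")
```

If this reading applies, the reported NLL says that the scaled outputs no longer carry any information, which would be consistent with the drop in evaluation score. It does not by itself explain why the optimizer ends up there.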

hmeine commented 3 years ago

We use it for segmentation, with CrossEntropyLoss. However, I have problems with L-BFGS myself. I believe this is because the loss is locally linear around the starting temperature: the loss is evaluated many times although only one optimizer step is performed, and the optimizer fails to move far from the starting point.
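
Since the temperature is a single scalar, one way to sidestep L-BFGS while debugging is a bounded one-dimensional search. The sketch below is not the method used in this repository. It is a golden-section search over log(T), written in plain Python for the binary (BCE-with-logits) case. It relies on the binary NLL being convex in 1/T, which makes it unimodal in log(T). The function names, the search bounds, and the example data are all assumptions made for illustration. For segmentation, the inputs would be the flattened per-pixel logits and labels.

```python
import math


def mean_bce(logits, targets, temperature: float) -> float:
    """Mean binary cross-entropy of logits divided by the temperature."""
    total = 0.0
    for z, y in zip(logits, targets):
        s = z / temperature
        # Stable form of softplus(s) - y * s.
        total += max(s, 0.0) - y * s + math.log1p(math.exp(-abs(s)))
    return total / len(logits)


def fit_temperature(logits, targets, lo=1e-2, hi=1e2, tol=1e-6) -> float:
    """Golden-section search for T in [lo, hi], performed on log(T).

    The bounds lo and hi are assumed defaults, not values from the issue.
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = math.log(lo), math.log(hi)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc = mean_bce(logits, targets, math.exp(c))
    fd = mean_bce(logits, targets, math.exp(d))
    while b - a > tol:
        if fc < fd:
            # Minimum lies in [a, d]; reuse c as the new upper probe.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = mean_bce(logits, targets, math.exp(c))
        else:
            # Minimum lies in [c, b]; reuse d as the new lower probe.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = mean_bce(logits, targets, math.exp(d))
    return math.exp((a + b) / 2.0)


# Hypothetical overconfident logits: two of the eight predictions are wrong.
logits = [8.0, 6.5, -7.0, -9.0, 7.5, -6.0, 5.0, -8.5]
targets = [1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0]

t_opt = fit_temperature(logits, targets)
print(f"fitted T = {t_opt:.3f}")
print(f"NLL at T=1: {mean_bce(logits, targets, 1.0):.3f}")
print(f"NLL at fitted T: {mean_bce(logits, targets, t_opt):.3f}")
```

A fitted value that sits right at the upper bound `hi` would mirror the runaway temperature in the log above. In that case it may be worth checking that logits and labels are aligned (same shape, same pixel order, labels in {0, 1}) before suspecting the optimizer.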