jonbarron / robust_loss_pytorch

A pytorch port of google-research/google-research/robust_loss/
Apache License 2.0

why use the log function to regularize the scale? #25

Open wzn0828 opened 3 years ago

wzn0828 commented 3 years ago

Hi, I have a question about the implementation. In the Distribution().nllfun method, why do you use the log function to regularize the scale toward smaller values? I would have expected an L2 or L1 penalty, which is more common.

https://github.com/jonbarron/robust_loss_pytorch/blob/9831f1db8006105fe7a383312fba0e8bd975e7f6/robust_loss_pytorch/distribution.py#L208

jonbarron commented 3 years ago

Log(scale) shouldn't be thought of as a regularizer; it's the log of the partition function of a probability distribution. Basically, this is not a "design decision" like L2 or L1 weight decay --- it ensures that the PDF implied by viewing the loss function as a negative log-likelihood integrates to 1, and it's the only thing you can minimize here that does that.
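As a concrete sanity check (a sketch added here for illustration, not code from this repo): for the special case alpha = 2 the general loss reduces to 0.5 * (x / scale)^2, the implied distribution is a Gaussian with standard deviation `scale`, and the log(scale) term is exactly part of that Gaussian's log normalizer.

```python
import math
import torch

# Alpha = 2 special case: loss is 0.5 * (x / scale)^2, and turning it into a
# proper NLL requires the Gaussian's log normalizer log(scale) + 0.5*log(2*pi).
# The log(scale) term is that normalizer, not a regularizer.
x = torch.tensor(1.3)
scale = torch.tensor(0.7)

loss = 0.5 * (x / scale) ** 2
nll = loss + torch.log(scale) + 0.5 * math.log(2.0 * math.pi)

reference = -torch.distributions.Normal(0.0, scale).log_prob(x)
print(torch.allclose(nll, reference))  # expected: True
```

For other values of alpha the constant changes (it becomes the log of the general partition function Z(alpha)), but the role of log(scale) is the same.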

wzn0828 commented 3 years ago

Ok, I see, thank you very much. Another question: I see that the adaptiveness is realized through the negative log-likelihood in Equation (16). But why is this reasonable? I noticed the qualitative analysis on the first page and in Figure 2, but what is the fundamental theory behind the idea?

jonbarron commented 3 years ago

This is a good idea if 1) you believe in maximum likelihood estimation (https://en.wikipedia.org/wiki/Maximum_likelihood_estimation) and 2) you want to maximize the likelihood of the observed data you're training on.
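Here's a minimal sketch of what that means in practice (plain PyTorch, not this repo's API; the linear model and numbers are made up for illustration): treat the residuals as samples from a distribution with an unknown scale, then fit the model parameters and the scale jointly by minimizing the mean NLL, which is exactly maximum likelihood estimation. The adaptive loss does the same thing, except that alpha is also a free parameter and the Gaussian normalizer is replaced by the general partition function.

```python
import math
import torch

torch.manual_seed(0)
x = torch.linspace(-1.0, 1.0, 200)
y = 3.0 * x + 0.25 * torch.randn(200)           # noisy observations

w = torch.zeros(1, requires_grad=True)           # model parameter
log_scale = torch.zeros(1, requires_grad=True)   # log of the noise scale
optimizer = torch.optim.Adam([w, log_scale], lr=0.05)

for _ in range(500):
    optimizer.zero_grad()
    residual = y - w * x
    scale = torch.exp(log_scale)
    # Gaussian NLL: the 0.5*(r/scale)^2 term rewards small residuals, while
    # log(scale) penalizes claiming the data is noisier than it really is.
    # Minimizing the mean NLL over the data is maximum likelihood estimation.
    nll = (0.5 * (residual / scale) ** 2
           + torch.log(scale) + 0.5 * math.log(2.0 * math.pi))
    nll.mean().backward()
    optimizer.step()

print(w.item(), torch.exp(log_scale).item())  # approx. 3.0 and 0.25
```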

wzn0828 commented 3 years ago

WONDERFUL! Thank you very much.