msxiaojin / MDLCC


What do you think about the derivative of the angular loss at the convergence point? #4

Open kanuleader opened 3 years ago

kanuleader commented 3 years ago

Hi.

I have been thinking carefully about how your angular loss can converge at the point where theta is close to zero.

I am concerned that the derivative of the inverse cosine goes to infinity at the convergence point, as follows:

[image: derivative of the inverse cosine]
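(I believe this is the formula shown in the image above:)

```math
\frac{d}{dx}\arccos(x) = -\frac{1}{\sqrt{1 - x^{2}}} \;\to\; -\infty \quad \text{as } x \to 1^{-} \ \ (\text{i.e. } \theta \to 0)
```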

A network is generally trained by updating its weights via back-propagation.

Back-propagation is essentially the chain rule applied to the derivatives of each component.

Should we confirm that this network actually converges when trained with the angular loss?

I expect that the angular loss may cause the loss function to oscillate around that point.

Could you share your opinion on this?

msxiaojin commented 3 years ago

Hi, this is a very good question. Here is what I think:

First, to avoid the gradient explosion near x=1 or x=-1, we clip the inner product of the estimated illuminant and the GT illuminant, as in Line 230 in 'multidomain_model.py'. This keeps the gradient from going to -Inf.
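As an illustration only (this is not the exact code at Line 230, just a PyTorch-style sketch; the function name and `eps` value are chosen for this example):

```python
import torch
import torch.nn.functional as F

def angular_loss(pred, gt, eps=1e-6):
    # Normalize both illuminant vectors to unit length.
    pred = F.normalize(pred, dim=-1)
    gt = F.normalize(gt, dim=-1)
    # The inner product lies in [-1, 1]; clip it slightly inside that range
    # so that d/dx arccos(x) = -1 / sqrt(1 - x^2) stays finite.
    cos_sim = torch.clamp((pred * gt).sum(dim=-1), -1.0 + eps, 1.0 - eps)
    return torch.acos(cos_sim).mean()
```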

Second, when the task is difficult (some corner cases in the GehlerShi and NUS datasets), it is hard for the CNN to reach the convergence point. Meanwhile, better convergence on the training set does not guarantee improved performance on the testing set.

I partially agree with you that the angular loss is not stable for convergence. In this paper, we followed previous work in using the angular loss for training. I have also tried an L2 loss; if I remember correctly, the angular loss worked about as well as the L2 loss. Maybe you can try the L2 loss and some other loss functions.
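For example, one possible L2 variant on normalized illuminant vectors (again only a sketch, not the code we actually ran) would be:

```python
import torch.nn.functional as F

def l2_loss(pred, gt):
    # Squared L2 distance between normalized illuminant vectors; unlike arccos,
    # its gradient stays bounded as the two vectors become aligned (theta -> 0).
    pred = F.normalize(pred, dim=-1)
    gt = F.normalize(gt, dim=-1)
    return ((pred - gt) ** 2).sum(dim=-1).mean()
```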