yangxue0827 / RotationDetection

This is a tensorflow-based rotation detection benchmark, also called AlphaRotate.
https://rotationdetection.readthedocs.io/
Apache License 2.0

Gradient of the IoU-Smooth L1 loss in SCRDet #21

Closed igo312 closed 3 years ago

igo312 commented 3 years ago

Here is the relevant link.

In that link I argue that the backward gradient will always be 0.

From another point of view, if |u| is made non-differentiable (treated as a constant), the gradient will not be 0. But then the gradient of u/|u| is no longer 1.
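
Written out for a scalar u ≠ 0, the two cases are (my own sketch of the calculation):

```latex
% (a) if |u| is differentiated through:
\frac{\partial}{\partial u}\left(\frac{u}{|u|}\right)
  = \frac{1}{|u|} - \frac{u\,\operatorname{sign}(u)}{|u|^{2}} = 0
% (b) if |u| is treated as a constant (non-differentiable / stop-gradient):
\frac{\partial}{\partial u}\left(\frac{u}{|u|}\right) = \frac{1}{|u|} \neq \pm 1
```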

@yangxue0827 Could you please help me out? Many thanks!

yangxue0827 commented 3 years ago

| | means the stop-gradient operation rather than abs. Why not check the code yourself, so you can understand it quickly?

yangxue0827 commented 3 years ago

For a vector u, |u| is a scalar; it only retains the magnitude and carries no direction.
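
For reference, a minimal sketch of the loss pattern being described (not the benchmark's exact code; the names `iou_smooth_l1_loss_sketch`, `sigma` and `eps` are placeholders, and `iou` is assumed to be a precomputed rotated IoU per box):

```python
import tensorflow as tf

def iou_smooth_l1_loss_sketch(preds, targets, iou, sigma=3.0, eps=1e-6):
    """Sketch of an IoU-Smooth L1 style loss; | | here means tf.stop_gradient, not abs."""
    diff = tf.abs(preds - targets)
    # standard smooth L1 on the regression offsets
    smooth_l1 = tf.where(diff < 1.0 / sigma ** 2,
                         0.5 * sigma ** 2 * tf.square(diff),
                         diff - 0.5 / sigma ** 2)
    u = tf.reduce_sum(smooth_l1, axis=-1)   # per-box regression loss, the |u| in the discussion
    f_iou = -tf.math.log(iou + eps)         # |f(IoU)|
    # The scale |f(IoU)| / |u| is wrapped in stop_gradient, so it acts as a
    # constant step size; the gradient direction comes only from u.
    scale = tf.stop_gradient(f_iou / (u + eps))
    return u * scale                        # value ≈ -log(IoU), gradient flows through u
```

The loss value then roughly equals -log(IoU), while back-propagation only sees the smooth L1 term scaled by a constant, which is what the rest of the discussion turns on.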

igo312 commented 3 years ago

Thank you for your response.

But the code is quite different from what the paper says, so there may be a misunderstanding on my side. My understanding is that the IoU term should act as the step size when updating the weights, while u/|u| provides the direction.

However, the gradient looks a little odd when I calculate it.

You see, if we regard |u| as a scalar constant, then, as the gradient in the picture below shows, |u| still affects the step size. In other words, the gradient of the u/|u| part is actually not 1 or -1.

[image: gradient calculation]

yangxue0827 commented 3 years ago

In any case, my core idea is to let u alone determine the back-propagated gradient, and to use |IoU|/|u| to eliminate the discontinuity of the loss.

igo312 commented 3 years ago

It still seems a little odd: a bigger |u| means the loss should get bigger, but once |u| appears in the denominator, the roles of |u| and |IoU| work against each other.

For example, when a box is far away from the ground-truth box, both |F(IoU)| (e.g. |-log IoU|) and |u| get bigger.

It seems strange to me; what do you think?
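
One way to probe this is a toy 1-D check of the effective scale |f(IoU)|/|u| as the box drifts away; the interval IoU below is a hypothetical stand-in for the benchmark's rotated IoU, and u is approximated by the offset magnitude:

```python
import math

def interval_iou(d, w=1.0):
    # IoU of the intervals [0, w] and [d, d + w]; a 1-D toy, not the rotated IoU
    inter = max(0.0, w - abs(d))
    return inter / (2.0 * w - inter)

# effective gradient scale |f(IoU)| / |u| with f(IoU) = -log(IoU) and u ~ |d|
for d in (0.05, 0.1, 0.2, 0.3):   # positive samples need IoU > 0.5, i.e. small offsets
    iou = interval_iou(d)
    print(f"d={d:.2f}  IoU={iou:.3f}  scale={-math.log(iou) / d:.3f}")
```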

yangxue0827 commented 3 years ago

Please allow me to explain in Chinese.

Your case is actually very rare, because we select positive samples based on IoU > 0.5. In the non-boundary cases, shouldn't |F(IoU)| and |u| have the same trend to begin with?

igo312 commented 3 years ago

Being able to use Chinese is great.

I was indeed considering the normal case (sorry, I had not looked at it from the boundary angle); the trends being the same is exactly what seems strange.

When the box's offset from the GT gets larger, we would want the update step to be larger. But when the two trends are the same, they work against each other, so as the offset grows the step size grows only slowly, and in some cases it may even shrink. In other words, this scale is not stable.

What do you think?

yangxue0827 commented 3 years ago

Then you still have to account for the magnitude of u, so an L2 loss might be more suitable: when differentiating, the 1/|u| gets cancelled and only |f(IoU)| remains. My original intention was to cancel out the magnitude of u in the boundary cases, but by your argument smooth L1 may indeed not be a good fit, since for large values smooth L1 is just L1.

igo312 commented 3 years ago

I think I have been looking at this purely from the gradient perspective; I need some time to understand the boundary-case intent behind your design of this function.

"so an L2 loss might be more suitable..."

But L2 is not suitable from the back-propagation angle either, because it leaves |y-y'| in the denominator. I think simply dropping |u| is enough: use an L1 loss, so the back-propagated gradient is just 1 or -1. Alternatively, a sign function could be introduced in the backward pass, so that whichever loss is chosen, the back-propagated value is ±1.
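
Spelled out per coordinate, with u = y - y', sg(·) for stop-gradient and f = |f(IoU)| (a simplification: in the actual loss, u is the per-box regression loss rather than a single residual):

```latex
% loss pattern: L = \ell(u)\cdot\mathrm{sg}\!\left(f/\ell(u)\right)
% (1) smooth L1, large-residual regime (\ell(u)\approx|u|):
\frac{\partial L}{\partial y} \approx \operatorname{sign}(u)\cdot\frac{f}{|u|}
% (2) L2 (\ell(u) = u^{2}): the residual stays in the denominator
\frac{\partial L}{\partial y} = 2u\cdot\frac{f}{u^{2}} = \frac{2f}{u}
% (3) L1 with the denominator dropped (L = |u|\cdot\mathrm{sg}(f)):
\frac{\partial L}{\partial y} = \operatorname{sign}(u)\cdot f
```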

Testbild commented 2 years ago

@yangxue0827 I do have a question in this regard also:

From my understanding tf.stop_gradient() prevents the gradient from being calculated. Does that not also prevent the weights from being updated?
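
For concreteness, the pattern I am asking about looks roughly like this (a minimal TF2 eager sketch; the benchmark itself is TF1, but tf.stop_gradient behaves the same way):

```python
import tensorflow as tf

w = tf.Variable(2.0)

with tf.GradientTape() as tape:
    u = w * 3.0                          # differentiable regression term
    scale = tf.stop_gradient(5.0 / u)    # treated as a constant during backprop
    loss = u * scale

grad = tape.gradient(loss, w)
print(grad.numpy())  # 2.5: the gradient still reaches w through u, scaled by the constant
```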

Best regards and thank you!