yangxue0827 / RotationDetection

This is a tensorflow-based rotation detection benchmark, also called AlphaRotate.
https://rotationdetection.readthedocs.io/
Apache License 2.0

Gradient of the IoU-Smooth L1 loss in SCRDet #21

Closed igo312 closed 3 years ago

igo312 commented 3 years ago

Here is the relevant link.

In that link I argue that the backward gradient will always be 0.

From another point of view, if |u| is made non-differentiable (treated as a constant), the gradient will not be 0. But then the gradient of u/|u| is no longer 1.
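
Written out for a scalar u ≠ 0, the two cases are (my own sketch of the calculation):

```latex
% (a) if |u| is differentiated through:
\frac{\partial}{\partial u}\left(\frac{u}{|u|}\right)
  = \frac{1}{|u|} - \frac{u\,\operatorname{sign}(u)}{|u|^{2}} = 0
% (b) if |u| is treated as a constant (non-differentiable / stop-gradient):
\frac{\partial}{\partial u}\left(\frac{u}{|u|}\right) = \frac{1}{|u|} \neq \pm 1
```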

@yangxue0827 Could you please help me out? Many thanks!

yangxue0827 commented 3 years ago

| | means the stop-gradient operation rather than abs. Why not check the code yourself, so you can understand it quickly?

yangxue0827 commented 3 years ago

For a vector u, |u| is a scalar; it only retains the magnitude and carries no direction.
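
For reference, a minimal sketch of the loss pattern being described (not the benchmark's exact code; the names `iou_smooth_l1_loss_sketch`, `sigma` and `eps` are placeholders, and `iou` is assumed to be a precomputed rotated IoU per box):

```python
import tensorflow as tf

def iou_smooth_l1_loss_sketch(preds, targets, iou, sigma=3.0, eps=1e-6):
    """Sketch of an IoU-Smooth L1 style loss; | | here means tf.stop_gradient, not abs."""
    diff = tf.abs(preds - targets)
    # standard smooth L1 on the regression offsets
    smooth_l1 = tf.where(diff < 1.0 / sigma ** 2,
                         0.5 * sigma ** 2 * tf.square(diff),
                         diff - 0.5 / sigma ** 2)
    u = tf.reduce_sum(smooth_l1, axis=-1)   # per-box regression loss, the |u| in the discussion
    f_iou = -tf.math.log(iou + eps)         # |f(IoU)|
    # The scale |f(IoU)| / |u| is wrapped in stop_gradient, so it acts as a
    # constant step size; the gradient direction comes only from u.
    scale = tf.stop_gradient(f_iou / (u + eps))
    return u * scale                        # value ≈ -log(IoU), gradient flows through u
```

The loss value then roughly equals -log(IoU), while back-propagation only sees the smooth L1 term scaled by a constant, which is what the rest of the discussion turns on.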

igo312 commented 3 years ago

Thank you for your response.

But the code is quite different from what the paper says, so there may be a misunderstanding on my side. My understanding is that the IoU term should act as the step size when updating the weights, while u/|u| provides the direction.

However, the gradient looks a little odd when I calculate it.

You see, if we regard |u| as a scalar constant, then, as the gradient in the picture below shows, |u| still affects the step size. In other words, the gradient of the u/|u| part is actually not 1 or -1.

[image: gradient calculation]

yangxue0827 commented 3 years ago

In any case, my core idea is to let u alone determine the back-propagated gradient, and to use |IoU|/|u| to eliminate the discontinuity of the loss.

igo312 commented 3 years ago

It still seems a little odd: a bigger |u| means the loss should get bigger, but once |u| appears in the denominator, the roles of |u| and |IoU| work against each other.

For example, when a box is far away from the ground-truth box, both |F(IoU)| (e.g. |-log IoU|) and |u| get bigger.

It seems strange to me; what do you think?
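
One way to probe this is a toy 1-D check of the effective scale |f(IoU)|/|u| as the box drifts away; the interval IoU below is a hypothetical stand-in for the benchmark's rotated IoU, and u is approximated by the offset magnitude:

```python
import math

def interval_iou(d, w=1.0):
    # IoU of the intervals [0, w] and [d, d + w]; a 1-D toy, not the rotated IoU
    inter = max(0.0, w - abs(d))
    return inter / (2.0 * w - inter)

# effective gradient scale |f(IoU)| / |u| with f(IoU) = -log(IoU) and u ~ |d|
for d in (0.05, 0.1, 0.2, 0.3):   # positive samples need IoU > 0.5, i.e. small offsets
    iou = interval_iou(d)
    print(f"d={d:.2f}  IoU={iou:.3f}  scale={-math.log(iou) / d:.3f}")
```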

yangxue0827 commented 3 years ago

Please allow me to explain in Chinese.

Your case is actually very rare, because we select positive samples based on IoU > 0.5. In the non-boundary cases, shouldn't |F(IoU)| and |u| have the same trend to begin with?

igo312 commented 3 years ago

Being able to use Chinese is great.

I was indeed considering the normal case (sorry, I had not looked at it from the boundary angle); the trends being the same is exactly what seems strange.

When the box's offset from the GT gets larger, we would want the update step to be larger. But when the two trends are the same, they work against each other, so as the offset grows the step size grows only slowly, and in some cases it may even shrink. In other words, this scale is not stable.

What do you think?

yangxue0827 commented 3 years ago

Then you still have to account for the magnitude of u, so an L2 loss might be more suitable: when differentiating, the 1/|u| gets cancelled and only |f(IoU)| remains. My original intention was to cancel out the magnitude of u in the boundary cases, but by your argument smooth L1 may indeed not be a good fit, since for large values smooth L1 is just L1.

igo312 commented 3 years ago

I think I have been looking at this purely from the gradient perspective; I need some time to understand the boundary-case intent behind your design of this function.

"so an L2 loss might be more suitable..."

But L2 is not suitable from the back-propagation angle either, because it leaves |y-y'| in the denominator. I think simply dropping |u| is enough: use an L1 loss, so the back-propagated gradient is just 1 or -1. Alternatively, a sign function could be introduced in the backward pass, so that whichever loss is chosen, the back-propagated value is ±1.
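
Spelled out per coordinate, with u = y - y', sg(·) for stop-gradient and f = |f(IoU)| (a simplification: in the actual loss, u is the per-box regression loss rather than a single residual):

```latex
% loss pattern: L = \ell(u)\cdot\mathrm{sg}\!\left(f/\ell(u)\right)
% (1) smooth L1, large-residual regime (\ell(u)\approx|u|):
\frac{\partial L}{\partial y} \approx \operatorname{sign}(u)\cdot\frac{f}{|u|}
% (2) L2 (\ell(u) = u^{2}): the residual stays in the denominator
\frac{\partial L}{\partial y} = 2u\cdot\frac{f}{u^{2}} = \frac{2f}{u}
% (3) L1 with the denominator dropped (L = |u|\cdot\mathrm{sg}(f)):
\frac{\partial L}{\partial y} = \operatorname{sign}(u)\cdot f
```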

Testbild commented 2 years ago

@yangxue0827 I do have a question in this regard also:

From my understanding tf.stop_gradient() prevents the gradient from being calculated. Does that not also prevent the weights from being updated?
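
For concreteness, the pattern I am asking about looks roughly like this (a minimal TF2 eager sketch; the benchmark itself is TF1, but tf.stop_gradient behaves the same way):

```python
import tensorflow as tf

w = tf.Variable(2.0)

with tf.GradientTape() as tape:
    u = w * 3.0                          # differentiable regression term
    scale = tf.stop_gradient(5.0 / u)    # treated as a constant during backprop
    loss = u * scale

grad = tape.gradient(loss, w)
print(grad.numpy())  # 2.5: the gradient still reaches w through u, scaled by the constant
```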

Best regards and thank you!