mountains-high closed this issue 2 years ago
Hi @mountains-high, diff1-4 measure the pixel-wise gradients in four directions, so we reduce these gradients to a scalar value for training.
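For reference, here is a minimal sketch of a four-direction TV term with a mean reduction, assuming a PyTorch-style implementation. The exact slicing and reduction in `datafree/criterions.py` may differ, so treat the `tv_loss` function below as illustrative rather than the repo's code.

```python
import torch

def tv_loss(x):
    # x: image batch of shape (B, C, H, W)
    # pixel-wise differences in four directions: horizontal, vertical, and the two diagonals
    diff1 = x[:, :, :, :-1] - x[:, :, :, 1:]
    diff2 = x[:, :, :-1, :] - x[:, :, 1:, :]
    diff3 = x[:, :, 1:, :-1] - x[:, :, :-1, 1:]
    diff4 = x[:, :, :-1, :-1] - x[:, :, 1:, 1:]
    # reduce each directional gradient map to a scalar with a mean, then add the four terms
    return diff1.abs().mean() + diff2.abs().mean() + diff3.abs().mean() + diff4.abs().mean()
```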
Good day~
Thank you for your reply. I got the point about diff1-4; however, I didn't understand taking the "mean" of them. I found these lines in the paper Data-free Knowledge Distillation for Object Detection. According to Equation 4, which they (the same authors) say is taken from [44], shouldn't there be a 1/N factor if we consider the mean? What do you think about it?
Thank you
Yes, as you mentioned, the only difference between `mean` and `sum` lies in the scaling factor 1/N. You can adjust their weights to get the same loss during training. In other words, if you want to use the summed TV loss, you need to lower the weight of $\mathcal{L}_{TV}$ by N$\times$. However, there is not much difference from the perspective of training.
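A quick numeric check of this point (the tensor shape, the weight value `w`, and the single-direction term below are made up for illustration): the mean-reduced term equals the summed term divided by the number of elements N, so using the summed loss with a weight scaled down by N gives the same training signal.

```python
import torch

x = torch.randn(4, 3, 32, 32)
d = (x[:, :, :, :-1] - x[:, :, :, 1:]).abs()  # one directional difference map
N = d.numel()

loss_mean = d.mean()   # mean reduction, as in the repo
loss_sum = d.sum()     # summed form, as written in Eq. 4 of the paper

w = 1e-3               # hypothetical weight used with the mean-reduced loss
# the summed loss with weight w/N reproduces the mean-reduced loss
assert torch.allclose(w * loss_mean, (w / N) * loss_sum)
```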
Good day~ Thank you for the answer
Hi~ Thank you for this great work.
My question is about the TV loss. Could you give the reason why you took the mean while calculating the TV loss? The paper for that work does not mention the 'mean'. Thank you
https://github.com/zju-vipa/CMI/blob/9e79fa9e2328205f26dbdb226878f3a28f3bf4cc/datafree/criterions.py#L48