zju-vipa / CMI

[IJCAI-2021] Contrastive Model Inversion for Data-Free Knowledge Distillation

About TV loss #8

Closed mountains-high closed 2 years ago

mountains-high commented 2 years ago

Hi~ Thank you for this great work.

My question is about the TV loss. Could you explain why you take the mean when calculating the TV loss? The paper for this work does not mention the 'mean'. Thank you

https://github.com/zju-vipa/CMI/blob/9e79fa9e2328205f26dbdb226878f3a28f3bf4cc/datafree/criterions.py#L48

VainF commented 2 years ago

Hi @mountains-high, diff1-4 measure the pixel-wise gradients in four directions, so we reduce these gradients to a single scalar value for training.
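For reference, here is a minimal sketch of this idea: four directional pixel differences reduced to a scalar with `.mean()`. It assumes a `(B, C, H, W)` image batch; the exact implementation in `datafree/criterions.py` may differ in details such as norms or scaling.

```python
import torch

def tv_loss_sketch(x: torch.Tensor) -> torch.Tensor:
    """Total-variation loss as a single scalar (illustrative sketch).

    x: image batch of shape (B, C, H, W). diff1-4 are the pixel-wise
    differences along four directions; `.mean()` reduces each to a scalar.
    """
    diff1 = x[:, :, :, :-1] - x[:, :, :, 1:]      # horizontal
    diff2 = x[:, :, :-1, :] - x[:, :, 1:, :]      # vertical
    diff3 = x[:, :, 1:, :-1] - x[:, :, :-1, 1:]   # diagonal
    diff4 = x[:, :, :-1, :-1] - x[:, :, 1:, 1:]   # anti-diagonal
    return (diff1.abs().mean() + diff2.abs().mean()
            + diff3.abs().mean() + diff4.abs().mean())
```

A constant image has zero TV loss, and any spatial variation makes it positive, which is why it acts as a smoothness prior on the synthesized images.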

mountains-high commented 2 years ago

Good day ~

Thank you for your reply. I got the point about diff1-4, but I still don't understand taking the "mean" of them. I found these lines in the paper Data-free Knowledge Distillation for Object Detection:

[screenshot: Equation 4 from the paper]

According to Equation 4, which they (the same authors) say is used in [44], shouldn't there be a 1/N factor if we consider the mean? What do you think about it?

Thank you

VainF commented 2 years ago

Yes, as you mentioned, the only difference between mean and sum is the scaling factor 1/N. You can adjust the loss weight to get the same loss during training. In other words, if you want to use the summed TV loss, you need to lower the weight of $\mathcal{L}_{TV}$ by N$\times$. In practice, though, there is not much difference from the perspective of training.
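This equivalence is easy to verify numerically: scaling the weight of the summed loss down by N (the number of reduced elements) reproduces the mean-reduced loss exactly. The shapes and weight values below are just for illustration.

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 3, 16, 16)                 # illustrative image batch
diff = x[:, :, :, :-1] - x[:, :, :, 1:]       # one directional difference
N = diff.numel()                              # number of elements reduced

alpha_mean = 1.0            # weight used with the mean-reduced TV term
alpha_sum = alpha_mean / N  # equivalent weight for the summed TV term

loss_mean = alpha_mean * diff.abs().mean()
loss_sum = alpha_sum * diff.abs().sum()

# Identical losses, hence identical gradients during training.
assert torch.allclose(loss_mean, loss_sum)
```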

mountains-high commented 2 years ago

Good day~ Thank you for the answer