22ema closed this issue 1 year ago.
Right. DIST has two loss terms: an intra-class loss and an inter-class loss. The inter-class loss carries limited information when there are only two categories. The intra-class loss could still be effective, since it transfers the relations along the batch axis.
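For reference, the two terms can be sketched as follows. This is a minimal NumPy sketch, assuming the formulation from the DIST paper: Pearson correlation between student and teacher softmax outputs, taken per sample (row) for the inter-class term and per class (column, i.e. along the batch axis) for the intra-class term. The function names, the `beta`/`gamma` weights and the `tau ** 2` scaling are illustrative assumptions, not the official implementation.

```python
import numpy as np


def softmax(logits: np.ndarray, tau: float = 1.0) -> np.ndarray:
    """Row-wise softmax with temperature tau (max-subtracted for stability)."""
    z = logits / tau
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pearson_rows(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pearson correlation between corresponding rows of a and b."""
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    num = (a_c * b_c).sum(axis=1)
    den = np.linalg.norm(a_c, axis=1) * np.linalg.norm(b_c, axis=1) + eps
    return num / den


def inter_class_loss(p_s: np.ndarray, p_t: np.ndarray) -> float:
    # One row per sample: correlate the student's and teacher's class distributions.
    return float(1.0 - pearson_rows(p_s, p_t).mean())


def intra_class_loss(p_s: np.ndarray, p_t: np.ndarray) -> float:
    # One column per class: correlate that class's probabilities along the batch axis.
    return float(1.0 - pearson_rows(p_s.T, p_t.T).mean())


def dist_loss(z_s, z_t, beta=1.0, gamma=1.0, tau=1.0) -> float:
    """Hypothetical DIST-style loss on logits of shape (batch, num_classes)."""
    p_s = softmax(z_s, tau)
    p_t = softmax(z_t, tau)
    # The tau ** 2 factor is an assumption borrowed from standard KD practice.
    return tau ** 2 * (beta * inter_class_loss(p_s, p_t) + gamma * intra_class_loss(p_s, p_t))


# Example: random student/teacher logits for a batch of 8 samples and 10 classes.
rng = np.random.default_rng(0)
z_teacher = rng.normal(size=(8, 10))
z_student = rng.normal(size=(8, 10))
print(dist_loss(z_student, z_teacher, tau=4.0))
```

Each term is `1 - r` with `r` in [-1, 1], so a student whose outputs are perfectly correlated with the teacher's (in both directions) drives the loss to about zero.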
You could still try DIST on your task, as it is easy to implement. For simple tasks that converge easily, a larger softmax temperature is recommended (e.g., T=4 on CIFAR-100), and the performance would be much better than with the original T=1.
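To illustrate the temperature advice, this small sketch compares a softmax output at T=1 and T=4 for one hypothetical, confidently classified sample. The logits are made up for illustration; the only point shown is that a larger T moves probability mass from the top class onto the non-target classes, giving the correlation terms more structure to match.

```python
import numpy as np


def softmax(logits, tau=1.0):
    """1-D softmax with temperature tau (max-subtracted for stability)."""
    z = np.asarray(logits, dtype=float) / tau
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()


# Hypothetical teacher logits for one easy sample (not taken from the thread).
logits = np.array([8.0, 2.0, 1.0, 0.5, -1.0])

for tau in (1.0, 4.0):
    p = softmax(logits, tau)
    print(f"T={tau}: top-class prob={p.max():.3f}, non-target mass={1 - p.max():.3f}")
```

With these logits the top-class probability drops from about 0.996 at T=1 to about 0.604 at T=4, so at T=1 the distribution is nearly one-hot.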
Thank you for the great response.
Sorry, but I have one more question.
Could it be used for bbox or landmark regression instead of classification?
DIST relaxes the absolute approximation into a relative approximation. However, regression tasks require absolute approximation, so I think DIST is unsuitable for regression (at least when used alone).
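The difference between relative and absolute approximation can be seen with a toy example. In this minimal sketch, a correlation-based (DIST-style) objective is compared with mean squared error, used here only as a stand-in for an absolute regression loss. The box offsets are hypothetical: a student that is off by a scale and a shift is perfectly correlated with the teacher, so the relative loss is zero even though the boxes are wrong.

```python
import numpy as np


def pearson(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two 1-D arrays."""
    a_c = a - a.mean()
    b_c = b - b.mean()
    return float((a_c * b_c).sum() / (np.linalg.norm(a_c) * np.linalg.norm(b_c)))


# Hypothetical teacher bbox regression outputs (x, y, w, h offsets) for one box.
teacher = np.array([0.10, -0.20, 0.35, 0.05])
# Student output = scaled and shifted teacher: same "relations", wrong values.
student = 2.0 * teacher + 0.5

relative_loss = 1.0 - pearson(student, teacher)              # DIST-style: about 0
absolute_loss = float(np.mean((student - teacher) ** 2))     # MSE: clearly > 0
print(relative_loss, absolute_loss)
```

Pearson correlation is invariant to any positive scaling and shift of either input, which is exactly why it cannot pin down absolute coordinates.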
Thank you for the great response.
Could I use DIST in RetinaFace?
RetinaFace has only two classes (face and not face), so Pearson's correlation coefficient seems ineffective here.
In summary, when the number of classes is small, DIST seems ineffective, and even more so in the binary case. Is this understanding correct?
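The binary case can be checked numerically. This minimal NumPy sketch assumes the same Pearson-over-softmax formulation as DIST and uses hypothetical (face, not-face) logits. With two classes, every centered probability row is (d, -d), so the inter-class correlation is always about +1 or -1 and only records whether the student and teacher pick the same class. The intra-class correlation, computed along the batch axis, still depends on how confident each prediction is; with two classes both columns give the same value.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax (max-subtracted for stability)."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def pearson_rows(a: np.ndarray, b: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Pearson correlation between corresponding rows of a and b."""
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    num = (a_c * b_c).sum(axis=1)
    return num / (np.linalg.norm(a_c, axis=1) * np.linalg.norm(b_c, axis=1) + eps)


# Hypothetical (face, not-face) logits for a batch of 4 anchors.
z_teacher = np.array([[2.0, -1.0], [0.5, 1.5], [-3.0, 0.0], [1.0, 0.0]])
z_student = np.array([[1.0, 0.0], [0.0, 2.0], [0.2, -0.2], [3.0, 1.0]])
p_t, p_s = softmax(z_teacher), softmax(z_student)

# Inter-class term: per sample, correlation over only 2 classes.
# Essentially sign(d_s * d_t): 1 if the argmax agrees, -1 otherwise.
inter_r = pearson_rows(p_s, p_t)

# Making the student far more confident leaves the inter-class term unchanged.
inter_r_scaled = pearson_rows(softmax(10.0 * z_student), p_t)

# Intra-class term: per class, correlation along the batch axis.
intra_r = pearson_rows(p_s.T, p_t.T)

print("inter-class r:", np.round(inter_r, 4))
print("inter-class r (student logits x10):", np.round(inter_r_scaled, 4))
print("intra-class r:", np.round(intra_r, 4))
```

This is consistent with the answer above: in the binary case the inter-class term reduces to argmax agreement, while the intra-class term can still carry graded information across the batch.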