mihaidusmanu / d2-net

D2-Net: A Trainable CNN for Joint Description and Detection of Local Features

How to understand the loss function??? #40

Closed lovekittynine closed 4 years ago

lovekittynine commented 4 years ago

Hi @mihaidusmanu, I have read your great paper twice, but I can't figure out how the loss function works. In my opinion, when a correspondence c is not repeatable, the score product Sc(1)*Sc(2) = 0. In the extreme case where none of the corresponding points are repeatable, the loss of the entire image pair is close to zero. If all the corresponding points are repeatable, the weight of each correspondence is 1 / C. Another question: does off-the-shelf mean that the model is not trained any further, and that the pretrained weights are used directly and still work well? Thank you very much!

mihaidusmanu commented 4 years ago

Hello. The loss function should be seen as a weighted average of margin losses (i.e. if the margin loss of a correspondence is lower than the margin, its detection score is increased, and decreased otherwise). The following statement is incorrect: "If in extreme cases, all corresponding points are not repeated, then the loss of the entire image pair is close to zero." The scores are re-normalized to sum to 1, and all points that are not matchable will have a bad margin loss, so a weighted average of high margin losses gives a high loss value.
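To make the weighted average concrete, here is a minimal sketch in plain Python. The function name `weighted_margin_loss` and the sample numbers are illustrative assumptions, not code from the repository, and the per-correspondence margin losses are assumed to be precomputed. It only illustrates the point made in the reply: uniformly tiny scores still produce a high loss once the weights are re-normalized.

```python
def weighted_margin_loss(scores1, scores2, margin_losses):
    """Weighted average of per-correspondence margin losses.

    scores1[c], scores2[c]: detection scores of correspondence c in each image.
    margin_losses[c]: margin loss of correspondence c (assumed precomputed).
    Assumes at least one score product is non-zero.
    """
    products = [s1 * s2 for s1, s2 in zip(scores1, scores2)]
    total = sum(products)
    # Re-normalize so the weights sum to 1 whatever the score magnitude is.
    weights = [p / total for p in products]
    return sum(w * m for w, m in zip(weights, margin_losses))


# Hypothetical extreme case: every correspondence has a tiny detection score
# and a high margin loss (values chosen only for illustration).
low_scores = [0.01, 0.01, 0.01, 0.01]
high_margin_losses = [1.0, 1.0, 1.0, 1.0]
loss = weighted_margin_loss(low_scores, low_scores, high_margin_losses)
print(loss)
```

Because the products are divided by their sum, scaling all scores down does not shrink the loss; it stays at the average of the margin losses.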

Off-the-shelf means ImageNet pretrained and yes, pretrained weights already give good enough results.

lovekittynine commented 4 years ago

Thanks very much @mihaidusmanu. If no points are matched, suppose Sc(1)*Sc(2) approaches a small value (0.01) for every correspondence. Because of the sum normalization, each weight then approximately approaches 1/C, which is the same as when all points are correctly matched...

mihaidusmanu commented 4 years ago

That is true. However, in most cases the number of correctly matched points is significantly smaller than the total number of points. Thus the weights of correctly matched points will be increased (e.g. > 1 / number_of_pixels) and those of incorrectly matched ones decreased (e.g. < 1 / number_of_pixels).
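A small numeric sketch of this effect, in plain Python. The helper `normalized_weights` and the score values are hypothetical and chosen only for illustration: two of five correspondences get high detection scores, the rest low ones.

```python
def normalized_weights(scores1, scores2):
    """Score products re-normalized so that they sum to 1."""
    products = [s1 * s2 for s1, s2 in zip(scores1, scores2)]
    total = sum(products)  # assumed non-zero
    return [p / total for p in products]


# Hypothetical scores: the first two correspondences are well matched,
# the remaining three are not.
scores_image1 = [0.9, 0.8, 0.1, 0.1, 0.1]
scores_image2 = [0.9, 0.8, 0.1, 0.1, 0.1]

weights = normalized_weights(scores_image1, scores_image2)
uniform = 1.0 / len(weights)  # the 1 / C weight of the uniform case

for w in weights:
    print(round(w, 3), "above uniform" if w > uniform else "below uniform")
```

With non-uniform scores, the well-matched correspondences end up with weights above the uniform value and the others below it, so the loss is dominated by the margin losses of the points the detector scores highly.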