zju3dv / LoFTR

Code for "LoFTR: Detector-Free Local Feature Matching with Transformers", CVPR 2021, T-PAMI 2022
https://zju3dv.github.io/loftr/
Apache License 2.0

Some questions about focal loss #53

Closed superluckyxhh closed 2 years ago

superluckyxhh commented 3 years ago

Hi. Why do pos_conf and neg_conf both use the same focal term, 0.25 * (1 - conf)^2 * log(conf)? As I understand it, neg_conf corresponds to a bad match or one that should not be matched, so as neg_conf goes down, the neg loss should go down as well. But in the code, loss_neg = - alpha * torch.pow(1 - neg_conf, gamma) * neg_conf.log(), so when neg_conf decreases, the neg loss increases instead? Looking forward to your reply.

angshine commented 3 years ago

Hi, I think you are referring to the sparse supervision with a sinkhorn matching layer. In the sinkhorn case, a negative sample (an unmatchable grid) should lead to a high confidence/score in the dustbin entry, so it is actually a positive sample for the dustbin entry. The "positive" and "negative" samples here are different from the ones defined in RetinaNet; they are just positive samples corresponding to different matching targets. You might also find the alpha term confusing, since both "positive" and "negative" samples use alpha without one of them being assigned the 1 - alpha weight for neg-pos balancing. We actually use POS_WEIGHT & NEG_WEIGHT in the config for neg-pos balancing, which makes the alpha term meaningless. The naming could be a little bit confusing. I would try to refactor that part in the future😀.
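To make that concrete, here is a minimal sketch (not the repository code; the names conf_pos, conf_dustbin, pos_weight, neg_weight are illustrative): both the matched grid entries and the dustbin entries of unmatchable grids are supervised toward high confidence with the same focal term, and the pos/neg balance comes from separate weights rather than the usual alpha / (1 - alpha) split.

```python
import torch

def sparse_focal_loss(conf_pos, conf_dustbin, alpha=0.25, gamma=2.0,
                      pos_weight=1.0, neg_weight=1.0):
    """conf_pos: confidences of ground-truth matches (should go to 1).
    conf_dustbin: dustbin confidences of unmatchable grids (should also go to 1)."""
    conf_pos = conf_pos.clamp(1e-6, 1 - 1e-6)
    conf_dustbin = conf_dustbin.clamp(1e-6, 1 - 1e-6)
    # same focal term for both groups; both are "positive" w.r.t. their own target
    loss_pos = -alpha * torch.pow(1 - conf_pos, gamma) * conf_pos.log()
    loss_neg = -alpha * torch.pow(1 - conf_dustbin, gamma) * conf_dustbin.log()
    # pos/neg balancing via explicit weights (POS_WEIGHT / NEG_WEIGHT in the config)
    return pos_weight * loss_pos.mean() + neg_weight * loss_neg.mean()
```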

superluckyxhh commented 3 years ago

Thanks, but I still don't fully understand. As I understand the dustbin scores (S_dustbin): if S_dustbin is high, that means an unmatched pair is predicted with a high score. But we want unmatched pairs to have low scores, so LoFTR should decrease S_dustbin. In other words, S_dustbin should be inversely proportional to the loss. Can you point out what's wrong with my understanding? Thank you.

angshine commented 3 years ago

For the sparse supervision with a sinkhorn matching layer, you could approximately think of it as a multi-class classification problem, like a softmax-focal-loss instead of sigmoid-focal-loss. For each coarse-level grid in the left view to be matched, there is one "target" in the right view that corresponds to it. This "target" could be a grid in the right view or just a dustbin (unmatchable). We extract those entries from the score matrix and apply focal-loss to them, whose scores/probabilities should be high instead of low, as you said. Maybe just forget about the positive & negative terms; we only supervise the positive terms in the sparse supervision case. The sinkhorn layer could naturally suppress the negative terms (which should have low scores).
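A toy illustration of this view (hypothetical shapes and names, not LoFTR's actual implementation): given an assignment matrix P from the sinkhorn layer whose last column is the dustbin, each left grid's target is either its matching right grid or the dustbin column; only those gathered entries are supervised, and their confidences should be pushed high.

```python
import torch

def gather_target_conf(P, gt_right_idx):
    """P: (N, M+1) assignment probabilities, last column = dustbin.
    gt_right_idx[i]: index of the matching right grid for left grid i,
    or M (the dustbin column) if left grid i is unmatchable."""
    left_idx = torch.arange(gt_right_idx.numel())
    return P[left_idx, gt_right_idx]  # confidences of the target entries

# toy usage
P = torch.softmax(torch.randn(5, 7), dim=1)   # stand-in for a sinkhorn output
gt = torch.tensor([0, 3, 6, 6, 2])            # grids 2 and 3 are unmatchable (dustbin = col 6)
conf = gather_target_conf(P, gt)
loss = (-0.25 * (1 - conf).pow(2) * conf.clamp_min(1e-6).log()).mean()
```

Since the row is (approximately) normalized, pushing the target entry toward 1 implicitly pushes the non-target entries toward 0, which is why the negative entries do not need an explicit loss term.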