lorenmt / reco

The implementation of "Bootstrapping Semantic Segmentation with Regional Contrast" [ICLR 2022].
https://shikun.io/projects/regional-contrast

understanding the reco loss #7

Closed seyeeet closed 2 years ago

seyeeet commented 2 years ago

I am kinda new to contrastive learning, and although I completely follow the math, I get confused when I look at the code (by code I mean contrastive-learning code in general, not only your code :) ).

So in the paper, Eq. 1 is the ReCo loss, which is the pixel-wise contrastive loss, but in the code here I cannot understand why it is computed differently from the way Eq. 1 is written. Is that a common thing in practice?

In more detail: in the code we have all_feat, which is 256x513x256, so for each sample the first key is the positive and the rest are negatives. seg_logits then has dimension 256x513. We then compute the similarity via cosine_similarity (which I think is different from the contrastive loss in Eq. 1).

Then, in the next step, F.cross_entropy is used with all labels set to zero, because every query should match the positive key. My confusion is why we don't use Eq. 1 as it is written in the paper and instead use F.cross_entropy. I feel this is not exactly the same as Eq. 1. Can you please help me understand the relation?
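To make the structure concrete, here is a minimal sketch of what I understand the computation to be (shapes as described above; the tensor names, random values, and temperature are just for illustration, not the exact repository code):

```python
import torch
import torch.nn.functional as F

temp = 0.5                                     # temperature tau (value assumed for illustration)
num_queries, num_keys, feat_dim = 256, 513, 256

# all_feat: for each query, index 0 is the positive key, indices 1..512 are negatives.
all_feat = F.normalize(torch.randn(num_queries, num_keys, feat_dim), dim=-1)
# anchor_feat: the query (anchor) features themselves.
anchor_feat = F.normalize(torch.randn(num_queries, feat_dim), dim=-1)

# Cosine similarity between each anchor and its 513 keys -> seg_logits of shape [256, 513].
seg_logits = F.cosine_similarity(anchor_feat.unsqueeze(1), all_feat, dim=2)

# Target class 0 selects the positive key; the softmax inside cross_entropy turns the
# logits into the numerator/denominator of the contrastive loss.
labels = torch.zeros(num_queries, dtype=torch.long)
loss = F.cross_entropy(seg_logits / temp, labels)
```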

lorenmt commented 2 years ago

First of all, the equation and code are consistent.

In more detail: in the code we have all_feat, which is 256x513x256, so for each sample the first key is the positive and the rest are negatives. seg_logits then has dimension 256x513. We then compute the similarity via cosine_similarity (which I think is different from the contrastive loss in Eq. 1).

Up to here, your understanding is correct.

Cosine similarity is the same as a normalised dot product; it is a way to measure how similar two vectors are. Cosine similarity is typically preferred because it is bounded between -1 and 1 (from completely opposite direction to the same direction); this corresponds to the r_q \cdot c_k / \tau term in the formulation.
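For example (a quick check with made-up vectors and temperature, not from the repository):

```python
import torch
import torch.nn.functional as F

r_q = torch.randn(256)    # query feature (random, just for illustration)
r_k = torch.randn(256)    # key feature

# Cosine similarity is the dot product of the L2-normalised vectors.
cos = F.cosine_similarity(r_q, r_k, dim=0)
dot = (r_q / r_q.norm()) @ (r_k / r_k.norm())
assert torch.allclose(cos, dot)

tau = 0.5                 # temperature (assumed value)
logit = cos / tau         # the r_q . c_k / tau term
```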

Cross entropy is then the same as the negative log-likelihood loss: cross_entropy(pred, gt) = - gt * log(pred). Here pred and gt are both probability distributions, which we obtain via softmax; F.cross_entropy has the softmax operation built into the function, which is where the -log( exp(..) / (exp(..) + exp(..)) ) form comes from. I would suggest diving further into the PyTorch documentation for a better understanding.
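As a small sanity check (illustrative numbers only, not the repository code), F.cross_entropy with target class 0 reproduces exactly the -log( exp(positive) / sum of exps ) form:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 6)                    # 4 queries, 1 positive + 5 negative keys each
target = torch.zeros(4, dtype=torch.long)     # class 0 = the positive key

# F.cross_entropy = log-softmax + negative log-likelihood, averaged over queries.
ce = F.cross_entropy(logits, target)

# The same thing written out explicitly: numerator = exp of the positive logit,
# denominator = sum of exp over all logits (positive + negatives).
manual = -torch.log(torch.exp(logits[:, 0]) / torch.exp(logits).sum(dim=1)).mean()

assert torch.allclose(ce, manual)
```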

seyeeet commented 2 years ago

Thank you very much for the explanation. Can you please tell me which part of the code corresponds to the numerator of Equation 1, and which part corresponds to the denominator?

lorenmt commented 2 years ago

That is the softmax embedded in the cross_entropy function: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html
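To spell out the mapping (a sketch in the notation used earlier in this thread; the exact symbols and averaging in the paper's Eq. 1 may differ slightly), the per-query term is:

```latex
% per-query term, up to averaging over queries/classes
\mathcal{L}_q \;=\; -\log
  \frac{\exp(r_q \cdot r_k^{+} / \tau)}
       {\exp(r_q \cdot r_k^{+} / \tau) \;+\; \sum_{r_k^{-}} \exp(r_q \cdot r_k^{-} / \tau)}
```

The numerator exp(r_q \cdot r_k^{+} / \tau) corresponds to the first column of seg_logits (divided by the temperature), and the denominator to the sum over all 513 columns; both are produced by the softmax that F.cross_entropy applies internally.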

seyeeet commented 2 years ago

oh I see, that makes things very clear, thank you!