kunzhan / DSSN

ACM MM 2023: Improving semi-supervised semantic segmentation with dual-level Siamese structure network
http://arxiv.org/abs/2307.13938

Question about the loss #6

Closed Hugo-cell111 closed 8 months ago

Hugo-cell111 commented 8 months ago

Hi! In this line of code, the MSE loss is calculated between pred_u_s1_norm and pred_u_s2_norm. However, since the two cutmix boxes are different, img_u_s1 and img_u_s2 are definitely different. So how does it make sense to calculate the MSE loss between these two kinds of predictions? Thanks!
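To make the question concrete, here is a minimal numpy sketch of the setup being discussed (the `cutmix` helper, shapes, and fixed boxes are illustrative, not the repo's actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

def cutmix(img, mix_img, box):
    """Paste the `box` region of `mix_img` into a copy of `img`; box = (y0, y1, x0, x1)."""
    out = img.copy()
    y0, y1, x0, x1 = box
    out[y0:y1, x0:x1] = mix_img[y0:y1, x0:x1]
    return out

img_u_w = rng.random((8, 8))      # the unlabeled image (weak view)
img_u_w_mix = rng.random((8, 8))  # the image pasted in by cutmix

box1 = (0, 4, 0, 4)  # two independently sampled boxes ...
box2 = (4, 8, 4, 8)  # ... at different locations

img_u_s1 = cutmix(img_u_w, img_u_w_mix, box1)
img_u_s2 = cutmix(img_u_w, img_u_w_mix, box2)

# Inside box1, img_u_s2 still shows the original img_u_w pixels,
# so the two strong views disagree there (and likewise inside box2).
assert not np.array_equal(img_u_s1, img_u_s2)
```

Because the boxes are sampled independently, the two strong views differ inside both boxes even though they are built from the same pair of source images.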

kunzhan commented 8 months ago

Hi! Thanks for your question. In our implementation, the MSE Loss between pred_u_s1_norm and pred_u_s2_norm serves multiple purposes:

  1. Supervision through Co-training: When one view's bounding box region is replaced by pixels from another image, the corresponding pixels in the other view remain unchanged. The predictions of the unchanged view can therefore supervise the first view on the pixels of that bounding box, contributing to a co-training strategy.

  2. Consistency Across Other Areas: The regions covered by the two bounding boxes, where cutmix has occurred, are subject to the consistency MSE loss. This encourages the predictions across those regions to remain consistent.

  3. CutMix Augmentation: CutMix itself is a powerful augmentation technique. The differences in predictions between the two views can be seen as a form of regularization induced by the cutmix operation, enhancing the robustness of the model.

In summary, the combination of co-training, consistency loss, and the benefits of the cutmix augmentation collectively contribute to the overall performance and generalization of the model. If you have any more questions or need further clarification, feel free to ask!
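As a rough illustration of the consistency term described above, a pixel-wise MSE between channel-normalized predictions can be sketched as follows (the L2 normalization is an assumption inferred from the `_norm` suffix in the tensor names, not the repo's exact implementation):

```python
import numpy as np

def mse_consistency(p1, p2):
    """Pixel-wise mean squared error between two (C, H, W) prediction maps."""
    return float(np.mean((p1 - p2) ** 2))

def l2_normalize(p, axis=0, eps=1e-8):
    # normalize along the class/channel axis at every pixel
    return p / (np.linalg.norm(p, axis=axis, keepdims=True) + eps)

rng = np.random.default_rng(1)
logits1 = rng.random((3, 4, 4))  # predictions for the two strong views
logits2 = rng.random((3, 4, 4))

pred_u_s1_norm = l2_normalize(logits1)
pred_u_s2_norm = l2_normalize(logits2)

loss = mse_consistency(pred_u_s1_norm, pred_u_s2_norm)
assert loss >= 0.0
```

The loss is averaged over every pixel, which is why the discussion below turns on whether the two views are pixel-wise comparable in the first place.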

kunzhan commented 8 months ago

  1. Contrastive Learning: Importantly, this approach aligns with the theory of contrastive learning without the use of explicit negative pairs. The dissimilarity between the predictions captures the essence of contrastive learning principles, contributing to the model's ability to learn robust representations.
Hugo-cell111 commented 8 months ago

(attached image: demo)

Thanks for your explanation! But what I actually mean is that img_u_s1 and img_u_s2 (the two pictures in the bottom row) are different. Although both are combined from the same two source pictures, the two cutmix boxes are at different locations, so the cutmixed pictures are not actually the same. So I wonder how it makes sense to directly calculate the MSE loss between predictions for two different cutmixed images. Thanks!

kunzhan commented 8 months ago
  1. I want to highlight a specific aspect of the cutmix operation. In the bounding box of image A, the pixels are replaced by pixels from image C in the same bounding box, while in image B, the pixels are replaced by pixels from image D within the corresponding bounding box.

  2. The cutmix operation involves replacing a portion of one image with another, and the bounding box is generally constrained to cover less than 1/4 of the image area.
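The randomly generated, area-constrained box described above could be sketched like this (the sampling scheme and the 0.05 lower bound are illustrative assumptions; only the quarter-area cap comes from the comment):

```python
import numpy as np

def rand_bbox(h, w, max_area_frac=0.25, rng=None):
    """Sample a random box (y0, y1, x0, x1) covering at most
    max_area_frac of an h x w image."""
    if rng is None:
        rng = np.random.default_rng()
    # pick a target area below the cap, then a roughly square shape
    area = rng.uniform(0.05, max_area_frac) * h * w
    bh = int(np.sqrt(area))
    bw = int(area // max(bh, 1))
    # place the box uniformly at random inside the image
    y0 = int(rng.integers(0, h - bh + 1))
    x0 = int(rng.integers(0, w - bw + 1))
    return y0, y0 + bh, x0, x0 + bw

y0, y1, x0, x1 = rand_bbox(64, 64, rng=np.random.default_rng(0))
assert (y1 - y0) * (x1 - x0) <= 0.25 * 64 * 64
```

Since each call draws its own box, calling this once per strong view yields boxes at unrelated locations, which is exactly the situation raised in the next comment.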

Hugo-cell111 commented 8 months ago

But in the code implementations of UniMatch and DSSN, the bounding box is generated randomly (see this line of code), so the two bounding boxes do not correspond to each other. No matter how small the bounding box is, since the MSE loss is calculated in an element-wise (pixel-wise) manner, I think directly applying MSE will introduce errors.

kunzhan commented 8 months ago

I appreciate your attention to the random generation of bounding boxes in the code. Despite the random initialization, the co-training strategy and the pixel-wise contrastive learning employed in the framework help the model adapt and learn effectively over epochs. Additionally, the augmentation techniques, including CutMix, play a crucial role in enhancing the robustness of the model by introducing controlled variations. Over time, these mechanisms contribute to the model's ability to capture meaningful representations.

Hugo-cell111 commented 8 months ago

Yes, I certainly understand the contribution of the MSE loss. However, as https://github.com/LiheYoung/UniMatch/issues/51 says, since img_u_w and img_u_w_mix correspond exactly to the same image, directly calculating the MSE loss makes sense there. But if the two images are not the same, the MSE loss cannot be directly applied.