tientrandinh / Revisiting-Reverse-Distillation

(CVPR 2023) Revisiting Reverse Distillation for Anomaly Detection
MIT License

When designing the loss_contrast, do we need to consider noise mask location information? #5

Closed hitlei closed 11 months ago

hitlei commented 1 year ago

```python
loss_contrast = (
    self.contrast(noised_feature1.view(noised_feature1.shape[0], -1),
                  normal_proj1.view(normal_proj1.shape[0], -1), target=target)
    + self.contrast(noised_feature2.view(noised_feature2.shape[0], -1),
                    normal_proj2.view(normal_proj2.shape[0], -1), target=target)
    + self.contrast(noised_feature3.view(noised_feature3.shape[0], -1),
                    normal_proj3.view(normal_proj3.shape[0], -1), target=target)
)
```

In this code, when using CosineEmbeddingLoss to push apart normal and abnormal features, you did not use the noise mask information. I'm afraid this may cause the parts that are not noised to be pushed away as well, even though they are in fact normal features.
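To make the concern concrete, here is a minimal sketch of what the flattened contrast loss does. The shapes, margin, and `target = -1` (the "push apart" case) are illustrative assumptions, not values taken from the repository:

```python
import torch
import torch.nn as nn

# Hypothetical shapes: batch of 2 images, 256-channel feature maps at 16x16.
noised_feature = torch.randn(2, 256, 16, 16)
normal_proj = torch.randn(2, 256, 16, 16)

contrast = nn.CosineEmbeddingLoss(margin=0.5)  # margin is an assumption

# target = -1 tells CosineEmbeddingLoss to push each pair of vectors apart.
target = -torch.ones(noised_feature.shape[0])

# Each image's whole feature map is flattened into ONE vector, so the loss
# acts on the entire map at once -- there is no per-location mask term.
loss = contrast(
    noised_feature.view(noised_feature.shape[0], -1),
    normal_proj.view(normal_proj.shape[0], -1),
    target,
)
```

Because each image contributes a single flattened vector, the loss indeed pushes the whole representation away, including the spatial positions that were never noised.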

tientrandinh commented 1 year ago

Thank you for your question. We leverage global information in the feature space when designing the contrast loss: if the encoder receives abnormal (noised) images as input, we treat the resulting feature outputs as abnormal features. Because of the non-linear relationship between the anomalous regions in the images and their corresponding representations in the feature space, pinpointing the exact coordinates of the abnormality at the feature level is challenging. For example, consider a 256x256 image to which we add a small noise patch of size 6x6, while the feature output of block 3 in WideResNet-50 has a spatial size of only 16x16. Deciding which regions of that feature map are abnormal and which are normal then becomes a non-trivial task. I understand your perspective, and it is worth considering, but I believe global feature compactness works fine.
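The 256x256-vs-16x16 mismatch above can be illustrated by simply downsampling a noise mask to the feature resolution. This is an illustrative sketch (the mask, patch location, and downsampling mode are assumptions, not code from the repository):

```python
import torch
import torch.nn.functional as F

# A 256x256 binary noise mask with a single 6x6 noised patch.
mask = torch.zeros(1, 1, 256, 256)
mask[:, :, 100:106, 100:106] = 1.0

# Downsample to the 16x16 spatial size of the block-3 feature map
# (overall stride 16 at that stage): each output cell averages a
# disjoint 16x16 region of the input.
feat_mask = F.interpolate(mask, size=(16, 16), mode="area")

# The 6x6 patch covers only 36 of the 256 input pixels behind one
# feature cell, so no feature location is more than ~14% "abnormal".
print(feat_mask.max().item())  # 36 / 256 = 0.140625
```

Even before accounting for the growing receptive field of deeper layers, the hardest "abnormal" feature location is only fractionally covered by noise, so a clean abnormal/normal split at the feature level is ill-defined.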

tdh512194 commented 1 year ago

@hitlei we are treating each sample as an instance to be pushed away if it is abnormal, by flattening the feature maps for the cosine embedding loss. An alternative to flattening could be global average pooling. I think you made an interesting point that the contrastive loss could be performed locally (pixel-wise) by leveraging the noise mask. However, this gets complicated because we are comparing latent features, not the original input: in the latent feature maps, each pixel's receptive field expands as we go deeper, so the corresponding location of the noise mask boundary also expands and is eventually covered by the whole spatial extent. Hence, we chose the simpler route of contrasting global features.
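The global-average-pooling alternative mentioned above can be sketched as follows. Shapes, margin, and target are hypothetical; this is not the repository's implementation, just the variant described in the comment:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical shapes: batch of 2, 256-channel feature maps at 16x16.
noised_feature = torch.randn(2, 256, 16, 16)
normal_proj = torch.randn(2, 256, 16, 16)

contrast = nn.CosineEmbeddingLoss(margin=0.5)  # margin is an assumption
target = -torch.ones(2)  # -1: push the pair apart

# Global average pooling collapses each HxW map to one value per channel,
# so each image is contrasted as a 256-d descriptor instead of a
# 256*16*16-d flattened vector.
gap_noised = F.adaptive_avg_pool2d(noised_feature, 1).flatten(1)  # (2, 256)
gap_normal = F.adaptive_avg_pool2d(normal_proj, 1).flatten(1)     # (2, 256)

loss = contrast(gap_noised, gap_normal, target)
```

Compared with flattening, pooling makes the contrast invariant to where the noise lands spatially, at the cost of discarding all location information, which is consistent with the global-compactness view taken in the thread.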