halbielee / EPS

Official PyTorch implementation of "Railroad is not a Train: Saliency as Pseudo-pixel Supervision for Weakly Supervised Semantic Segmentation", CVPR2021

What is the novelty of your paper? #2

Closed TyroneLi closed 3 years ago

TyroneLi commented 3 years ago

I cannot understand why this paper was accepted. Adopting a saliency map as additional supervision, whether in a supervised or an unsupervised manner, is cheating.

ghost commented 3 years ago

I also have the same question.

bityangke commented 3 years ago

Plenty of papers have used saliency as supervision. You should read these papers carefully to find out why. I think your words are very impolite.

ghost commented 3 years ago

@bityangke They only use the saliency map as a background cue, which cannot be involved in supervised training. Please point to some of the papers you mentioned. Thanks.

bityangke commented 3 years ago

Why can't saliency be used as supervision? If we have already computed the saliency map, why not use it as supervision? Whether it is used as a background cue or as supervision, it is the same: both use the computed saliency map when tuning the model.

TyroneLi commented 3 years ago

> @bityangke They only use the saliency map as a background cue, which cannot be involved in supervised training. Please point to some of the papers you mentioned. Thanks.

I cannot agree more with you! By the way, using the saliency map as supervision is not elegant, because I cannot see any interesting insight in it.

halbielee commented 3 years ago

Thank you for your interest in our work! I see that there was some controversy before the code was released. Now the code is open :)

Here I leave my opinion.

@TyroneLi @stickyfiner

I agree that using a saliency map for weakly supervised semantic segmentation (WSSS) can feel unfair. However, as @bityangke said, many works in WSSS use the saliency map to obtain a more accurate pseudo-mask. Although most of them use the saliency map as a background cue for the pseudo-mask, that does not change the fact that they use the saliency map.

Then, you may think of our method as just a way to make good use of the saliency map.

However, in the paper we identify three challenges of WSSS (sparse object coverage, inaccurate object boundaries, and the co-occurrence problem) that existing works could not solve all at once. Even the methods that use the saliency map in the training phase do not solve these problems.

To address this, we focused on the complementary relationship between the localization map (CAM) and the saliency map: the localization map can distinguish different objects but does not separate their boundaries, while the saliency map provides rich boundary information but does not reveal object identity. We then devised a way to utilize both kinds of information.
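As a rough illustration of how the two maps can be tied together (a minimal sketch, not the code in this repository): the per-class localization maps of the classes present in the image are summed into a class-agnostic foreground map, blended with the inverted background map, and compared to the saliency map with a squared error. The function names, the blending weight `lam`, the explicit background map, and the array shapes are assumptions made for this example.

```python
import numpy as np

def estimated_saliency(cams, bg_map, present, lam=0.5):
    """Collapse class-wise localization maps into one class-agnostic map.

    cams:    (C, H, W) per-class localization maps in [0, 1]
    bg_map:  (H, W) background map in [0, 1]
    present: (C,) 0/1 image-level labels
    lam:     hypothetical blending weight between the two estimates
    """
    # Keep only the classes that the image-level label marks as present.
    fg = (cams * present[:, None, None]).sum(axis=0).clip(0.0, 1.0)
    # Foreground evidence and inverted background should agree on "salient".
    return lam * fg + (1.0 - lam) * (1.0 - bg_map)

def saliency_loss(saliency, cams, bg_map, present, lam=0.5):
    """Mean squared error between the saliency map and its estimate."""
    est = estimated_saliency(cams, bg_map, present, lam)
    return float(np.mean((saliency - est) ** 2))

# Toy 1x2 image with two candidate classes; only class 0 is present.
cams = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
bg_map = np.array([[0.0, 1.0]])
present = np.array([1.0, 0.0])

matched = saliency_loss(np.array([[1.0, 0.0]]), cams, bg_map, present)
mismatched = saliency_loss(np.array([[1.0, 1.0]]), cams, bg_map, present)
print(matched, mismatched)
```

In this toy case the loss is zero only when the saliency map agrees with the foreground implied by the present class, so the saliency map constrains object extent while the class identity still comes from the image-level label.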

With extensive experiments, we show that our method is effective at alleviating the three challenges. Additionally, the performance of our method is not simply bounded by the quality of the saliency map. We found that our method creates a synergy between the localization map and the saliency map: we observe that the noisy and missing information in each map is compensated for by the other through our joint training strategy.

So, we think that our method is more than just using the saliency map in the training process.

@bityangke Thank you for the sound discussion and your thoughtful comments.

Please see our paper for more details! We also provide supplementary material with more experiments. Paper link

TyroneLi commented 3 years ago

Actually, I still hold my own opinion, as most people do. I agree that most WSSS papers adopt saliency maps to estimate background cues; however, they only use them in final post-processing, similar to how CRF is used. If someone uses a saliency map to supervise training, that will inevitably provide 'explicit full mask labeling' to the network. They could adopt any SOTA saliency method to obtain accurate saliency maps for the VOC benchmark. So could you tell us what the difference is between human-labeled VOC masks and saliency map masks? Can you list any other works that use the saliency map as training supervision? For the issues that exist in WSSS (sparse object coverage, inaccurate object boundaries, and the co-occurrence problem), you could leverage other strategies to alleviate them, but you cannot introduce saliency maps as supervision. I think this is like mixing the validation set into the training set; your reported results are truly not fair. Your first and most important starting point is not convincing.

halbielee commented 3 years ago

Dear @TyroneLi

  1. As you said, if someone used a better-performing saliency map when training the network, the network could predict or generate better localization maps (CAM). But the same holds when the saliency map is used as a background cue. In fact, the WSSS papers that use the saliency map as a background cue do not all adopt the same saliency map, and there are considerable performance gaps between the saliency detectors. We were concerned about this point and conducted our experiments with the saliency detectors used in OAA.

  2. A saliency map can be explicit supervision for the segmentation task, but it is not perfect as ground truth for that task. We call this kind of supervision "weak supervision". The saliency map does not distinguish between classes; it only has foreground and background. In addition, the foreground does not coincide with the target classes in the dataset. Finally, the saliency map itself is noisy, so it is not appropriate to use it directly.

  3. "Joint learning of saliency detection and weakly supervised semantic segmentation" and "Saliency guided self-attention network for weakly and semi-supervised semantic segmentation" both use the saliency map as training supervision.
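To make point 2 concrete, here is a small sketch under an idealized assumption: the saliency map is treated as a noise-free foreground/background mask derived from a label mask. The helper `to_saliency`, the class ids, and the toy masks are hypothetical and only serve as illustration. Even in this best case, two different segmentations collapse to the same saliency map, so the saliency map alone cannot say which class a pixel belongs to.

```python
import numpy as np

def to_saliency(label_mask, background_id=0):
    """Idealized saliency: 1 for any foreground pixel, 0 for background."""
    return (label_mask != background_id).astype(np.uint8)

# Two toy segmentation masks with the class ids 1 and 2 swapped.
mask_a = np.array([[0, 1, 1],
                   [0, 2, 2]])
mask_b = np.array([[0, 2, 2],
                   [0, 1, 1]])

same_saliency = np.array_equal(to_saliency(mask_a), to_saliency(mask_b))
same_labels = np.array_equal(mask_a, mask_b)
print(same_saliency, same_labels)
```

A real saliency map is weaker still, since it is noisy and its foreground need not match the dataset's target classes.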

Since the saliency map is stronger supervision than an image-level label, you might feel uncomfortable about using it for a weakly supervised semantic segmentation task based on image-level labels. However, we simply found that the saliency map could be used as additional supervision for WSSS and that our method could resolve the three problems simultaneously. I think this is a tiny step toward better research. That's all. Next time, one of us or other researchers might solve the problems without using stronger supervision such as the saliency map. That can be another step.

I will consider your opinions and advice carefully, and I hope to do better research that convinces more researchers.

- Seungho Lee

Thank you for your opinions as well, @bityangke @stickyfiner.