Is there no problem with classifying pixel by conv2d?

open-mmlab / mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.

https://mmsegmentation.readthedocs.io/en/main/

Apache License 2.0

Is there no problem with classifying pixel by conv2d? #641

Closed jinseok-karl closed 3 years ago

jinseok-karl commented 3 years ago

Hi, lately I have spent a lot of time exploring this attractive repo. While trying it out, I came up with some questions.

As far as I know, the last classification layer should include a non-linear function such as sigmoid or softmax, which activates the values before they go to the loss function during training. Almost every model seems to be built like that. But mmseg, for prediction, appears to classify pixels with a conv2d (decoderhead, seg_cls). Is there no problem with that?

Also, I'm not sure which value I should use for binary segmentation (background or not): 2 or 1? Both experiments gave very bad results.

Thanks for releasing this code and for your dedication to it.

jinseok-karl commented 3 years ago

For the first question, it may be no problem; the conv2d might just be there to adjust the number of classes!

MengzhangLI commented 3 years ago

Hi, thanks for your kind words about MMSegmentation.

First, the non-linear function is adopted in def cls_seg(self, feat) in point_head.py. From decode_head.py, the procedure of loss calculation is:

    seg_logits = self.forward(inputs)
    losses = self.losses(seg_logits, gt_semantic_seg)

where seg_logits is the output of a given network, which always passes through a function like self.cls_seg(output). In many cases it is a conv1d operator with dropout, and it is a certain non-linear operator.

Second, I suggest you follow a dataset like CHASE_DB1 or STARE, which are binary segmentation datasets, i.e., they have only one type of foreground. By default, background=0 and foreground=1.

Best,
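On the first question, a minimal sketch may help. It assumes the loss is a standard softmax cross-entropy that takes raw scores, as PyTorch's F.cross_entropy does; whether a particular mmseg config uses such a loss depends on its loss_decode setting. Under that assumption the softmax lives inside the loss, so the head can end in a plain convolution. The function names and the sample logits below are hypothetical, and the code is pure Python rather than mmseg code.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, target):
    """Softmax cross-entropy computed directly on raw scores (log-sum-exp form)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]

# Hypothetical raw output of the final conv for one pixel, 2 classes.
pixel_logits = [1.5, -0.3]
target = 0  # ground-truth class index for this pixel

loss = cross_entropy_from_logits(pixel_logits, target)
probs = softmax(pixel_logits)

# Same value as applying softmax first and then taking -log(p[target]).
assert abs(loss - (-math.log(probs[target]))) < 1e-12
```

At inference time, argmax over the raw scores gives the same class as argmax over the softmax probabilities, because softmax preserves ordering.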

jinseok-karl commented 3 years ago

Thanks for the reply! May I ask one more question? With num_class=2 set in the config, the model outputs a 2-channel prediction (before argmax). But I think there is no need to output 2 channels (the channel number comes from num_class). The output goes through softmax, and the largest value is chosen by argmax. I think just one channel would be enough, using sigmoid with a threshold. Am I misunderstanding something?

MengzhangLI commented 3 years ago

Because the background is one type, 2 channels means one is background and the other is foreground.

The channel number = num_class + 1 (background)

Best,
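For the two-channel versus one-channel question, the following sketch checks numerically that, for two classes, softmax followed by argmax makes the same decision as a sigmoid on the difference of the two logits with a 0.5 threshold. It is a pure-Python illustration with hypothetical function names and sample values, not mmseg code, and it says nothing about which variant trains better in practice.

```python
import math

def softmax2(z0, z1):
    """Softmax over two raw scores: (background, foreground)."""
    m = max(z0, z1)
    e0, e1 = math.exp(z0 - m), math.exp(z1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def predict_two_channel(z0, z1):
    """Argmax over [background, foreground] logits."""
    return 1 if z1 > z0 else 0

def predict_one_channel(z, threshold=0.5):
    """Single-logit prediction: sigmoid followed by a threshold."""
    return 1 if sigmoid(z) > threshold else 0

# Hypothetical (background, foreground) logits for three pixels.
samples = [(0.2, 1.7), (3.0, -1.0), (-0.5, -0.4)]
for z0, z1 in samples:
    p_fg = softmax2(z0, z1)[1]
    # Foreground probability from softmax equals sigmoid of the logit difference.
    assert abs(p_fg - sigmoid(z1 - z0)) < 1e-12
    # Both decision rules pick the same class.
    assert predict_two_channel(z0, z1) == predict_one_channel(z1 - z0)
```

So a single channel is mathematically sufficient for the binary case; the two-channel form simply treats background as one more class and reuses the multi-class code path.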