Why does the code split segmentation output with num_parts = 133? (about background processing)

sha2nkt / deco

Estimate vertex-level 3D human-scene and human-object contacts across the full body mesh

https://deco.is.tue.mpg.de/



Why does the code split segmentation output with num_parts = 133? (about background processing) #12

Closed: dqj5182 closed this issue 8 months ago

dqj5182 commented 8 months ago

Sorry for asking so many questions about processing the segmentation. I have just one last question.

It seems that the segmentation model outputs 133 classes (excluding background). If background is included, splitting the segmentation output with num_parts = 134 would seem more reasonable. Why does the code split it with num_parts = 133?

[Screenshot, 2023-10-17]

ac5113 commented 8 months ago

The reasoning was that most (if not all) of the background would fall into one of the COCO val panoptic classes, since it is, after all, a scene segmentation model. Thus, the background is already covered by the part segmentation prediction and mask, which we need in order to differentiate between person and non-person.
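A minimal NumPy sketch of the reasoning above, assuming the model produces a score map with one channel per COCO panoptic class and no separate background channel. The names `person_mask`, `split_parts` and `PERSON_IDX` are hypothetical (not taken from the DECO code), and `PERSON_IDX = 0` assumes "person" sits at contiguous index 0. Every pixel is won by one of the 133 classes, so "background" is simply any pixel won by a non-person class, and splitting the label map into `num_parts = 133` masks covers every pixel exactly once.

```python
import numpy as np

NUM_PARTS = 133   # one channel per COCO panoptic class, no extra background channel
PERSON_IDX = 0    # assumption: contiguous index of "person" (hypothetical constant)


def person_mask(scores):
    """Binary person/non-person mask from (NUM_PARTS, H, W) class scores.

    Each pixel is assigned to one of the 133 classes; pixels won by any
    non-person class (scene "stuff" or other objects) count as background.
    """
    assert scores.shape[0] == NUM_PARTS
    labels = scores.argmax(axis=0)  # (H, W), values in 0..132
    return labels == PERSON_IDX


def split_parts(labels, num_parts=NUM_PARTS):
    """Split an (H, W) label map into num_parts binary masks, shape (num_parts, H, W)."""
    return np.stack([labels == k for k in range(num_parts)])


# Toy 2x3 example: top row scores highest for "person", bottom row for some other class.
scores = np.zeros((NUM_PARTS, 2, 3))
scores[PERSON_IDX, 0, :] = 1.0
scores[50, 1, :] = 1.0  # arbitrary non-person class index

mask = person_mask(scores)
parts = split_parts(scores.argmax(axis=0))
print(mask.tolist())  # [[True, True, True], [False, False, False]]
print(parts.shape)    # (133, 2, 3)
```

Because no label can fall outside 0..132 in this setup, a 134th channel would always be empty.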

dqj5182 commented 8 months ago

Thanks for the reply.

But the Mask2Former panoptic segmentation model does output a background class (0), which is included in the final panoptic segmentation result. How do you process it? Specifically, which of the class values 1 to 133 do you assign to the background?

ac5113 commented 8 months ago

Please check this issue. Mask2Former discards the background class during inference. Thus, all 133 classes in the output are object classes.
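To illustrate why the output has 133 channels, here is a NumPy sketch of how Mask2Former-style semantic inference combines its predictions, as we understand it: each query predicts logits over K classes plus one extra "no object" class, the "no object" column is dropped after the softmax, and the remaining class probabilities are combined with the sigmoid mask predictions. This is a re-implementation for illustration, not Mask2Former's actual code; the function names and the toy sizes (100 queries, 8x8 masks) are assumptions.

```python
import numpy as np

NUM_CLASSES = 133  # COCO panoptic: 80 "thing" + 53 "stuff" categories


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def semantic_scores(mask_cls, mask_pred):
    """Combine per-query class logits and mask logits into per-class pixel scores.

    mask_cls:  (Q, K + 1) class logits; the last column is "no object".
    mask_pred: (Q, H, W) mask logits.
    Returns:   (K, H, W) scores, one channel per object/stuff class.
    """
    probs = softmax(mask_cls, axis=-1)[:, :-1]  # discard the "no object" column
    masks = 1.0 / (1.0 + np.exp(-mask_pred))    # sigmoid
    return np.einsum("qc,qhw->chw", probs, masks)


rng = np.random.default_rng(0)
Q, H, W = 100, 8, 8  # illustrative sizes only
scores = semantic_scores(
    rng.normal(size=(Q, NUM_CLASSES + 1)),
    rng.normal(size=(Q, H, W)),
)
print(scores.shape)  # (133, 8, 8)
```

Since the "no object" slot never reaches the per-pixel score map, there is no background channel to split off, which is consistent with num_parts = 133.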