ros-perception / vision_msgs

Algorithm-agnostic computer vision message types for ROS.
Apache License 2.0

Segmentation Message Types? #87

Open lexi-brt opened 1 year ago

lexi-brt commented 1 year ago

Hi!

I'm looking for a standard ROS message type to use for segmentation outputs.

Something like these: https://github.com/DavidFernandezChaves/Detectron2_ros/blob/master/msg/Result.msg https://github.com/akio/mask_rcnn_ros/blob/kinetic-devel/msg/Result.msg

Is there something in this package that's suitable for this already? If not, how would I go about contributing a proposal and getting something merged?

SteveMacenski commented 1 year ago

What are you looking for? These don't appear to me to be pixel-wise segmentation messages, unless you're only looking at the sensor_msgs/Image[] masks and not the sensor_msgs/RegionOfInterest[] boxes.

https://github.com/ros-perception/vision_msgs/blob/ros2/vision_msgs/msg/Detection2D.msg does something like the boxes in those messages. I definitely agree a segmentation message would be valuable; it's already under discussion in https://github.com/ros-perception/vision_msgs/issues/63. I think it might be good to start with a proposal that @Kukanani and I can review, and we can go from there!

Kukanani commented 1 year ago

Yes, happy to consider any proposals on the segmentation front!

mintar commented 1 year ago

I agree with everything @SteveMacenski said. Personally, I'd go with one of the following approaches:

  1. Either create a new message type:
    std_msgs/Header header
    vision_msgs/Detection2DArray detections
    sensor_msgs/Image[] masks
  2. Or publish those things on separate topics: one vision_msgs/Detection2DArray for the detections and one sensor_msgs/Image for each mask.

This also depends on what's inside the mask image(s):

If it's a single segmentation image, I'd go with approach (2) above, for the reasons I've outlined in this comment. If it's individual object masks (which is what Mask R-CNN is doing), approach (2) becomes very cumbersome/impossible, so I'd go with approach (1).
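For approach (2), a minimal subscriber-side sketch in ROS 2 Python might look like the following. The topic names /detections and /segmentation_mask are made up, and it assumes the publisher stamps the Detection2DArray and the mask Image from the same input frame identically:

    # ROS 2 Python sketch: pair a Detection2DArray with the segmentation mask Image
    # published on a separate topic (approach 2). Topic names are illustrative only.
    import rclpy
    from rclpy.node import Node
    from message_filters import Subscriber, ApproximateTimeSynchronizer
    from sensor_msgs.msg import Image
    from vision_msgs.msg import Detection2DArray

    class SegmentationListener(Node):
        def __init__(self):
            super().__init__('segmentation_listener')
            det_sub = Subscriber(self, Detection2DArray, '/detections')
            mask_sub = Subscriber(self, Image, '/segmentation_mask')
            # Match messages whose header stamps are (nearly) identical.
            self.sync = ApproximateTimeSynchronizer([det_sub, mask_sub],
                                                    queue_size=10, slop=0.05)
            self.sync.registerCallback(self.on_result)

        def on_result(self, detections: Detection2DArray, mask: Image) -> None:
            self.get_logger().info(
                f'{len(detections.detections)} detections, mask {mask.width}x{mask.height}')

    def main():
        rclpy.init()
        rclpy.spin(SegmentationListener())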

SteveMacenski commented 1 year ago

I'm not 100% sure I understand having detections and segmentation masks together - these are often different processes, one building bounding boxes and the other pixel-wise segmentation masks (though I suppose a bounding box could be generated from a mask rather easily).

LabelInfo is meant to communicate the mapping from numeric class IDs to string labels, so that could be reused here with synchronized topics, just like it is for detections.

I agree instance segmentation vs. class segmentation throws a wrench into things. For class segmentation, 1 image is OK, but for instance segmentation we may need N images for the N instances, or we need to find another way to embed that in an image. Perhaps a new Image-like message containing the class, instance, and probability info for each pixel, so it could work for any situation (with instance = 0 for non-instance segmentation implementations).
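To make that per-pixel idea concrete, here is a rough numpy sketch of the layout being suggested (purely hypothetical; none of this is an existing vision_msgs type):

    # Hypothetical per-pixel layout for the Image-like segmentation message suggested
    # above: every pixel carries a class id, an instance id, and a confidence, with
    # instance id 0 reserved for purely semantic (non-instance) segmentation.
    import numpy as np

    height, width = 480, 640
    class_ids = np.zeros((height, width), dtype=np.uint16)     # semantic class per pixel
    instance_ids = np.zeros((height, width), dtype=np.uint16)  # 0 = no instance info
    confidence = np.ones((height, width), dtype=np.float32)    # per-pixel probability

    # A semantic-only segmenter fills class_ids/confidence and leaves instance_ids all
    # zero (or unallocated), keeping the extra overhead small in that case.
    class_ids[100:200, 50:150] = 7
    confidence[100:200, 50:150] = 0.93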

gachiemchiep commented 10 months ago

@SteveMacenski @mintar I think we could use the design of the PASCAL VOC dataset for this problem.

For example, take the JPEG image 2007_000129: [original photo]

The mask for class segmentation (semantic segmentation) looks like this: [class segmentation mask for 2007_000129]

Then for instance segmentation, they add another mask for the objects, like this: [object/instance segmentation mask for 2007_000129]

By using this rule, only 2 mask images are needed.
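To illustrate, here is a small numpy sketch of that two-mask encoding (the class/instance ids and the tiny image size are made up for the example):

    # Illustration of the two-mask (PASCAL VOC style) encoding described above.
    import numpy as np

    class_mask = np.zeros((4, 6), dtype=np.uint8)     # pixel value = class id, 0 = background
    instance_mask = np.zeros((4, 6), dtype=np.uint8)  # pixel value = instance id, 0 = background

    class_mask[1:3, 0:2] = 15; instance_mask[1:3, 0:2] = 1   # first object
    class_mask[1:3, 3:5] = 12; instance_mask[1:3, 3:5] = 2   # second object, different class

    # Each instance's binary mask and its class can be recovered from just the two images.
    for inst_id in np.unique(instance_mask):
        if inst_id == 0:
            continue
        pixels = instance_mask == inst_id
        cls = np.unique(class_mask[pixels])
        print(f"instance {inst_id}: class {cls}, {pixels.sum()} pixels")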

SteveMacenski commented 10 months ago

But how does that distinguish the class of the instance? If you just have instance IDs 1...N for 1...N objects, you'd have multiple "1" regions representing the first instances of different classes.

I think that mask would need to have 2 values: 1 for the instance # and another for the class #. It doubles the message size, which I don't love, but without doing bit shifting, I think that's the best we can do. For non-instance segmentation algorithms, the instance value can be left empty/non-allocated, so it shouldn't be a huge amount of overhead relative to the segmentation image itself.
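For reference, the bit-shifting variant could look roughly like this (illustrative only): pack both ids into a single 16-bit pixel, at the cost of capping each id at 255:

    # Sketch of the bit-shifting alternative: class id in the high byte and instance id
    # in the low byte of a single uint16 pixel (so each id is limited to 0..255).
    import numpy as np

    class_mask = np.array([[15, 15, 0], [12, 0, 0]], dtype=np.uint16)
    instance_mask = np.array([[1, 1, 0], [2, 0, 0]], dtype=np.uint16)

    packed = (class_mask << 8) | instance_mask   # one "16UC1"-style image plane

    # Unpacking on the subscriber side
    recovered_class = packed >> 8
    recovered_instance = packed & 0xFF
    assert (recovered_class == class_mask).all()
    assert (recovered_instance == instance_mask).all()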

Thoughts @mintar ?

gachiemchiep commented 10 months ago

@SteveMacenski Maybe my writing was a little bit confusing.

For semantic segmentation, 1 mask image is needed: a class mask, as shown in the 2nd picture.

For instance segmentation results, 2 mask images are needed: a class mask as shown in the 2nd picture, and an object (instance) mask as shown in the 3rd picture.

Instance segmentation can also be explained like this:

  1. First do the detection to find every object's box. The box msg is Detection2D.msg.
  2. For each box, do the segmentation to find the object mask inside it.

    Instead of publishing an entire image mask as in the approach above, we could crop out the mask for each box and then attach that cropped mask image to each box's msg (see the sketch below).
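A small numpy sketch of that per-box cropping idea (the array shape and box values are made up for illustration):

    # Crop the full-image mask down to one object's bounding box before attaching it
    # to that detection, instead of shipping a full-resolution mask per object.
    import numpy as np

    full_mask = np.zeros((480, 640), dtype=np.uint8)
    full_mask[100:200, 300:400] = 1                      # one object's pixels (made up)

    x, y, w, h = 290, 90, 120, 120                       # box around that object (made up)
    cropped_mask = full_mask[y:y + h, x:x + w]

    print(full_mask.nbytes, "->", cropped_mask.nbytes)   # per-object bytes shrink a lot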