ros-perception / vision_msgs

Algorithm-agnostic computer vision message types for ROS.
Apache License 2.0
155 stars 74 forks source link

[WIP] simpler BoundingBox2D #5

Closed mintar closed 7 years ago

mintar commented 7 years ago

This PR is to continue the discussion from #2 about bounding boxes.

I would strongly prefer to use something like a sensor_msgs/RegionOfInterest for the 2D bounding box:

This would be much easier to implement correctly than a rotated bounding box with floats. With that approach, we would have to specify things like "use the Bresenham algorithm to figure out which pixels are inside the rotated box". I'm afraid since most people don't use the rotation, they'd just ignore it or implement it incorrectly.

Unfortunately, sensor_msgs/RegionOfInterest has a weird do_rectify field which doesn't make sense here. If that's a show stopper, we should create our own copy of RegionOfInterest without that field, otherwise we can use RegionOfInterest directly.


If we do use the current BoundingBox2D, it should use float everywhere; as agreed upon in #2, it doesn't make sense to specify the center in float and the size in uint32.

If that message moves to geometry_msgs, it shouldn't be in pixel coordinates, but in meters. All the other measures of length in that package are in meters, and assume that some people would want to use geometry_msgs/BoundingBox2D for stuff like specifying a rectangular region in a map (where meters is the obvious unit). But meters would be useless for us. All this leads me to believe that a BoundingBox2D that would live happily in geometry_msgs is different from what we need.

Note that this doesn't apply to BoundingBox3D: It's already in meters, so it could go to geometry_msgs.

Kukanani commented 7 years ago

Thanks for the PR. I agree that if we use a Pose2D for the center, we should specify the size in a float.

I can think of a few reasons to keep using BoundingBox2D instead of RegionOfInterest, however. Some object detectors, such as YOLO, return floats for their bounding boxes (example). You could argue that this is information that doesn't need to be sent over a ROS message in most cases, but it does represent a loss of information.

If we allow Detection3D to have a rotated bounding box, I think Detection2D should do the same for consistency. Some pipelines (such as cnn grasp detection, while not detectors in the strictest sense, predict rotated 2D bounding boxes.

One more point is that the sensor_msgs/RegionOfInterest.do_rectify variable has no meaning for our use case, AFAICT.

mintar commented 7 years ago

I can think of a few reasons to keep using BoundingBox2D instead of RegionOfInterest, however. Some object detectors, such as YOLO, return floats for their bounding boxes (example). You could argue that this is information that doesn't need to be sent over a ROS message in most cases, but it does represent a loss of information.

In that case we should make it clear in the comments of the .msg file how to interpret this exactly. If I have a bounding box of size 10.123 x 20.456 with a 23° rotation, which pixels are included in this bounding box?

If we allow Detection3D to have a rotated bounding box, I think Detection2D should do the same for consistency. Some pipelines (such as cnn grasp detection, while not detectors in the strictest sense, predict rotated 2D bounding boxes.

Good point; I agree it would be good for the message to be general enough to capture that use case.

That said, I don't think it holds that BoundingBox3D and 2D should be treated equally; they are fundamentally different. The reference frame of BoundingBox3D is arbitrary (such as the starting pose of the robot), and removing the orientation would force each bounding box to be axis-aligned with that frame. OTOH, the 2D bounding box doesn't talk about coordinates in the world wrt. some reference frame, but specifies a set of pixels in an image.

One more point is that the sensor_msgs/RegionOfInterest.do_rectify variable has no meaning for our use case, AFAICT.

Yes, I mentioned that. So we shouldn't use RegionOfInterest, but copy + adjust it if we want to use it.


To further complicate matters, here's one more way we could specify the bounding box: as (x, y) coordinates of the four corners. That would allow not only rotation, but also allow skew (for example, if you've matched a reference texture to a skewed object in the image). This service does that:

The location is given in the form of (x, y) coordinates of four bounding box corners. Please note that these points represent particular corners of the reference object (upper-left, upper-right, lower-right, lower-left), which enables finding out the actual orientation of the object in the query picture.

I don't have an opinion at the moment if it's a good idea, I just thought I'd bring up the possibility.

Kukanani commented 7 years ago

Great points! If we allow specifying the 4 corners, then the bounding box is either 1) imbued with some 3D information, or 2) not actually a rectilinear box. I think a generic polygon would be better to use in either case.

I think using a Pose2D and float sizes are the best balance between ease of use and expressive power. We can definitely advocate for the Bresenham algo for when pixel-perfect cropping is required. This can be added in this commit or in a later documentation improving commit. Any other thoughts before I merge?

Kukanani commented 7 years ago

I also want to point out that there's not much consensus on whether or not to specify the upper-left corner of a bounding box or the center.

Upper left:

Center:

So I guess we should simply choose one.

Kukanani commented 7 years ago

After thinking it over and comparing the use cases, I've decided we should keep the rotated BoundingBox2D available in this package, but make the default message types support only a corner/size box without rotation, which I'll call BoundingRect. I'm going to merge this PR to confirm the other changes, then create the new BoundingRect in a new PR.