ros-perception / vision_msgs

Algorithm-agnostic computer vision message types for ROS.
Apache License 2.0
155 stars 74 forks source link

IDs for detections? #17

Closed LeroyR closed 5 years ago

LeroyR commented 5 years ago

How about adding an ID field to the Classification/Detection Msgs.

While using Detection3D msgs works fine in the visual pipeline we struggle with connecting the 'Objects' (BB+cloud+[hypothesis]) to other parts of the 'world'.

Example would be a node that creates grasping objects or shape primitives from the box or cloud of the Detection result. This need some id/name to reference them -> now we also need to store the mapping from world objects to detection results. ID field helps in this case by allowing them to just share the same id.

Kukanani commented 5 years ago

To clarify, are there reasons you can't do any of the following: a) store the full Detection message b) store the object id c) use the timestamp (or hash of the timestamp) as a unique ID for each Detection d) use the seq variable (http://docs.ros.org/api/std_msgs/html/msg/Header.html) as a unique ID for each Detection

LeroyR commented 5 years ago

"Just convenience." It helps to have ids generated in the first pipeline step.

a) store the full Detection message

Already stored for further processing. In the example different components process parts of the Detection. Ofc one component can just assign ids but then we have to share them to other components also processing the detection.

b) store the object id

Objects may have multiple hypotheses

c) use the timestamp (or hash of the timestamp) as a unique ID for each Detection

May generate an array of Detections from one source, all have the same timestamp

d) use the seq variable (http://docs.ros.org/api/std_msgs/html/msg/Header.html) as a unique ID for each Detection

Could do that but this is not really explicit.

Maybe someone else has input?

mintar commented 5 years ago

Since you asked for input, here are my 2 cents.

c) use the timestamp (or hash of the timestamp) as a unique ID for each Detection

May generate an array of Detections from one source, all have the same timestamp

In that case, you should publish a single Detection3DArray message instead of multiple Detection3D messages with the same time stamp. And all processing nodes should copy the time stamp of the original Detection3DArray message (which is the correct way to do it anyway). That way, you can use an ExactTime synchronizer to subscribe both to the original Detection3DArray message and to further processed messages on a different topic (similar to how it's done for Image and CameraInfo messages). If a separate ID was used, each client would have to write an ad hoc solution similar to the ExactTime synchronizer.

If you want to save the client the hassle of using the ExactTime synchronizer, the full Detection3DArray message could be copied into the processed message.

Both approaches are valid, and both don't require a separate ID. Therefore, I'm against adding an ID field for this purpose.

Things would be different if we were talking about tracking, where the same object is tracked over time. In that case, it should get a tracking ID, but I believe it would be better to add a new TrackedObject message instead of modifying the existing ones.

LeroyR commented 5 years ago

Since you asked for input, here are my 2 cents.

c) use the timestamp (or hash of the timestamp) as a unique ID for each Detection

May generate an array of Detections from one source, all have the same timestamp

In that case, you should publish a single Detection3DArray message instead of multiple Detection3D messages with the same time stamp.

Yes, doing that, i said 'array of detections' meant Detection3DArray :smile:

And all processing nodes should copy the time stamp of the original Detection3DArray message (which is the correct way to do it anyway). That way, you can use an ExactTime synchronizer to subscribe both to the original Detection3DArray message and to further processed messages on a different topic (similar to how it's done for Image and CameraInfo messages). If a separate ID was used, each client would have to write an ad hoc solution similar to the ExactTime synchronizer.

If you want to save the client the hassle of using the ExactTime synchronizer, the full Detection3DArray message could be copied into the processed message.

Both approaches are valid, and both don't require a separate ID. Therefore, I'm against adding an ID field for this purpose.

I do not propose an ID field for Time synchronization?

I am talking a graph where multiple node process the same Detection3DArray, e.g. a node that processes the BBs of the detection for construction of a planning scene and another node filtering the hypothesis by speech input or something. Now the node want to start grasping pipeline but for that hat to fetch the ids from the planning scene. An ID field in the first part of the graph (detection) makes sure all nodes talk about the same thing.

Things would be different if we were talking about tracking, where the same object is tracked over time. In that case, it should get a tracking ID, but I believe it would be better to add a new TrackedObject message instead of modifying the existing ones.

In most cases some specialized message makes more sense for your own closed system. Increasing the usage of the std set improves interoperability.

mintar commented 5 years ago

I do not propose an ID field for Time synchronization?

I know. What you want is make sure that a specific detection from a Detection3DArray can be identified. What I'm saying is that the time stamp should identify a unique Detection3DArray, and then the index in the detections[] list identifies a certain detection.

I know this is brittle: the object index will change with each new detection. However, this is because object detection works frame-by-frame. It doesn't have any memory of previous frames, so it cannot assign consistent IDs to the same object over multiple frames. This is why I think you're skipping a step. The object detections should be aggregated and tracked over time by some sort of environment representation, which is the one that assigns IDs to objects. This is what we do on our robots at least.

In most cases some specialized message makes more sense for your own closed system. Increasing the usage of the std set improves interoperability.

Sure, but keeping standard messages lean and focused on one specific task is also important. :)

BTW, ObjectHypothesisWithPose already has an id field. However, in my opinion this should be interpreted as "object class id" (so e.g. "Mug"), not a specific object id ("mug22"). Either the comments (and field name) are a bit misleading, or I'm getting it wrong (but then where is the object class specified?).

LeroyR commented 5 years ago

I do not propose an ID field for Time synchronization?

I know. What you want is make sure that a specific detection from a Detection3DArray can be identified. What I'm saying is that the time stamp should identify a unique Detection3DArray, and then the index in the detections[] list identifies a certain detection.

Yes, most of the time we can use nsec+idx as uuids, it just seems so arbitrary.

I know this is brittle: the object index will change with each new detection. However, this is because object detection works frame-by-frame. It doesn't have any memory of previous frames, so it cannot assign consistent IDs to the same object over multiple frames. This is why I think you're skipping a step. The object detections should be aggregated and tracked over time by some sort of environment representation, which is the one that assigns IDs to objects. This is what we do on our robots at least.

True. If using an environment representation this is already handled somewhere else in the system. Most of the time using specialized messages for world objects, atleast we do :smile:

In most cases some specialized message makes more sense for your own closed system. Increasing the usage of the std set improves interoperability.

Sure, but keeping standard messages lean and focused on one specific task is also important. :)

It already includes a bounding box and a point cloud. The bounding box can be constructed from the cloud - well the cloud is defined as optional so nevermind this.

BTW, ObjectHypothesisWithPose already has an id field. However, in my opinion this should be interpreted as "object class id" (so e.g. "Mug"), not a specific object id ("mug22"). Either the comments (and field name) are a bit misleading, or I'm getting it wrong (but then where is the object class specified?).

We also interpret the field als class id but i agree that this could be clearer.

Kukanani commented 5 years ago

Some good points made here.

We should make the comments on the id field more explicit. This was always meant to be a class id. You can imagine a cube object that could be in 24 different orientations and still look identical, all should have the same class id.

If we were going to add an explicit per-result id, it seems that it should be added at the ObjectHypothesis level or else it will still remain ambiguous in some cases. So perhaps renaming id to object_class or object_class_id and creating a new unique id that is explicitly commented as such. However, your original message is talking about detections and bboxes instead of hypotheses, which have no bbox. So perhaps we would want a unique ID for both detections and hypotheses.

I also don't think that this new id field would be a substitute for performing tracking over multiple frames, since it would change every frame. But it would help nodes that are separate from vision keep other objects synced up with a specific vision message.

Kukanani commented 5 years ago

I clarified the id field in this commit. @LeroyR, does #19 provide a good solution to your problem?

There's a balance to strike here between making the messages usable for multiple use cases, while not making them overly complex to use. I'm doing my best to strike that balance.

LeroyR commented 5 years ago

19 is a solution.

The Moveit stack is using Strings as ids so we would have to do some conversions.

e.x. CollisionObject

Kukanani commented 5 years ago

Should be resolved now with #19. Please re-open if there's still an issue.