ros-perception / vision_msgs

Algorithm-agnostic computer vision message types for ROS.
Apache License 2.0

add unique object ID for tracking #18

Closed hakuturu583 closed 5 years ago

hakuturu583 commented 5 years ago

Hi, I want to build an object detection (https://github.com/OUXT-Polaris/nnabla_vision_detection) and tracking ROS package and a visualization tool (https://github.com/OUXT-Polaris/vision_msgs_visualization) using vision_msgs. However, there is no unique object ID in this package. So, I want to modify the messages as shown below.

Detection2D.msg

# Defines a 2D detection result.
#
# This is similar to a 2D classification, but includes position information,
#   allowing a classification result for a specific crop or image point to
#   be located in the larger image.

Header header

# Class probabilities
ObjectHypothesisWithPose[] results

# 2D bounding box surrounding the object.
BoundingBox2D bbox

# The 2D data that generated these results (i.e. region proposal cropped out of
#   the image). Not required for all use cases, so it may be empty.
sensor_msgs/Image source_img

# If this message was tracking result, this field set true.
bool is_tracking

# Unique ID which was set by tracker ROS node.
uuid_msgs/UniqueID tracking_id

Detection3D.msg

# Defines a 3D detection result.
#
# This extends a basic 3D classification by including position information,
#   allowing a classification result for a specific position in an image to
#   be located in the larger image.

Header header

# Class probabilities. Does not have to include hypotheses for all possible
#   object ids, the scores for any ids not listed are assumed to be 0.
ObjectHypothesisWithPose[] results

# 3D bounding box surrounding the object.
BoundingBox3D bbox

# The 3D data that generated these results (i.e. region proposal cropped out of
#   the image). This information is not required for all detectors, so it may
#   be empty.
sensor_msgs/PointCloud2 source_cloud

# If this message was tracking result, this field set true.
bool is_tracking

# Unique ID which was set by tracker ROS node.
uuid_msgs/UniqueID tracking_id

I want to use the same message for tracking and detection, so that the same ROS node can be used for visualization. For that reason I do not want to add a new message type.

Kukanani commented 5 years ago

Hi, thanks for proposing a change! I agree that right now it is difficult to perform multi-object tracking using only vision_msgs so let's see if we can improve it.

Can you please clarify what the ID keeps track of in your implementation? Does each real-world object in the scene get its own UniqueID? Each detection message? Each image/detection event?

hakuturu583 commented 5 years ago

I think a pipeline like the one below is good for building an object detection/tracking system with vision_msgs. (https://docs.google.com/presentation/d/e/2PACX-1vTUnCyBg9Kxuya1BWyjTJcAf9lGJa_th5n-Ta2JxoHbiwgZS5lHWjOhvDsvzqMJB9s1Rey9wgOh59Md/pub?start=false&loop=false&delayms=3000)

hakuturu583 commented 5 years ago

I think it is unnecessary to set a unique ID in the object detector. Setting a unique ID requires time-series processing. I think a detection node is a one-shot algorithm such as YOLO, VoxelNet, SVM, etc., while a tracking node contains a time-series processing algorithm such as a particle filter or a Kalman filter. So, I think it is better to set the unique tracking ID in the tracking node.

hakuturu583 commented 5 years ago

To show clearly whether a result is a tracking result or a detection result, I added the is_tracking field.

mintar commented 5 years ago

+1

I agree this would be a valuable addition. The comments should be checked by a native speaker (@Kukanani ?), otherwise it looks good to me.

hakuturu583 commented 5 years ago

I am also sending a PR for this issue: #19. If you are OK with it, please review it!

hakuturu583 commented 5 years ago

@Kukanani What do you think about my architecture design and message modification?

Kukanani commented 5 years ago

@hakuturu583, I'm happy to move forward with getting this merged, but I will probably propose changes to the comments based on this discussion.

Just to clarify again: the tracking_id field should be the same across multiple messages, as long as those messages represent detections of the same real-world object. Is this correct?

hakuturu583 commented 5 years ago

@Kukanani Yes, I think the tracking_id of the same object should stay the same across multiple messages, as long as the tracking node continues tracking the target object.
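As an illustration of the agreed semantics (the same tracking_id across messages for the same real-world object), here is a minimal sketch. It does not use vision_msgs or ROS; `Detection` and `NearestCenterTracker` are hypothetical stand-ins, and nearest-center matching is only an assumed toy association rule, not the method of any package in this thread.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """Hypothetical stand-in for Detection2D: a box center plus the proposed fields."""
    center: Tuple[float, float]
    is_tracking: bool = False
    tracking_id: str = ""


class NearestCenterTracker:
    """Toy tracker: reuses an ID when a detection lies close to a known track."""

    def __init__(self, max_distance: float = 20.0) -> None:
        self.max_distance = max_distance
        self.tracks: Dict[str, Tuple[float, float]] = {}  # tracking_id -> last center

    def update(self, detections: List[Detection]) -> List[Detection]:
        unused = dict(self.tracks)  # tracks not yet matched in this frame
        for det in detections:
            best_id = None
            best_dist = self.max_distance
            for track_id, (tx, ty) in unused.items():
                dist = ((det.center[0] - tx) ** 2 + (det.center[1] - ty) ** 2) ** 0.5
                if dist <= best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                best_id = str(uuid.uuid4())  # new object: create a fresh ID
            else:
                del unused[best_id]  # each track is matched at most once per frame
            self.tracks[best_id] = det.center
            det.is_tracking = True
            det.tracking_id = best_id
        return detections


tracker = NearestCenterTracker()
frame1 = tracker.update([Detection((10.0, 10.0)), Detection((200.0, 50.0))])
frame2 = tracker.update([Detection((12.0, 11.0)), Detection((203.0, 52.0))])
```

In this sketch the detector output carries no ID at all; only the tracker fills in tracking_id and sets is_tracking, which matches the split between detection and tracking nodes described earlier in the thread.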

mistermult commented 5 years ago

I'm also interested in adding IDs.

Place of ID

However, I'm not satisfied with the solution in #19. The ID should stay the same if the detected object is the same, i.e. the same entity. However, currently the ID is added to Detection2D.msg. So I cannot model the following: Object 123 is detected with a probability of 50% and Object 234 is detected with a probability of 50%.

Solution: Move ID from Detection2D.msg to ObjectHypothesis.msg/ObjectHypothesisWithPose.msg. Moreover, rename from tracking_id to object_id, because it describes that this is the same object entity.

Type of ID

See also https://github.com/Kukanani/vision_msgs/issues/17#issuecomment-509506083

Solution: Make the type of tracking_id/object_id and old id (=class id) string.

Both tracking_id and object_id should be of same type because they are foreign keys in similar collections. Class and entity are really similar: "Red cube" might be a unique object until there are multiple red cubes of exactly the same type on the table.

String:

UUID, Int:

Proposed solution

Keep Detection2D.msg/Detection3D.msg as they are (or only add is_tracking). Modify ObjectHypothesis.msg (ObjectHypothesisWithPose.msg) as follows:

# An object hypothesis that contains no position information.

# The unique ID of object detected. 
# If the two object_id's are the same in different messages for different images,
# it means that the same real object (entity) has been detected in both images.
# Object detection pipelines that do not output such IDs should set this to the
# empty string "".
string object_id

# The unique ID of class of the object detected [....]
string id #Change to string; 

# The probability or confidence value of the detected object. By convention,
#   this value should lie in the range [0-1].
float64 score

This also has the benefit that a classification (Classification2D.msg) can now also give a detected object an ID, because it includes ObjectHypothesis.

Please review my proposal @hakuturu583 , @Kukanani , @LeroyR, @mintar. I added some up/down buttons below. You might just want to vote.

LeroyR commented 5 years ago

I understand where @mistermult is coming from, however:

Place of ID

The Detection is still one entity (e.g. having one BB/pointcloud) that we add to the planning scene, even if it has multiple hypotheses about which specific class/track it is. Which object_id is then set in the planning_scene/collision_object? First Hypothesis? Hypothesis with the highest score? Set the same in all hypos? -> We need the field on the DetectionXd level if using the object based on the pointcloud.

I can imagine a system that uses the best hypothesis as basis for e.g. Manipulation, adding the known mesh/model in the assumed orientation, but it may be always safer to base Manipulation on the sensor data.

Also: On the Hypothesis level you can already use the current messages by simply using the id field as the track id, as you currently have to lookup the label anyway.

e.g. with rosparam:

object_tracker:
    0: apple

I still think DetectionXd is the correct place.
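The rosparam idea above (reuse the existing id field as a track id and look the label up separately) could be sketched roughly as follows. The dictionary mirrors the `object_tracker` parameter shown above; `label_for` is a hypothetical helper, and how the table is actually loaded from the parameter server is left out as an assumption.

```python
from typing import Dict

# Assumed contents of the `object_tracker` parameter: track id -> class label.
object_tracker: Dict[int, str] = {0: "apple"}


def label_for(track_id: int, table: Dict[int, str]) -> str:
    """Return the class label stored for a track id, or "unknown" if absent."""
    return table.get(track_id, "unknown")


known = label_for(0, object_tracker)
missing = label_for(7, object_tracker)
```

The trade-off is that the class label then lives outside the message, so every consumer needs access to the same lookup table.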

Type of ID

Agree, as long as moveit is using strings.

mistermult commented 5 years ago

@LeroyR Thanks for the reply.

I have to clarify my proposal. My proposed object_id describes the object. Assume that each real object has a unique number written on it (two identical red cubes would have different numbers). This would be the object_id. Of course, the object detection/tracking assigns these numbers more or less arbitrarily.

Which object_id is then set in the planning_scene/collision_object? First Hypothesis? Hypothesis with the highest score? Set the same in all hypos?

We must differentiate between two things:

Problem when using only the tracking_id in ClassificationX

Assume that:

Detection2D = [
detections: [
{
    results: [
        {
            id: "apple",
            object_id: 1, #the left cube
            score: 0.5,
            pose: ...,
        },
        {
            id: "orange",
            object_id: 2, #the left cube
            score: 0.5,
            pose: ...,
        }
    ],
    bbox: ...,
    source_img: ...
}]
]

Now assume that we use tracking_id in Classification2D instead of object_id in ObjectHypothesis. There are two potential IDs (1 or 2). So there must be two Classification2D. I conclude:

Detection2D = [
detections: [
{
    results: [
        {
            id: "apple",
            object_id: 1, #the left cube
            score: 0.5,
            pose: ...,
        }
    ],
    bbox: ...,
    source_img: ...
},
{
    results: [
        {
            id: "orange",
            object_id: 2, #the left cube
            score: 0.5,
            pose: ...,
        }
    ],
    bbox: ...,
    source_img: ...
}]
]

There are multiple problems:

Also: On the Hypothesis level you can already use the current messages by simply using the id field as the track id, as you currently have to lookup the label anyway.

Assume I track 2 apples. As clarified by @Kukanani, id describes the class. So id="apple" for both apples. To track the apples across multiple frames I have to identify each apple with a unique additional ID: the object_id. So I have (id="apple", object_id=1) for the left apple and (id="apple", object_id=2) for the right apple.
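The two-apples case can be sketched in plain Python. `Hypothesis` is a hypothetical stand-in for the proposed ObjectHypothesis with a string object_id (not the message as merged), and `group_by_entity` is an illustrative helper.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Hypothesis:
    id: str         # class id, e.g. "apple"
    object_id: str  # entity id as proposed here; "" when the pipeline has none
    score: float


def group_by_entity(frames: List[List[Hypothesis]]) -> Dict[str, List[int]]:
    """Map each non-empty object_id to the indices of the frames it appears in."""
    seen: Dict[str, List[int]] = defaultdict(list)
    for index, frame in enumerate(frames):
        for hyp in frame:
            if hyp.object_id:
                seen[hyp.object_id].append(index)
    return dict(seen)


frames = [
    [Hypothesis("apple", "1", 0.9), Hypothesis("apple", "2", 0.8)],  # both apples visible
    [Hypothesis("apple", "1", 0.85)],                                # only the left apple
]
appearances = group_by_entity(frames)
```

Both hypotheses share id="apple", so the class id alone cannot tell the two apples apart; only the additional object_id does.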

mistermult commented 5 years ago

In conclusion:

Advantages of object_id in ObjectHypothesis:

mintar commented 5 years ago

I'm currently on vacation and on mobile, so apologies beforehand for being brief.

I prefer track_id and class_id to be strings, for the reasons listed.

I also believe track_id should go into DetectionXd. We should not add object_id to ObjectHypothesis, IMO. Reason: track_id has a clearly defined meaning - it associates this DetectionXd to one from the previous frame. The examples cited by @mistermult go beyond tracking; this is called "anchoring" (Saffiotti et al.). I admit that the example with the two tracked objects assigned to one single detection cannot be modeled elegantly without adding object_id to the hypotheses; however, I would argue that this is a special case anyway. In general, we will anyway have to model it the way that @mistermult mentioned later: each (potential) object becomes one DetectionXd, with a unique track_id (if tracked); if there is uncertainty about the object class, that is modeled via the Object Hypotheses. If the tracker is using MHT (Multi Hypothesis Tracking) or equivalent, it is probably best to do that internally and only publish the most likely hypothesis (here: assignment of objects to tracks). Or it could publish each "hypothesis" (in the MHT sense of the word) as a separate DetectionXdArray if desired.

mintar commented 5 years ago

Example of why I think the example above is a special case: assume there are two real objects (apple and orange), and two detected objects (o1 and o2). Now assume the tracker doesn't know which one is which. If you simply model o1 = (apple|orange), o2 = (apple|orange), you don't express the constraint that the combination o1=apple, o2=apple is invalid. Much better to publish two separate DetectionXdArrays: o1=apple, o2=orange | o1=orange, o2=apple, or even just the most likely.
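A small sketch of the constraint described above, using only the standard library. The names o1, o2, apple and orange come from the example; everything else is illustrative.

```python
from itertools import permutations, product

detections = ["o1", "o2"]
classes = ["apple", "orange"]

# Independent modelling: every detection may take every class.
independent = [
    dict(zip(detections, combo))
    for combo in product(classes, repeat=len(detections))
]

# Joint modelling: each real object is assigned to exactly one detection.
joint = [dict(zip(detections, perm)) for perm in permutations(classes)]

# Combinations the independent model allows but the joint model rules out.
invalid = [assignment for assignment in independent if assignment not in joint]
```

Here `independent` contains four combinations while `joint` contains two; the two extra ones (both apple, both orange) are exactly the assignments that per-detection hypotheses cannot exclude.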

mistermult commented 5 years ago

@mintar The most correct version would be, that the tracker publishes multiple DetectionXdArrays if it is uncertain. However, in this case the score would have to be at DetectionXdArrays. But if the tracker cannot differentiate n objects, it would have to publish n! messages.

mistermult commented 5 years ago

I see that I cannot find support for an object_id in ObjectHypothesis. I still think this will bite us in the future. Nevertheless, I suggest that we go forward with the majority:

Add track_id (which seems to be a better name than tracking_id) of type string to DetectionXd.

mintar commented 5 years ago

The most correct version would be, that the tracker publishes multiple DetectionXdArrays if it is uncertain. However, in this case the score would have to be at DetectionXdArrays.

Yes. There is no need to add the score to DetectionXdArray. Simply add a message that has an array of DetectionXdArray and an array of scores. If you want to go down this road, I suggest you create such a message outside of vision_msgs. Once it has proved useful in a real implementation, it could be merged into vision_msgs.

But if the tracker cannot differentiate n objects, it would have to publish n! messages.

Correct. This is in the nature of the problem. Pretending that the probabilities are independent simplifies the problem, but is wrong. MHT solves the problem by only keeping track of a fixed number of hypotheses, not all possible ones, like a particle filter.
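To make the n! growth and the idea of keeping only a fixed number of hypotheses concrete, here is a brute-force sketch. It is not an MHT implementation (a real tracker would avoid enumerating every assignment); `top_k_assignments` and the score table are hypothetical.

```python
import heapq
import math
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def top_k_assignments(
    score: Dict[Tuple[str, str], float],
    detections: Sequence[str],
    objects: Sequence[str],
    k: int,
) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score every one-to-one assignment and keep only the k best."""
    scored = []
    for perm in permutations(objects):  # n! candidate assignments
        total = math.prod(score[(d, o)] for d, o in zip(detections, perm))
        scored.append((total, perm))
    return heapq.nlargest(k, scored)


detections = ["o1", "o2", "o3"]
objects = ["a", "b", "c"]
# Assumed scores: 0.8 when detection i matches object i, otherwise 0.1.
score = {
    (d, o): (0.8 if i == j else 0.1)
    for i, d in enumerate(detections)
    for j, o in enumerate(objects)
}
candidates = math.factorial(len(objects))  # 6 assignments for 3 objects
best = top_k_assignments(score, detections, objects, k=2)
```

With 3 objects there are 6 joint assignments, with 10 there are already 3,628,800, which is why only a bounded set of hypotheses can be published or kept.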

I see that I cannot find support for an object_id in ObjectHypothesis. I still think this will bite us in the future. Nevertheless, I suggest that we go forward with the majority:

Add track_id (which seems to be a better name than tracking_id) of type string to DetectionXd.

Agreed. We can still add object_id to the hypotheses later if we find a compelling and common use case.

Kukanani commented 5 years ago

See #19, now merged, for tracking on Detection messages. Please re-open the issue if and when we need to revisit.

hakuturu583 commented 5 years ago

@Kukanani @mistermult I am very sorry for my late reply. I strongly disagree with using a string for tracking_id, so I propose reverting the change in #22. Humans do not need to read the tracking ID directly; that is just a visualization problem. I am now developing visualization nodes for this message, so it will soon be no problem. If you want to handle tracking results from multiple tracker nodes, we have to check for collisions between tracking IDs.

hakuturu583 commented 5 years ago

@mistermult The reason I use uuid_msgs is that users should correctly recognize that the field is a UUID.
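The collision concern with multiple tracker nodes can be illustrated with the standard library alone. `counter_ids` is a hypothetical ID scheme in which each tracker numbers its tracks from zero; it is contrasted with random UUIDs, and the sketch takes no position on which field type the message should use.

```python
import uuid
from itertools import count
from typing import Callable


def counter_ids() -> Callable[[], str]:
    """Return a generator of string IDs "0", "1", ... local to one tracker."""
    counter = count()
    return lambda: str(next(counter))


tracker_a = counter_ids()
tracker_b = counter_ids()
# Two independent trackers both start at "0", so their first IDs clash.
clash = tracker_a() == tracker_b()

# Random (version 4) UUIDs are generated without coordination between nodes.
uuid_a = str(uuid.uuid4())
uuid_b = str(uuid.uuid4())
```

A UUID can also be carried as text in a string field, so the string type does not by itself rule out UUIDs; what it loses is the guarantee, visible in the type, that the field always holds one.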

hakuturu583 commented 5 years ago

@Kukanani @mistermult @LeroyR I failed to reopen this issue. Can we continue the discussion in #25?