Closed hakuturu583 closed 5 years ago
Hi, thanks for proposing a change! I agree that right now it is difficult to perform multi-object tracking using only vision_msgs so let's see if we can improve it.
Can you please clarify what the ID keeps track of in your implementation? Does each real-world object in the scene get its own UniqueID? Each detection message? Each image/detection event?
I think a pipeline like the one below is a good way to build an object detection/tracking system with vision_msgs. (https://docs.google.com/presentation/d/e/2PACX-1vTUnCyBg9Kxuya1BWyjTJcAf9lGJa_th5n-Ta2JxoHbiwgZS5lHWjOhvDsvzqMJB9s1Rey9wgOh59Md/pub?start=false&loop=false&delayms=3000)
I think it is unnecessary to set a unique ID in the object detector. Assigning a unique ID requires time-series processing. I think of a detection node as a one-shot algorithm such as YOLO, VoxelNet, or an SVM, whereas a tracking node contains a time-series algorithm such as a particle filter or a Kalman filter. So I think it is better to set the unique tracking ID in the tracking node.
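To illustrate the split described above, here is a minimal Python sketch of a tracking step that takes one-shot detections (reduced to centroids) and assigns IDs that persist across frames. The class name `NearestCentroidTracker`, the `max_distance` gate, and the greedy nearest-neighbour association are all assumptions made for illustration; a real tracker would more likely use a Kalman or particle filter as mentioned above.

```python
import math
from itertools import count


class NearestCentroidTracker:
    """Toy tracker: reuses a track ID when a detection lies near a known track.

    Hypothetical helper for illustration only; it never drops stale tracks.
    """

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance  # association gate in pixels (assumed unit)
        self._tracks = {}                 # track ID -> last known (x, y) centroid
        self._next_id = count()

    def update(self, centroids):
        """Return one track ID per detection centroid of the current frame."""
        assigned = []
        unmatched = dict(self._tracks)  # tracks not yet claimed in this frame
        for c in centroids:
            # Greedy choice: the closest track that is still unclaimed.
            best = min(unmatched, key=lambda t: math.dist(unmatched[t], c), default=None)
            if best is not None and math.dist(unmatched[best], c) <= self.max_distance:
                del unmatched[best]
            else:
                best = next(self._next_id)  # nothing close enough: start a new track
            self._tracks[best] = c
            assigned.append(best)
        return assigned
```

The detector stays stateless in this sketch; only the tracker carries state between calls to `update`, which is why the ID is assigned there.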
To show clearly whether a result is a tracking result or a detection result, I added the is_tracking field.
+1
I agree this would be a valuable addition. The comments should be checked by a native speaker (@Kukanani ?), otherwise it looks good to me.
I have also sent a PR for this issue: #19. If you are OK with it, please review it!
@Kukanani What do you think about my architecture design and message modification?
@hakuturu583, I'm happy to move forward with getting this merged, but I will probably propose changes to the comments based on this discussion.
Just to clarify again: the tracking_id field should be the same across multiple messages, as long as those messages represent detections of the same real-world object. Is this correct?
@Kukanani Yes, I think the tracking_id of the same object should stay the same across multiple messages as long as the tracking node continues tracking the target object.
I'm also interested in adding IDs.
Place of ID
However, I'm not satisfied with the solution in #19. The ID should stay the same if the detected object is the same, i.e. the same entity. However, currently the ID is added to Detection2D.msg. So I cannot model the following: Object 123 is detected with a probability of 50% and Object 234 is detected with a probability of 50%.
Solution: Move ID from Detection2D.msg to ObjectHypothesis.msg/ObjectHypothesisWithPose.msg. Moreover, rename from tracking_id to object_id, because it describes that this is the same object entity.
Type of ID
See also https://github.com/Kukanani/vision_msgs/issues/17#issuecomment-509506083
Solution: Make the type of tracking_id/object_id and old id (=class id) string.
Both tracking_id and object_id should be of the same type because they are foreign keys in similar collections. Class and entity are really similar: "Red cube" might be a unique object until there are multiple red cubes of exactly the same type on the table.
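As a small illustration of why a string field can serve both kinds of key, the following Python sketch converts integer, UUID, and string identifiers to one string form. The helper name `to_id_string` is hypothetical and is not part of vision_msgs.

```python
import uuid


def to_id_string(value):
    """Convert an int, UUID, or str identifier to a single string form.

    Hypothetical helper: it only shows that a string field can carry
    every ID style discussed in this thread.
    """
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it explicitly.
        raise TypeError("bool is not a valid identifier")
    if isinstance(value, (int, uuid.UUID, str)):
        return str(value)
    raise TypeError(f"unsupported identifier type: {type(value).__name__}")
```

A class ID such as "red_cube", an integer track counter, and a UUID all end up in the same field type, so they can be compared and looked up the same way.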
String:
UUID, Int:
Keep Detection2D.msg/Detection3D.msg as they are (or only add is_tracking). Modify ObjectHypothesis.msg (ObjectHypothesisWithPose.msg) as follows:
# An object hypothesis that contains no position information.
# The unique ID of object detected.
# If the two object_id's are the same in different messages for different images,
# it means that the same real object (entity) has been detected in both images.
# Object detection pipelines that do not output such IDs should set this to the
# empty string "".
string object_id
# The unique ID of class of the object detected [....]
string id #Change to string;
# The probability or confidence value of the detected object. By convention,
# this value should lie in the range [0-1].
float64 score
This also has the benefit that a classification (Classification2D.msg) can now also give a detected object an ID, because it includes ObjectHypothesis.
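The proposed layout can be sketched in plain Python to show the case that motivated it: one detection whose hypotheses point at different entities. The dataclasses below are stand-ins for the proposed messages, not generated ROS message classes, and `candidate_entities` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ObjectHypothesis:
    """Stand-in for the proposed ObjectHypothesis.msg (proposal only)."""
    id: str              # class ID, e.g. "apple"
    score: float         # confidence, by convention in [0, 1]
    object_id: str = ""  # entity ID; "" when the pipeline assigns none


@dataclass
class Detection2D:
    """Stand-in for Detection2D.msg, reduced to its hypothesis list."""
    results: List[ObjectHypothesis] = field(default_factory=list)


def candidate_entities(detection: Detection2D) -> Dict[str, float]:
    """Map each non-empty object_id to the total score of its hypotheses."""
    totals: Dict[str, float] = {}
    for hypothesis in detection.results:
        if hypothesis.object_id:
            totals[hypothesis.object_id] = totals.get(hypothesis.object_id, 0.0) + hypothesis.score
    return totals
```

With two hypotheses of score 0.5 carrying object_id "123" and "234", `candidate_entities` returns both entities with 0.5 each, which is the situation that cannot be expressed with a single ID on the detection.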
Please review my proposal @hakuturu583 , @Kukanani , @LeroyR, @mintar. I added some up/down buttons below. You might just want to vote.
I understand where @mistermult is coming from, however:
The detection is still one entity (e.g. having one bounding box/point cloud) that we add to the planning scene, even if it has multiple hypotheses about which specific class/track it is. Which object_id is then set in the planning_scene/collision_object? The first hypothesis? The hypothesis with the highest score? The same one in all hypotheses? -> We need the field at the DetectionXd level if the object is based on the point cloud.
I can imagine a system that uses the best hypothesis as basis for e.g. Manipulation, adding the known mesh/model in the assumed orientation, but it may be always safer to base Manipulation on the sensor data.
Also: On the Hypothesis level you can already use the current messages by simply using the id field as the track id, as you currently have to look up the label anyway.
e.g. with rosparam:
object_tracker:
0: apple
I still think DetectionXd is the correct place.
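The lookup described above can be sketched in plain Python. The dictionary mirrors the rosparam snippet; in a running system it would presumably be read from the parameter server, but here it is hard-coded so the example has no ROS dependency. The names `OBJECT_TRACKER_LABELS` and `label_for` are hypothetical.

```python
# Mirrors the rosparam example above: track ID -> label.
# Hard-coded here (assumption) instead of being read from the parameter server.
OBJECT_TRACKER_LABELS = {0: "apple"}


def label_for(track_id, labels=OBJECT_TRACKER_LABELS):
    """Return the label registered for a track ID, or "unknown" if none exists."""
    return labels.get(track_id, "unknown")
```

Note that in this scheme the id field no longer names a class directly, so every consumer needs access to the same table.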
Agree, as long as moveit is using strings.
@LeroyR Thanks for the reply.
I have to clarify my proposal. My proposed object_id describes the object. Assume that each real object has a unique number written on it (two identical red cubes would have different numbers). This would be the object_id. The object detection/tracking of course creates the numbers rather arbitrarily.
Which object_id is then set in the planning_scene/collision_object? First Hypothesis? Hypothesis with the highest score? Set the same in all hypos?
We must differentiate between two things:
Assume that:
Detection2D = [
  detections: [
    {
      results: [
        {
          id: "apple",
          object_id: 1, #the left fruit
          score: 1.0,
          pose: ...,
        }
      ],
      bbox: ...,
      source_img: ...
    },
    {
      results: [
        {
          id: "orange",
          object_id: 2, #the right fruit
          score: 1.0,
          pose: ...,
        }
      ],
      bbox: ...,
      source_img: ...
    }
  ]
]
This would also work with object_id/tracking_id in the message. Then I slowly move both fruits into the middle of the table. When the two objects touch, I quickly remove one fruit. The tracker is good at tracking but bad at classification. So it knows that there is one known fruit left (so only one bounding box), which must be an apple or an orange. So its object_id is 1 or 2, but the tracker does not know the class. So it must emit:
Detection2D = [
  detections: [
    {
      results: [
        {
          id: "apple",
          object_id: 1, #originally the left fruit
          score: 0.5,
          pose: ...,
        },
        {
          id: "orange",
          object_id: 2, #originally the right fruit
          score: 0.5,
          pose: ...,
        }
      ],
      bbox: ...,
      source_img: ...
    }
  ]
]
Now assume that we use tracking_id in Classification2D instead of object_id in ObjectHypothesis. There are two potential IDs (1 or 2). So there must be two Classification2D. I conclude:
Detection2D = [
  detections: [
    {
      results: [
        {
          id: "apple",
          object_id: 1, #originally the left fruit
          score: 0.5,
          pose: ...,
        }
      ],
      bbox: ...,
      source_img: ...
    },
    {
      results: [
        {
          id: "orange",
          object_id: 2, #originally the right fruit
          score: 0.5,
          pose: ...,
        }
      ],
      bbox: ...,
      source_img: ...
    }
  ]
]
There are multiple problems:
Also: On the Hypothesis level you can already use the current messages by simply using the id field as the track id, as you currently have to lookup the label anyway.
Assume I track 2 apples. As clarified by @Kukanani, id describes the class. So id="apple" for both apples. To track the apples across multiple frames I have to identify each apple with a unique additional ID: the object_id. So I have (id="apple", object_id=1) for the left apple and (id="apple", object_id=2) for the right apple.
In conclusion:
Advantages of object_id in ObjectHypothesis:
I'm currently on vacation and on mobile, so apologies beforehand for being brief.
I prefer track_id and class_id to be strings, for the reasons listed.
I also believe track_id should go into DetectionXd. We should not add object_id to ObjectHypothesis, IMO. Reason: track_id has a clearly defined meaning - it associates this DetectionXd to one from the previous frame. The examples cited by @mistermult go beyond tracking; this is called "anchoring" (Saffiotti et al.). I admit that the example with the two tracked objects assigned to one single detection cannot be modeled elegantly without adding object_id to the hypotheses; however, I would argue that this is a special case anyway. In general, we will anyway have to model it the way that @mistermult mentioned later: each (potential) object becomes one DetectionXd, with a unique track_id (if tracked); if there is uncertainty about the object class, that is modeled via the Object Hypotheses. If the tracker is using MHT (Multi Hypothesis Tracking) or equivalent, it is probably best to do that internally and only publish the most likely hypothesis (here: assignment of objects to tracks). Or it could publish each "hypothesis" (in the MHT sense of the word) as a separate DetectionXdArray if desired.
Example of why I think the example above is a special case: assume there are two real objects (apple and orange), and two detected objects (o1 and o2). Now assume the tracker doesn't know which one is which. If you simply model o1 = (apple|orange), o2 = (apple|orange), you don't express the constraint that the combination o1=apple, o2=apple is invalid. Much better to publish two separate DetectionXdArrays: o1=apple, o2=orange | o1=orange, o2=apple, or even just the most likely.
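The difference between per-object hypotheses and joint hypotheses can be made concrete with a short Python sketch. The function names are hypothetical; the only point is that treating each object independently admits combinations that a joint assignment rules out.

```python
from itertools import permutations, product


def independent_assignments(objects, classes):
    """Every combination when each object is labelled independently."""
    return [dict(zip(objects, combo)) for combo in product(classes, repeat=len(objects))]


def joint_assignments(objects, classes):
    """Only combinations in which no class is used by two objects."""
    return [dict(zip(objects, perm)) for perm in permutations(classes, len(objects))]
```

For objects o1, o2 and classes apple, orange, the independent model yields four combinations, including the invalid o1=apple, o2=apple, while the joint model yields exactly the two valid ones. Each joint assignment would correspond to one separate DetectionXdArray.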
@mintar The most correct version would be, that the tracker publishes multiple DetectionXdArrays if it is uncertain. However, in this case the score would have to be at DetectionXdArrays. But if the tracker cannot differentiate n objects, it would have to publish n! messages.
I see that I cannot find support for an object_id in ObjectHypothesis. I still think this will bite us in the future. Nevertheless, I suggest that we go forward with the majority:
Add track_id (which seems to be a better name than tracking_id) of type string to DetectionXd.
The most correct version would be, that the tracker publishes multiple DetectionXdArrays if it is uncertain. However, in this case the score would have to be at DetectionXdArrays.
Yes. There is no need to add the score to DetectionXdArray. Simply add a message that has an array of DetectionXdArray and an array of scores. If you want to go down this road, I suggest you create such a message outside of vision_msgs. Once it has proved useful in a real implementation, it could be merged into vision_msgs.
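A wrapper of the kind suggested above could look like the following Python sketch. All class names here (`TrackedDetection`, `DetectionArray`, `DetectionArrayHypotheses`) are hypothetical stand-ins and do not exist in vision_msgs; the sketch only pairs each alternative detection array with a score.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TrackedDetection:
    """Hypothetical stand-in for a DetectionXd, reduced to two fields."""
    track_id: str
    class_id: str


@dataclass
class DetectionArray:
    """Hypothetical stand-in for a DetectionXdArray."""
    detections: List[TrackedDetection] = field(default_factory=list)


@dataclass
class DetectionArrayHypotheses:
    """One score per alternative array; lives outside vision_msgs (assumption)."""
    arrays: List[DetectionArray] = field(default_factory=list)
    scores: List[float] = field(default_factory=list)

    def most_likely(self) -> DetectionArray:
        """Return the array with the highest score."""
        if not self.arrays or len(self.arrays) != len(self.scores):
            raise ValueError("arrays and scores must be non-empty and of equal length")
        index = max(range(len(self.scores)), key=self.scores.__getitem__)
        return self.arrays[index]
```

A consumer that only wants a single answer can call `most_likely`, which matches the suggestion of publishing only the most likely hypothesis.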
But if the tracker cannot differentiate n objects, it would have to publish n! messages.
Correct. This is in the nature of the problem. Pretending that the probabilities are independent simplifies the problem, but is wrong. MHT solves the problem by only keeping track of a fixed number of hypotheses, not all possible ones, like a particle filter.
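The factorial growth, and the idea of keeping only a fixed number of hypotheses, can be sketched in Python as follows. This is a brute-force illustration under assumed inputs: `score[i][j]` is taken to be the likelihood that detection i belongs to track j, and the function names are hypothetical. A real MHT implementation would avoid enumerating every permutation.

```python
import heapq
import math
from itertools import permutations


def hypothesis_count(n):
    """Number of one-to-one assignments of n detections to n tracks."""
    return math.factorial(n)


def top_k_assignments(score, k):
    """Return the k most likely assignments as (likelihood, assignment) pairs.

    assignment[i] is the track index given to detection i. Brute force:
    every permutation is scored, so this is only usable for small n.
    """
    n = len(score)
    scored = (
        (math.prod(score[i][perm[i]] for i in range(n)), perm)
        for perm in permutations(range(n))
    )
    return heapq.nlargest(k, scored)
```

Ten indistinguishable objects already give 3628800 joint assignments, which is why only a bounded set of the best ones is kept.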
I see that I cannot find support for an object_id in ObjectHypothesis. I still think this will bite us in the future. Nevertheless, I suggest that we go forward with the majority:
Add track_id (which seems to be a better name than tracking_id) of type string to DetectionXd.
Agreed. We can still add object_id to the hypotheses later if we find a compelling and common use case.
See #19, now merged, for tracking on Detection messages. Please re-open the issue if and when we need to revisit.
@Kukanani @mistermult I am very sorry for my late reply. I strongly disagree with using a string for tracking_id, so I propose reverting the change in #22. It is unnecessary for humans to check the tracking ID directly; that is just a visualization problem. I am now developing visualization nodes for this message, so it will soon not be a problem. If you want to handle tracking results from multiple tracker nodes, you have to check for collisions between tracking IDs.
@mistermult The reason I use uuid_msgs is so that users correctly recognize that the field is a UUID.
@Kukanani @mistermult @LeroyR I was unable to reopen this issue. Can we discuss it in #25?
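The ID collision concern with multiple tracker nodes can be illustrated with a short Python sketch. `CounterIdSource` and `new_uuid_id` are hypothetical names; the sketch uses Python's standard uuid module rather than the uuid_msgs ROS package.

```python
import uuid
from itertools import count


class CounterIdSource:
    """Per-node integer counter, as a simple tracker might use (assumption)."""

    def __init__(self):
        self._counter = count()

    def new_id(self):
        return str(next(self._counter))


def new_uuid_id():
    """Random version-4 UUID in string form; needs no coordination between nodes."""
    return str(uuid.uuid4())
```

Two independent `CounterIdSource` instances both start at "0", so their IDs clash when the outputs are merged, whereas random UUIDs from separate nodes are distinct for all practical purposes.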
Hi, I want to make an object detection (https://github.com/OUXT-Polaris/nnabla_vision_detection) and tracking ROS package and a visualization tool (https://github.com/OUXT-Polaris/vision_msgs_visualization) using vision_msgs. However, there is no unique object ID in this package. So, I want to modify the messages as shown below.
Detection2D.msg
Detection3D.msg
I want to use the same message for tracking and detection so that the same ROS node can be used for visualization, so I do not want to add a new message type.