Closed mintar closed 3 years ago
- Having the possibility of having human-readable strings as tracking_ids is much more user and developer friendly
pedantic, but doesn't this depend on the needs of the human in question? Putting UUIDs as string
s in a field loses semantics, which is certainly a negative consequence of doing that. Introspection tools would have to parse the field to figure out it's a UUID. Some developers might consider that quite a loss.
Human readability can certainly be positive, but it doesn't seem objective to state in general it's better -- machines would likely disagree (if they could, but then they might also be able to understand human readable fields ;) ).
@gavanderhoorn Can you come up with a scenario where switching the tracking_id
field to uuid_msgs/UniqueID
is an advantage? I've tried to describe such a scenario above, and why I think it's easy to handle without forcing the use of UUIDs.
BTW, I grant you the point that declaring something as "much more user friendly" wasn't very objective of me. :)
BTW, I grant you the point that declaring something as "much more user friendly" wasn't very objective of me. :)
that was mostly it. As I said: it was pedantic.
I do believe it's important to keep in mind that human-readability doesn't seem like something which would be too important in an actual application. During development: certainly. Afterwards? Depending on what the application is, there may never be any human eyes looking at those messages.
I absolutely agree. If somebody can convince me that requiring UUIDs has significant advantages in this use case, it might be worth sacrificing the advantages of permitting arbitrary strings. It's always a tradeoff.
ObjectHypothesisWithPose.id: This is the detected class (human, mug, ...). I think we can all agree that this should remain a string.
Agreed, though there's been an argument made in other msgs
to have classes be enums where the first 0-N/2
are reserved for ROS standardization that we could create a list of class objects for enums to belong to. Then N/2 - N
would be reserved for user-specific classes. That's what we agreed to for the radar_msgs
. If that sounds interesting, we can talk about that too, but I was not planning on that being a UUID. The enum int
s also makes alot of sense because AI libraries don't output dog
, they output 7
which you have a file that you can use to remap to dog
. So using ints actually seems like it would be a very natural way to do things (though maybe you don't want downstream nodes to have to know that remapping, so it could be useful to do it for them).
The single detections (without tracking across frames) don't have an ID.
Where is that represented in the messages? I would think they should still have an ID field (of type whatever we decide for the tracking_id
). Lacking a track shouldn't negate the information that a raw detection has to offer.
On UUID for tracking_id
in the Detection*D
messages:
Having the possibility of having human-readable strings as tracking_ids is much more user and developer friendly; apple-3456 is easier on the eyes than ec1cf114-7aa0-4402-8bf9-eee858118b7d when used in log outputs, RViz markers etc.
Isn't that what the ID string is for in the actual ObjectHypothesis
? I don't see a rationale for having both be non-standardized non-unique strings.
Perhaps there's not just a one-way flow of information from object detectors into the environment model, but perhaps the object detector re-aquires an object that is already known (by some ID) in the environment model and continues to track it. Having a string here gives us the flexibility of using the existing ID for tracking, no matter what format it is in.
This is sending some alarm bells in my mind. Why is the Detection*N
messages having tracks at all? They're literally called Detection
s. This should not be dependent on tracks. While trackers might use the Detection message, this is pretty useless as a tracking message in all honesty because it doesn't contain any velocity (or acceleration) information, or uncertainty measures. I think the concept that Detection*N
is a 'track' should be dismissed and we should only be talking about detections for the design of it. And with detections, then there's no reason that tracking_id
should be named tracking_id
(it can just be uuid
or id
) because its just uniquely identifying 1 detection of potentially many detections in a given image or pointcloud.
ObjectHypothesisWithPose[]
is a little strange to me. I understand what its going for, it could be useful to output not just the actual class of the object of interest, but the confidences of all of the the other classes it could have been. However, why is there not a message parent-level ID for what the highest probably class is (e.g. answering the question "what is this object")? That could even be something as simple as a int
for the index in the results
array to the highest, or a separate ObjectHypothesisWithPose
instance with just the highest. It seems strange to me to have to iterate through a list of potentially hundreds of classes to find out the only highest value that people most likely want. This makes me think that the resistance on the ID to a string is because people are using tracking_id
to embed the actual class information! We can separate those ideas and allow best of both worlds: a parent-level class
field and a unique id
.
Having the possibility of having human-readable strings as tracking_ids is much more user and developer friendly
Agreed for classes of objects, but instances of tracks or instances of detections in an image have no physical or semantic meaning. If this is a "track" or image then there might be 20 of them per image frame and 100,000 image frames per run. There's no semantic or unique meaning that can be really gained here meaningfully. What seems like a more likely thing is that users are abusing this field that's supposed to uniquely identify the unique detection in an image frame with the class of that detection. It would be much better to simply offer a class
string for the highest probability class as I suggest above and let the id
be what its supposed to be, a unique identifier to separate N
detections in an image from each other.
Regardless if we change tracking_id
to uuid, I think that name must go and should be id
or uuid
if we go with uuids.
tl;dr my position
class
of the highest probability detection without having to iterate over a results array of potentially 100's of classestracking_id
-> id
, the word "track" is being abused I believe. This is not a Track2D
message, its a Detection2D
message. id
to type uuid
because now there's no value in being human readable with an easy class
parent-level field. human34
is no different than class: "human", id: 4324-3432-243-432
. You may concat them if you like if you want something in the <class><unique numbe>
format.is_tracking
, I don't see how that could be used, there isn't any tracking information in the message to be used as a tracker, but open to hearing more about why that's hereTL; DR:
tracking_id
to object_id
or even just id
.tracking_id
should remain optional; stateless object detectors that just work frame-by-frame should leave it empty.string
, so that keys in an environment model/database can be used here.results
array be sorted in descending order by their score
.id
field to type
or classification
or something similar. class
would be perfect, but then it won't compile in rosjava (been there...) because every Java class already has a class
member.is_tracking
field. If tracking_id
is the empty string, that means that the ID is invalid; no need for an extra boolean flag.Agreed, though there's been an argument made in other
msgs
to have classes be enums where the first0-N/2
are reserved for ROS standardization that we could create a list of class objects for enums to belong to. ThenN/2 - N
would be reserved for user-specific classes. That's what we agreed to for theradar_msgs
. If that sounds interesting, we can talk about that too, but I was not planning on that being a UUID. The enumint
s also makes alot of sense because AI libraries don't outputdog
, they output7
which you have a file that you can use to remap todog
. So using ints actually seems like it would be a very natural way to do things (though maybe you don't want downstream nodes to have to know that remapping, so it could be useful to do it for them).
I think it's better to keep this a string. Incidentally, this field was an int64
before, and based on actual use cases we changed it to string
, which turned out to be very useful. True, detection libraries output ints, not strings, but the remapping to the actual class is very dataset-dependent, so you'll have to do some remapping anyway (int to int instead of int to string).
Also, what class a given integer corresponds to often evolves over time when you add/remove classes and retrain your network. This will lead to problems when the messages (e.g. from a months-old rosbag) and the remapping information stored elsewhere go out of sync. Putting the class as a string into the message avoids this (or at least makes it easier to detect, e.g. when the message contains a class that was renamed/removed in the meantime).
The single detections (without tracking across frames) don't have an ID.
Where is that represented in the messages? I would think they should still have an ID field (of type whatever we decide for the
tracking_id
). Lacking a track shouldn't negate the information that a raw detection has to offer.
I'll respond to that below.
On UUID for
tracking_id
in theDetection*D
messages:Having the possibility of having human-readable strings as tracking_ids is much more user and developer friendly; apple-3456 is easier on the eyes than ec1cf114-7aa0-4402-8bf9-eee858118b7d when used in log outputs, RViz markers etc.
Isn't that what the ID string is for in the actual
ObjectHypothesis
? I don't see a rationale for having both be non-standardized non-unique strings.
Ok, that's true. One could easily construct a "human-readable" string for logs or for an RViz text marker from the class ID plus (let's say) the last 4 chars of the UUID. My other argument (using an ID that's referenced elsewhere, like in an environment model) still holds.
Perhaps there's not just a one-way flow of information from object detectors into the environment model, but perhaps the object detector re-aquires an object that is already known (by some ID) in the environment model and continues to track it. Having a string here gives us the flexibility of using the existing ID for tracking, no matter what format it is in.
This is sending some alarm bells in my mind. Why is the
Detection*N
messages having tracks at all? They're literally calledDetection
s. This should not be dependent on tracks. While trackers might use the Detection message, this is pretty useless as a tracking message in all honesty because it doesn't contain any velocity (or acceleration) information, or uncertainty measures. I think the concept thatDetection*N
is a 'track' should be dismissed and we should only be talking about detections for the design of it. And with detections, then there's no reason thattracking_id
should be namedtracking_id
(it can just beuuid
orid
) because its just uniquely identifying 1 detection of potentially many detections in a given image or pointcloud.
Ok, perhaps the word "track" is not very well-chosen, so perhaps let's rename it to object_id
.
My view on this is the following: The tracking_id
uniquely identifies a certain object in the scene. This ID should be consistent across several frames. At that point, you're doing some rudimentary tracking (even if it's as basic as "the bounding boxes of these two subsequent detections overlap more than 90%"). I agree that any more sophisticated tracking information (velocity, acceleration, ...) has no place in a DetectionXD
message.
If the ID had to be unique for every single detection and carried no semantic information at all, it would be pretty useless. What's the point in uniquely identifying "the third element of the Detection3DArray
that was published at time 1604410630.213572502"? If you reference such a UUID in a different topic, it would mean that the receiver of the message would have to additionally subscribe to the Detection3DArray
topic and go through the messages until it finds the UUID that you're talking about. At that point, it would be much better to just copy the respective Detection3D
into your own message. Also, I can't find any other ROS messages that do it that way.
Also, I believe that the purpose of a Detection3DArray
is to describe a potential scene configuration: a set of objects that can explain the scene. This will usually come directly from an object detector, but I strongly believe that the message should be general enough to support further filtering/enrichment from other nodes, which could involve an environment model. For these purposes, having an ID that uniquely identifies an object is essential.
ObjectHypothesisWithPose[]
is a little strange to me. I understand what its going for, it could be useful to output not just the actual class of the object of interest, but the confidences of all of the the other classes it could have been. However, why is there not a message parent-level ID for what the highest probably class is (e.g. answering the question "what is this object")? That could even be something as simple as aint
for the index in theresults
array to the highest, or a separateObjectHypothesisWithPose
instance with just the highest. It seems strange to me to have to iterate through a list of potentially hundreds of classes to find out the only highest value that people most likely want. This makes me think that the resistance on the ID to a string is because people are usingtracking_id
to embed the actual class information! We can separate those ideas and allow best of both worlds: a parent-levelclass
field and a uniqueid
.
My proposal: Require that the results
array must be sorted in descending order of score
. That way you can just get the first element if you're only interested in the highest score. Incidentally, that's what I did in one of my projects, but it's not codified in the messages yet.
[...] What seems like a more likely thing is that users are abusing this field that's supposed to uniquely identify the unique detection in an image frame with the class of that detection. It would be much better to simply offer a
class
string for the highest probability class as I suggest above and let theid
be what its supposed to be, a unique identifier to separateN
detections in an image from each other.
As I said above: The detections within a DetectionXDArray
are already identifed by their array index and don't need an extra ID. We don't put the class into the tracking_id
(that would be abuse); instead, we use it to identify a given object. I wouldn't say that's abuse; it's exactly what this field is for.
- Potentially remove
is_tracking
, I don't see how that could be used, there isn't any tracking information in the message to be used as a tracker, but open to hearing more about why that's here
I don't see why it's there either. I believe that's because tracking_id
was a UUID in the original proposal (#18), and it's a bit awkward to check if all 16 bytes of the UUID array are 0; it's one more for
loop that just clutters the code, so that's probably why the author included the is_tracking
flag. When we changed the type from UUID to string, we just forgot to remove the is_tracking
field.
TL; DR:
rename tracking_id to object_id or even just id.
tracking_id should remain optional; stateless object detectors that just work frame-by-frame should leave it empty.
keep it a string, so that keys in an environment model/database can be used here.
require that the results array be sorted in descending order by their score.
rename the old id field to type or classification or something similar. class would be perfect, but then it won't compile in rosjava (been there...) because every Java class already has a class member.
rename the is_tracking field. If tracking_id is the empty string, that means that the ID is invalid; no need for an extra boolean flag.
OK, so agreement on the tracking_id
-> id
, we're getting somewhere.
require that the results array be sorted in descending order by their score.
That seems that too nitty-gritty of a thing to require, the outputs of alot of systems don't promise that so you'd be forcing people to own a sorting step.
rename the is_tracking field. If tracking_id is the empty string, that means that the ID is invalid; no need for an extra boolean flag.
To what?
rename the old id field to type or classification or something similar.
In which message, id
doesn't exist for Detection*D
, so I assume you mean object hypothesis. What's wrong with ID in that context? ID, class, and type are all interchangable in that context. I didn't think anyone had an issue with that.
This ID should be consistent across several frames.
UUIDs perfectly meet that need as well. And by your own comment The tracking_id uniquely identifies a certain object in the scene
, emphasis on unique, if this is not a UUID or similar random generation, there's no possible way you can promise uniqueness.
A specific recommendation (minus comments):
Header header
ObjectHypothesisWithPose[] results
BoundingBox2D bbox
sensor_msgs/Image source_img
bool is_tracking
string id
string class
tracking_id
-> id
class
field so that you don't have to iterate through results
to get the class type. Then need to iterate if you care about what else it could have been or metadata like confidences. Class could be type or something else if you like for the java issues, that's fine. I think you understand what I'm trying to propose though.I still feel that id
should be uuid
especially when you have a separate parent level class
. Now its much easier for you to uniquely identify tracks / detections and also have a human-readable name via the class
. I think this is really the best compromise since it meets your core complaints about not being human-readable and ensures that something in this message is uniquely identifiable, regardless of the actual class
type. It also fixes an oversight that the message doesn't provide an easy way to find the class ID of the detection easily.
OK, so agreement on the
tracking_id
->id
, we're getting somewhere.
Yup.
require that the results array be sorted in descending order by their score.
That seems that too nitty-gritty of a thing to require, the outputs of alot of systems don't promise that so you'd be forcing people to own a sorting step.
Yeah, it sounds like something many people would mess up.
rename the is_tracking field. If tracking_id is the empty string, that means that the ID is invalid; no need for an extra boolean flag.
To what?
Sorry, I meant "remove the is_tracking
field", if we're keeping strings. If we switch back to UUID, we'll probably need it (see below).
rename the old id field to type or classification or something similar.
In which message,
id
doesn't exist forDetection*D
, so I assume you mean object hypothesis. What's wrong with ID in that context? ID, class, and type are all interchangable in that context. I didn't think anyone had an issue with that.
Yes, I'm talking about ObjectHypothesis
. I think it's very confusing to have two fields called id
, even if they are at different levels of the hierarchy. There have been a couple of issues here that came just from the confusion, and I myself mix it up regularly because I find id
to be a very unintuitive name for "class". Let's rename it so it's clear which one is the object ID and which one's the class.
This ID should be consistent across several frames.
UUIDs perfectly meet that need as well. And by your own comment
The tracking_id uniquely identifies a certain object in the scene
, emphasis on unique, if this is not a UUID or similar random generation, there's no possible way you can promise uniqueness.
UUIDs guarantee global uniqueness, i.e., IDs from multiple nodes won't collide. With a string field, we'll still have local uniqueness, i.e., the IDs from a single node publishing on a specific topic won't collide. Can you come up with a scenario where it's necessary to have global uniqueness?
Also note that UUIDs and strings are equal with regards to local uniqueness. If a publishing node is so buggy that it accidentally re-uses identifier strings, it's probably also buggy enough to re-use UUIDs (or not fill them at all).
Anyhow, after giving this more thought, I think I'm okay with using UUIDs. To recap, my main arguments pro-string were:
tracking_id
field should probably not be used for this.I still don't really see the advantages of using UUIDs, but at least the disadvantages seem to be mitigated.
So if you really feel so strongly about using UUIDs, go ahead and do it, if @Kukanani agrees with you. I assume he's still a co-maintainer of this repo?
A specific recommendation (minus comments):
Header header ObjectHypothesisWithPose[] results BoundingBox2D bbox sensor_msgs/Image source_img bool is_tracking string id string class
* `tracking_id` -> `id` * Added `class` field so that you don't have to iterate through `results` to get the class type. Then need to iterate if you care about what else it could have been or metadata like confidences. Class could be type or something else if you like for the java issues, that's fine. I think you understand what I'm trying to propose though.
If you can add message comments for id
and class
(and also rename the class
field due to the Java issue), I think we're getting close to a PR. I think the message comment especially for the id
field should clarify exactly what exactly the semantics of that field is. In my mind, it's like a tracking ID. In other words, if the id
s of two DetectionXD
s from different DetectionXDArray
s match, then they are detections of the same object in the scene. The opposite isn't true: Just because two id
s are different, that doesn't mean that they are two different objects. Tracking will get lost, so an object will be re-assigned a new id
.
I think the id
in ObjectHypothesis
should be named something similar to class
(but not exactly "class" because of Java), and the class
here should then be called something like "most likely class", but shorter. You're the native speaker, maybe you can come up with some nice and crisp names for these things.
Another thing we should discuss is what happens if a node doesn't do tracking at all (which will probably be true in most cases)? If id
is a string
, then I'd expect the publishing node to simply leave id == ""
, which can easily be detected by the subscribing node. However, if we switch back to UUIDs, then what? Also just leave it unset? That means that the subscribing node will have to iterate over the 16-bit array to check if all elements are 0. The runtime is negligible, but it clutters up the code. Set it to a new random UUID? That works, but it drops the information that the publishing node wasn't doing any tracking at all; it just looks like it's very bad at tracking, since for every detection it opens a new track and never resumes it.
I think that's where the is_tracking
flag came from. It doesn't make sense when using strings, but it makes sense when using UUIDs.
Last point: If we're pulling most_likely_class
out of the ObjectHypothesisWithPose
, we also need to pull out most_likely_pose
(at least for Detection3D
). At that point, why not also pull out most_likely_score
? Or just ObjectHypothesisWithPose most_likely
? (I don't like my names, but you get the point).
so...
is_tracking
tracking_id
to id
id
in object hypothesis to class
or type
(object_id
)class
in root of message for easy access. Actually, maybe we should leave object hypothesis alone and just make tracking_id
to type
so nothing changes there.
UUIDs guarantee global uniqueness, i.e., IDs from multiple nodes won't collide. With a string field, we'll still have local uniqueness, i.e., the IDs from a single node publishing on a specific topic won't collide. Can you come up with a scenario where it's necessary to have global uniqueness?
Easily. I have 3 cameras on a robot and I'm running CNNs on their images in 3 instances of a CNN node. I'm using those detections to tracking moving obstacles in the scene correlating AI detections to their 3D detections in the robot's environmental model. I want global uniqueness so that each detection output isn't mixed in with the others. This is actually an exact example of what we're doing right now in navigation2_dynamic
.
So if you really feel so strongly about using UUIDs, go ahead and do it, if @Kukanani agrees with you. I assume he's still a co-maintainer of this repo?
Yes of course, I'm just getting the ball rolling
So what are you using id
in the Detection2D
for if class
has the class type? Can you give me some examples of what you currently put into tracking_id
that isn't just class information?
Another thing we should discuss is what happens if a node doesn't do tracking at all (which will probably be true in most cases)?
... it then should be a uuid so its globally unique :smile:
Last point: If we're pulling most_likely_class out of the ObjectHypothesisWithPose, we also need to pull out most_likely_pose (at least for Detection3D). At that point, why not also pull out most_likely_score? Or just ObjectHypothesisWithPose most_likely? (I don't like my names, but you get the point).
Detection*D
doesn't have a ObjectHypothesisWithPose
, it has ObjectHypothesisWithPose[]
of all of the scores of all of the classes in the detector. That can stay as is. We're just adding the info of the highest probability class directly to root so that you don't have to iterate though a potentially hundreds long list to get what you really wanted: the most likely one. We could instead create a message level ObjectHypothesisWithPose
to replace type/id/class
from the Detection*D
so that its the most likely type + its score without iterating through results
.
so...
- remove
is_tracking
tracking_id
toid
id
in object hypothesis toclass
ortype
(object_id
)class
in root of message for easy access.
Not object_id
(we're talking about the class here, not the object). Otherwise agreed!
Actually, maybe we should leave object hypothesis alone and just make
tracking_id
totype
so nothing changes there.
I didn't get that. Are you proposing to rename tracking_id
to type
? That would be wrong.
UUIDs guarantee global uniqueness, i.e., IDs from multiple nodes won't collide. With a string field, we'll still have local uniqueness, i.e., the IDs from a single node publishing on a specific topic won't collide. Can you come up with a scenario where it's necessary to have global uniqueness?
Easily. I have 3 cameras on a robot and I'm running CNNs on their images in 3 instances of a CNN node. I'm using those detections to tracking moving obstacles in the scene correlating AI detections to their 3D detections in the robot's environmental model. I want global uniqueness so that each detection output isn't mixed in with the others. This is actually an exact example of what we're doing right now in
navigation2_dynamic
.
So that's the exact same scenario as I outlined in the body of this issue. The problem that track IDs from different detector nodes can collide can easily be solved by keeping the outputs of the 3 detectors separate in the aggregator node. You should probably do that anyway, because:
There's a huge difference between the following two scenarios:
In case (1), you know there are at least two distinct mugs, and you shouldn't try to match them. In case (2), it's possible that there are either one or two mugs, so matching should be possible.
In short: Any implementation that just jumbles all object detections together is probably suboptimal. If we don't throw them all together, we don't need UUIDs. But also UUIDs don't hurt, so I'm okay with them.
So what are you using
id
in theDetection2D
for ifclass
has the class type? Can you give me some examples of what you currently put intotracking_id
that isn't just class information?
Oh, now I get you. Yes, currently the IDs are just <class name> + <counter>
. But I don't parse the class name back out of that string; that would be abuse. They could really be just about anything, as long as they are unique.
In one application, I have a node that takes in Detection3DArrays from an object detector, matches the objects with an existing environment model and then publishes a new Detection3DArray where the IDs are the object IDs from the environment model. But during our discussion I've come to realize that this is probably not the correct way to use the ID field; it should only contain tracking information. So I'm okay with UUIDs now.
Another thing we should discuss is what happens if a node doesn't do tracking at all (which will probably be true in most cases)?
... it then should be a uuid so its globally unique :smile:
Not an answer to my question. :smile:
If no tracking is taking place, we don't need an ID at all. It might be useful information to an aggregator node if it should even attempt to match IDs to previous detections, or if that's pointless because the publishing node always assigns new random IDs.
Last point: If we're pulling most_likely_class out of the ObjectHypothesisWithPose, we also need to pull out most_likely_pose (at least for Detection3D). At that point, why not also pull out most_likely_score? Or just ObjectHypothesisWithPose most_likely? (I don't like my names, but you get the point).
Detection*D
doesn't have aObjectHypothesisWithPose
, it hasObjectHypothesisWithPose[]
of all of the scores of all of the classes in the detector.
Small nitpick: It can be all the classes in the detector, but it can also be just a subset (such as the ones with non-zero scores). If the detector only outputs a single class, it can also just be a single-element array.
That can stay as is. We're just adding the info of the highest probability class directly to root so that you don't have to iterate though a potentially hundreds long list to get what you really wanted: the most likely one. We could instead create a message level
ObjectHypothesisWithPose
to replacetype/id/class
from theDetection*D
so that its the most likely type + its score without iterating throughresults
.
Yes, I got that and I fully agree that it's useful to have direct access to the most likely class. All I'm saying is that it's equally useful to have direct access to the corresponding pose, otherwise you're iterating through the results array again. And continuing that thinking, maybe you also want to have the corresponding score.
Thanks for hashing out the details of this issue. Given our running list of changes, as I understand them:
* remove `is_tracking`
No objections. I have never used this field, and I've never seen of anyone else using it. If anything this should probably be metadata in VisionInfo
(not proposing that change right now, just pointing out that if someone wants this field back, let's talk about adding it to VisionInfo).
* `tracking_id` to `id`
No objections.
* `DetectionXD/id` (formerly `DetectionXD/tracking_id`) to a UUID rather than a string.
Sorry, but I still object.
Perhaps it's just that I've never been deep enough into the development of a multi-detector, multi-object tracker, but I still don't understand the need to have every single detection have a globally unique identifier. I don't deny that this would be useful...just that I have never personally wanted it. But I don't understand @SteveMacenski 's use case. In particular, I would like clarification on what "global uniqueness so that each detection output isn't mixed in with the others" means. How does having a globally unique field on every single detection message help differentiate between three different detectors? Or is the point that each detector/detection algorithm supplies the same UUID to all of its detections? Please help me understand this.
uuid_msgs
is an additional upstream dependency. It is also a bit hard to get good details on. I believe that this is the correct reference. However, a Google search for uuid_msgs github
pushes that link to the 5th spot. The wiki page, which is the first hit, is not helpful, and this appears to be another "false trail" that appears earlier in the search results. This is not seem like a developer-friendly dependency.
I agree with @gavanderhoorn 's point that we lose semantics by using a string
rather than a UUID. However, there is extra work required (potentially an extra dependency) to generate UUIDs, including adding a dependency. We also have boost::uuids::uuid fromHexString(std::string)
as a standard conversion interface, but that seems to be a brittle function based on input (read function comments at above link). I could go on with this line of thinking (string vs UUID), but I don't want to continue any further down this line of thinking without knowing exactly what the proposed purpose of the UUID is (what needs to be differentiated from what, and why?)
* `id` in object hypothesis to `class` or `type` (`object_id`)
Complete with the merging of #48, so this can be dropped from discussion
* class in root of message for easy access.
I begrudgingly agree with this one...or at least the idea of it.
I definitely understand the need to get the top class (what we are proposing as id
) from a probabilistic detector that outputs probabilities for many classes. Unfortunately, storing the same information in two places is a violation of DRY and sets off alarm bells in my head. This "top class" and the list of individual detections could easily get out of sync. I'm not aware of any other messages that use this technique. Again we would depend on people to carefully read the message comments (which I think rarely happens) to know how to construct the message correctly.
I propose a couple of options here (numbered for easy reference)
class
, we could use int index_of_best_hypothesis
(long name that gets the point across). This allows the entire best hypothesis to be looked up, including its score. From my experience, this is a common need: you want to know the top class, but you also want to know how likely this top class is. Without indexing into the results, you'd have to sort/search the whole list of results again anyway to get the score. This option has a benefit, which is that the default value of 0 will work perfectly if the results list is sorted by score, without any extra work. It also has a drawback in that if results
is not sorted and this field is not set, it's worse than uselessclass
, we could provide ObjectHypothesisWithPose top_result
. This gives the same ability as above (get the score and class of top result). It also is flexible, and we can make it clear in the comments: A single top result can be given in top_result, multiple results can be given with no top_result, or both can be provided for convenience.
id
in object hypothesis toclass
ortype
(object_id
)Complete with the merging of #48, so this can be dropped from discussion
No, that wasn't changed in #48. I've created a new pull request that does this: #52.
I think we've settled that we're just going to leave it the same as it is with the new name.
As requested by @SteveMacenski, here's a new issue for continuing the discussion around UUIDs. As a starter, I'm going to summarize my comment from here: https://github.com/ros-perception/vision_msgs/issues/43#issuecomment-743068675
Main Question
Should we change the type of
Detection*D/tracking_id
touuid_msgs/UniqueID
?Background
Previous discussion has happened in #25 and #43.
As a basis for this discussion, we'll use the ROS2 version of the messages from the
ros2
branch. In Noetic and ROS2, there are two ID fields (see below):ObjectHypothesisWithPose.id
: This is the detected class (human
,mug
, ...). I think we can all agree that this should remain a string.tracking_id
: This is the ID of the object when tracked across multiple frames. The discussion is whether we should make this a UUID.Detection*DArray
message and its array index within that message.Detection3D.msg
Pros and Cons
The arguments for and against UUIDs boil down to:
Pro UUID:
Detection*D
message has to ensure that thetracking_id
is locally unique.Contra UUID:
apple-3456
is easier on the eyes thanec1cf114-7aa0-4402-8bf9-eee858118b7d
when used in log outputs, RViz markers etc.Presumably, if you run multiple object detector nodes, they will all publish their results on separate topics. If you are worried about collisions, the aggregator node could simply prefix each
tracking_id
with the topic name, which would guarantee unique strings. In my view, this negates the only advantage of UUIDs. So I'm also opposed to changing this field to a UUID.