How do you handle multiple action labels for one box?

foolhard commented 1 year ago

Hello,

I found that some boxes have multiple action labels or multiple location labels. For example, in _2014-07-14-14-49-50_stereo_centre01, frame No. 56, box b63925 has two action labels, 3 and 21, which mean "MovAway" and "PushObj". This box therefore has two triple_labels. Similarly, some boxes have multiple location labels.

I have some questions about this multiple-label case:

1. Is it correct that one agent has two actions simultaneously?
2. How should I organize the result for this box in the submission file?
3. How do you evaluate the result?

Thanks.

mihaela-stoian commented 1 year ago

Hi @foolhard,

Indeed, you are right that some ground-truth boxes have multiple action/location labels. For this challenge, you will be evaluated only on the fields "agent_ids", "action_ids" and "loc_ids". Thus, you can ignore the duplex and triplet annotations, as mentioned on the ROAD-R challenge website.

You can find the details on how we evaluate the submitted predictions under the "Evaluation" section here.

Additionally, you can find an example of how the submission file should be organised under the "Submission guidelines" section. If you are working on Task 1, you will have to submit a score for each label (i.e. all agents, actions and locations) for each bounding box. If you are working on Task 2, you will have to submit only the label IDs of the positive labels, so if your model predicts multiple actions for a bounding box, you'll need to list all of them.
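To illustrate the difference between the two tasks, here is a minimal sketch of turning per-label scores (Task 1 style) into a list of positive label IDs (Task 2 style). The function name `positive_label_ids`, the 0.5 threshold and the toy scores are assumptions made for this example only; they are not taken from the challenge code, and how you decide which labels count as positive is up to your model.

```python
from typing import List, Sequence


def positive_label_ids(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return the indices of all labels whose score reaches the threshold.

    The threshold value is an assumption for illustration, not a challenge rule.
    """
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy scores over a made-up 6-label space (the real label space is larger).
scores = [0.1, 0.9, 0.2, 0.7, 0.05, 0.6]

# Every label above the threshold is kept, so a box can carry several labels.
print(positive_label_ids(scores))  # [1, 3, 5]
```

Because every label that passes the threshold is kept, a box with two predicted actions simply ends up with two action entries in its list.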

Please let us know if you have other questions!

Kind regards, Mihaela

foolhard commented 1 year ago

Hi @mihaela-stoian ,

Thanks for your comprehensive explanation.

I have a follow-up question:

If the box has two action IDs as below, how should I organize the results in the Task 2 submission file?

Best regards

mihaela-stoian commented 1 year ago

Hi @foolhard,

For Task 2, you would need to organise your submission using the following format:

    {
      <videoname>: {
        <frame>: [
          'bbox': bounding box coordinates in 2-points format in absolute image coordinates, as mentioned under the "Submission guidelines" here
          'labels': [1, 19, 22, 29]
        ]
      }
    }

You only need "agent_ids", "action_ids" and "loc_ids" to determine the 'labels' list above. When the neural network model predicts the label scores, it does so for all the agents, actions and locations, concatenated in this order, and this concatenation is why 'labels' in the submission format is [1, 19, 22, 29]. More specifically, "agent_ids" was [1], so 1 is added to the 'labels' list unchanged. "action_ids" was [9, 12], which corresponds to [9+num_agents, 12+num_agents] = [19, 22] in the 'labels' list, as num_agents is 10. Finally, "loc_ids" was [0], which corresponds to the final entry of 29 (calculated as 0+num_agents+num_actions, where num_actions is 19). Please note that the code does this automatically, so you don't have to worry about the label concatenation or the shift in the label IDs.
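The label shift described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the challenge code itself: the helper name `concat_label_ids` is hypothetical, and the only values taken from this thread are num_agents = 10, num_actions = 19 and the example IDs.

```python
from typing import List, Sequence

NUM_AGENTS = 10   # value given in the reply above
NUM_ACTIONS = 19  # value given in the reply above


def concat_label_ids(
    agent_ids: Sequence[int],
    action_ids: Sequence[int],
    loc_ids: Sequence[int],
    num_agents: int = NUM_AGENTS,
    num_actions: int = NUM_ACTIONS,
) -> List[int]:
    """Map per-category IDs into the concatenated agent/action/location label space.

    Hypothetical helper written for this example.
    """
    agents = list(agent_ids)                                   # agents keep their IDs
    actions = [a + num_agents for a in action_ids]             # shifted past the agents
    locs = [l + num_agents + num_actions for l in loc_ids]     # shifted past agents and actions
    return agents + actions + locs


# The box from the example: one agent, two actions, one location.
print(concat_label_ids([1], [9, 12], [0]))  # [1, 19, 22, 29]
```

A box with two actions therefore simply contributes two shifted action IDs to the same 'labels' list.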

You can find a mapping between each label name and its ID under the "Labels" section here.

Kind regards, Mihaela

foolhard commented 1 year ago

Hi @mihaela-stoian

That is very clear and I understand your settings. Thanks for the detailed explanation.

mihaela-stoian / ROAD-R-2023-Challenge
