about dataset collective activity

cissoidx commented 2 years ago

Hi @hongluzhou ,

Thanks for kindly open-sourcing the Composer code. I downloaded the Collective Activity dataset from the link in your README, but I am not sure I understand what the annotations actually mean. Could you please correct or extend the description below?

annotations.pkl: a dict like {video_index: sub_dict}. Each sub_dict is like {index_of_every_10_frames: sub_sub_dict}, and each sub_sub_dict has dict_keys(['frame_id', 'group_activity', 'actions', 'bboxes']). NOT sure what 'group_activity', 'actions', and 'bboxes' really refer to.
joints: the subfolders are named by video_index, and their items are index_of_every_10_frames.pickle files. Each pickle file contains a dict like {frame_index: value_item}, where value_item has shape (13, 17, 3). NOT sure what this value_item is.
tracks_normalized.pkl: a dict whose keys are like (video_index, index_of_every_10_frames) and whose values are sub_dicts. Each sub_dict is like {index_of_frame: value_item}, where value_item has shape (13, 4). NOT sure what this value_item is.
tracks_normalized_with_person_action_label.pkl: similar to tracks_normalized.pkl, but the final value_item has shape (13, 1). NOT sure what this value_item is.
videos: the 44 videos, grouped by 10 frames.

cheers, xu

hongluzhou commented 2 years ago

Sure! The Collective Activity data format is mostly the same as Volleyball's: https://github.com/hongluzhou/composer/blob/main/DATA_README.txt

To answer your questions, specifically:

annotations.pkl: only the 'group_activity' field is used, and you can ignore the other fields (because fields such as 'actions' and 'bboxes' were not used, I don't quite remember their exact meaning or how I obtained them). The 'group_activity' field is the integer group activity ID. For the mapping between the integer ID and its string name, please refer to https://github.com/hongluzhou/composer/blob/dbe5155391b5a2eea7f4c146192c9957ca323c42/datasets/collective.py#L132 (the mapping is basically 0: 'Walking', 1: 'Waiting', 2: 'Queueing', 3: 'Talking').
joints: the value_item of shape (13, 17, 3) is the keypoint/joint/skeleton data of this clip. Please refer to https://github.com/hongluzhou/composer/blob/dbe5155391b5a2eea7f4c146192c9957ca323c42/DATA_README.txt#L21 . The first axis is the person index; across the entire Collective Activity dataset, a clip has at most 13 persons (when a clip has fewer than 13 persons, zero padding is used). The second axis is the COCO keypoint index, with 17 keypoints/joints per person. The last axis holds [x_coord, y_coord, joint_type_class_id], where x_coord and y_coord are the x and y image coordinates of the keypoint/joint, and joint_type_class_id is the keypoint/joint integer ID. More details are in the DATA_README.txt linked above.
tracks_normalized.pkl & tracks_normalized_with_person_action_label.pkl: Please refer to https://github.com/hongluzhou/composer/blob/dbe5155391b5a2eea7f4c146192c9957ca323c42/DATA_README.txt#L50
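
A minimal sketch of how the fields described above could be read, assuming NumPy is available. The toy arrays stand in for real pickle contents, and the names `GROUP_ACTIVITY_NAMES` and `count_persons` are hypothetical (they are not taken from the repository). Only the ID-to-name mapping, the (13, 17, 3) layout, and the zero padding come from the description above.

```python
import numpy as np

# Group activity ID -> name, as quoted in the reply above.
GROUP_ACTIVITY_NAMES = {0: "Walking", 1: "Waiting", 2: "Queueing", 3: "Talking"}

MAX_PERSONS, NUM_JOINTS = 13, 17  # sizes stated in the thread


def count_persons(joints: np.ndarray) -> int:
    """Count person slots that are not entirely zero padding (hypothetical helper)."""
    assert joints.shape == (MAX_PERSONS, NUM_JOINTS, 3)
    # A slot is treated as padding when every value in it is zero.
    return int(np.any(joints != 0, axis=(1, 2)).sum())


# Toy data: 3 real persons and 10 zero-padded slots (a stand-in for one value_item).
toy_joints = np.zeros((MAX_PERSONS, NUM_JOINTS, 3), dtype=np.float32)
for person in range(3):
    toy_joints[person, :, 0] = 100.0 + person          # x_coord
    toy_joints[person, :, 1] = 200.0 + person          # y_coord
    toy_joints[person, :, 2] = np.arange(NUM_JOINTS)   # joint_type_class_id

# Only 'group_activity' is used from annotations.pkl, per the reply above.
toy_annotation = {"group_activity": 0}

num_persons = count_persons(toy_joints)
activity_name = GROUP_ACTIVITY_NAMES[toy_annotation["group_activity"]]
print(num_persons, activity_name)
```

With real data, `toy_joints` would be replaced by one value_item loaded from a joints pickle file, and `toy_annotation` by the matching entry of annotations.pkl.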

Hope it helps!

Best, Honglu

cissoidx commented 2 years ago

@hongluzhou Thanks very much for your reply. In the first frame of the first video of the Collective Activity dataset, there are obviously 4 persons, but the annotation contains only 3. Also, these 4 persons are walking, yet in the annotation they are labelled as waiting.

My questions:

Would you confirm that this is a mislabel?
Would this kind of mislabel (if it is one) affect the performance of the model?

cheers, xu

hongluzhou commented 2 years ago

Hi @cissoidx,

The group activity of this clip is 'Walking' and the 3 persons are actually labeled as 'Walking' (as shown in the following screenshot).

Note that 'group activity ID 0' means 'Walking' and 'person action ID 1' means 'Walking' according to the mapping https://github.com/hongluzhou/composer/blob/dbe5155391b5a2eea7f4c146192c9957ca323c42/datasets/collective.py#L137

The reason there are only 3 persons labeled is that we got the person track labels from https://github.com/wjchaoGit/Group-Activity-Recognition/tree/master/data/collective/tracks and in their annotations, only 3 persons have bounding boxes. We map each bounding box to an action label. Therefore, correspondingly only 3 persons have annotated action IDs.
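
As an illustration of why only 3 action IDs appear, here is a rough sketch that pairs each bounding box with its action label and skips empty slots. It assumes NumPy, and it assumes that unused person slots in the (13, 4) track array are zero-padded in the same way as the joints, which is not confirmed in this thread. The function name `annotated_actions` and the toy values are hypothetical.

```python
import numpy as np

MAX_PERSONS = 13  # maximum persons per clip, as stated in the thread


def annotated_actions(boxes: np.ndarray, actions: np.ndarray) -> list:
    """Return the action IDs of persons that have a non-zero bounding box.

    Assumption (not confirmed in the thread): slots without a tracked
    person are all zeros in the (13, 4) box array.
    """
    assert boxes.shape == (MAX_PERSONS, 4)
    assert actions.shape == (MAX_PERSONS, 1)
    has_box = np.any(boxes != 0, axis=1)
    return actions[has_box, 0].astype(int).tolist()


# Toy frame: 3 tracked persons, each labeled with person action ID 1
# ('Walking' for person actions, per the mapping linked above).
toy_boxes = np.zeros((MAX_PERSONS, 4), dtype=np.float32)
toy_actions = np.zeros((MAX_PERSONS, 1), dtype=np.float32)
for person in range(3):
    toy_boxes[person] = [0.1 * (person + 1), 0.2, 0.3, 0.4]  # made-up normalized box
    toy_actions[person, 0] = 1

labels = annotated_actions(toy_boxes, toy_actions)
print(len(labels), labels)
```

A fourth person who is visible in the image but has no bounding box in the source tracks would simply not appear in `labels`.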

As mentioned in Appendix E of our paper, we do feel that the labels of both the Volleyball and Collective Activity datasets are not perfect. I also raised a similar issue before: https://github.com/mostafa-saad/deep-activity-rec/issues/26. Nevertheless, most labels are fine overall.

Best, Honglu
