Negative Indices in Temporal Grounding Information

ChanningPing commented 3 years ago

First of all, thanks for providing the AGQA dataset and the additional metadata! I was looking into the grounding information provided in the dataset, such as the example below:

[Image: Screen Shot 2021-07-16 at 2 33 25 PM (1)]

Could you help clarify the following questions?

(1) In the attached example, the key (the indices of words in the question) is a negative value (-60--31). What does it mean? We tried counting the letters backwards from the end, but that does not seem to refer to a reasonable phrase either.

(2) For each key, there is a list of vertices and frame_ids, e.g. ['o10/000360','o10/000383'...]. Do all the frame_ids in the list belong to one segment, or can they span multiple segments, e.g. a person picks up a phone in some frames but puts it down in others? In other words, can we use the smallest frame_id in the list as the start point and the largest frame_id as the end point to find the duration of the key event?

(3) We observe that many binary questions have an [X-X] key (e.g. "27-27") that takes the entire frame set as its grounding frames. Is this true for all binary questions?

Thanks again for your help!

madeleinegrunde commented 3 years ago

Thank you for your questions.

1) This is a bug for one template and one type of indirect reference. We are in the process of fixing it, and will update the dataset accordingly. I expect this to be finished within the next few days.

2) It is not guaranteed that they are the same segment. The frames that are annotated are based on which frames Action Genome annotated. Action Genome chose 5 uniformly distributed frames for each Charades action.

3) Every question includes a grounding of the relevant frames. However, some questions do not specify a temporal localization within them. When that occurs, we include an [X-X] key that refers to all the frames, since there is no phrase in the question that specifies the relevant video.
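
For readers working with these fields, the formats discussed above can be sketched in Python. This is only an illustration: it assumes keys look like "27-27" or "-60--31" and entries look like 'o10/000360' (vertex, then frame number), as in the examples in this thread. The helper names `parse_key`, `has_negative_index`, and `frame_span` are hypothetical and not part of the AGQA code.

```python
import re

# Assumed key format: "<start>-<end>", where either index may be negative,
# e.g. "27-27" or the buggy "-60--31" mentioned in this thread.
KEY_PATTERN = re.compile(r"^(-?\d+)-(-?\d+)$")


def parse_key(key):
    """Split a grounding key into integer (start, end) indices."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError(f"unexpected key format: {key!r}")
    return int(match.group(1)), int(match.group(2))


def has_negative_index(key):
    """Flag keys affected by the negative-index bug described in answer 1."""
    start, end = parse_key(key)
    return start < 0 or end < 0


def frame_span(entries):
    """Return (smallest, largest) frame number from 'vertex/frame' entries.

    This is only the outer span. Per answer 2, the frames inside it are
    not guaranteed to belong to a single segment.
    """
    frames = sorted(int(entry.split("/")[1]) for entry in entries)
    return frames[0], frames[-1]
```

For example, `has_negative_index("-60--31")` flags the entry from the question, and `frame_span(['o10/000360', 'o10/000383'])` gives the outer bounds of the listed frames without implying they are contiguous.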

jun0wanan commented 3 years ago

Hi,

The scene graph pkl file cannot be opened. Maybe it is too big?

Best wishes!

madeleinegrunde commented 3 years ago

Hmm, that is strange. Here is code I shared with someone else who had trouble opening the scene graphs, and they said it helped. If this does not work, I can share multiple files, each with a smaller number of scene graphs, in case the file is too big.

import pickle

# Load the training scene graphs.
with open('Downloads/AGQA_scene_graphs/AGQA_train_stsgs.pkl', 'rb') as f:
    train_stsgs = pickle.load(f)

# Load the test scene graphs.
with open('Downloads/AGQA_scene_graphs/AGQA_test_stsgs.pkl', 'rb') as f:
    test_stsgs = pickle.load(f)
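
If loading still fails, it may help to narrow down why. The sketch below is a hedged diagnostic and not part of the AGQA code: the helper name `try_load_pickle` is hypothetical, and it only distinguishes a missing file, an incomplete or corrupt file, and running out of memory.

```python
import os
import pickle


def try_load_pickle(path):
    """Try to load a pickle file and return (object_or_None, status message)."""
    if not os.path.exists(path):
        return None, "file not found"
    size_mb = os.path.getsize(path) / 1e6
    try:
        with open(path, "rb") as f:
            return pickle.load(f), f"loaded ({size_mb:.1f} MB)"
    except EOFError:
        # Raised when the file is empty or ends before the pickle is complete.
        return None, "file ends early; the download may be incomplete"
    except pickle.UnpicklingError as err:
        # Raised when the bytes are not a valid (or complete) pickle.
        return None, f"not a valid pickle: {err}"
    except MemoryError:
        # The file may simply be too large for the available RAM.
        return None, "ran out of memory while loading"
```

Calling `try_load_pickle('Downloads/AGQA_scene_graphs/AGQA_train_stsgs.pkl')` and printing the status message should indicate whether re-downloading the file or requesting the smaller split files is the more likely fix.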

madeleinegrunde / AGQA_baselines_code

Negative Indices in Temporal Grounding Information #3