madeleinegrunde / AGQA_baselines_code

MIT License

Different scene graph from Action Genome scene graph #6

Closed · Tangolin closed this issue 2 years ago

Tangolin commented 2 years ago

Hi! From the paper, it seems that you based your dataset on the Action Genome scene graphs; however, the scene graphs downloaded from the Google Drive appear to have a rather different structure from the original Action Genome ones.

May I know the reason for the difference in the scene graphs? And how are the scene graphs for your dataset structured? Thank you so much for your time :)

madeleinegrunde commented 2 years ago

Hi, thank you for your question.

The structure is quite different from Action Genome. I cover the structure of these scene graphs in the README (at the bottom). We changed the structure to make it easier to use for generating questions. We also augmented the annotations using the strategies outlined in Sections 3.1 and 6.2 of our paper. Some of the ways the updated structure helps us are:

1) We made relationships their own nodes in the scene graph, so we could reason about them independently and create questions like "What were they holding?"

2) We added actions as nodes. Since we combined the Charades and Action Genome data, we could not just rely on having object nodes as Action Genome does. Therefore, we also had nodes for Charades action annotations.

3) When answering the questions programmatically, it was helpful to have pointers to the next and previous instances of a particular relationship or object. The original Action Genome dataset does not have those pointers (a rough sketch of this pointer structure follows below).
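For concreteness, here is a minimal sketch, in Python, of what such pointer-linked vertices could look like. The field names (`id`, `type`, `prev`, `next`) and values are illustrative assumptions for this example, not the exact keys of the released scene graphs; the README on the Google Drive documents the authoritative structure.

```python
# Illustrative sketch only -- field names are assumptions, not the
# exact keys of the released AGQA scene graphs (see the README).

# A relationship gets its own vertex, so questions like
# "What were they holding?" can reason about it directly.
holding_1 = {
    "id": "holding/000001",
    "type": "relationship",  # objects, relationships, and actions are all nodes
    "prev": None,            # previous instance of this relationship
    "next": None,            # next instance of this relationship
}

# A Charades action annotation also becomes its own vertex, since the
# graphs combine Charades actions with Action Genome annotations.
action_1 = {
    "id": "c106/000001",
    "type": "action",
    "prev": None,
    "next": None,
}

# The next/prev fields point at the vertex objects themselves, not at
# id strings, which is why naively printing a vertex can recurse
# through large parts of the graph.
holding_2 = {"id": "holding/000042", "type": "relationship",
             "prev": holding_1, "next": None}
holding_1["next"] = holding_2
```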

I hope this helps, and I can answer any particular questions you have as well.

Tangolin commented 2 years ago

Hi @madeleinegrunde, thank you so much for your explanations! I must have missed the README part of the drive, so sorry about that! I am using the scene graphs for my research, so I do have a few specific questions and would be very grateful if you could help answer them!

  1. In the README on the Google Drive, for the frame vertex it is mentioned that 'next': next o4 object. Is it supposed to be the next temporal frame vertex instead?

  2. For frame['next'] in the AGQA_test_stsgs.pkl file, is it simply the next frame_id string, e.g. '000105', or is the entire next frame's data stored inside the entry? I tried inspecting it as a .yaml file but couldn't tell because of the way it is represented.

  3. The paper mentioned that attention relationships were removed, but the README still mentions them under the structure of each vertex. Can I just confirm that they have been removed?

  4. The train-balanced-tgif.csv question file used for training has a different format for each entry compared to the train_balanced.txt question file format stated in the README. What is the reason behind this?

Apologies for asking so many questions at once, and thank you for your time! :)

madeleinegrunde commented 2 years ago

Hi, here are my responses to your questions.

  1. That is a typo; 'next' should point to the next frame vertex. I have updated the README.
  2. It is the entire next frame's data entry. Since the vertices all refer to each other, it can be difficult (or impossible) to print out. I usually load it with pickle and then avoid printing out the pointers (see the loading sketch after this list).
  3. We do not use attention relationships in the questions we generate, but they may remain in the scene graphs.
  4. The .csv file is formatted to have the correct headings needed to work with the models. It also does not include much information about each question (for example, the program or the scene graph grounding). This new format keeps the data consistent with the structure originally used by HCRN, HME, and PSAC, without including extra information that would take a while to upload wherever you run your models. The question content, answers, and ids will be the same.
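For point 2, here is a minimal loading sketch. The file path and the dict-of-dicts layout assumed below are illustrative guesses, not the confirmed structure; the README on the Google Drive is authoritative, so adjust the traversal accordingly.

```python
import pickle

# Load the scene graphs. The vertices refer to each other (cyclically),
# so avoid printing whole entries.
with open("AGQA_test_stsgs.pkl", "rb") as f:
    stsgs = pickle.load(f)

# Hypothetical traversal -- assumes a dict of videos, each holding a
# dict of frame vertices; adjust to the structure in the README.
video = next(iter(stsgs.values()))
frame = next(iter(video.values()))
print(list(frame.keys()))  # safe: only the top-level keys of one vertex

# 'next' holds the entire next frame entry, not just a frame_id string,
# so inspect it shallowly rather than printing it outright.
nxt = frame.get("next")
if nxt is not None:
    print(type(nxt), list(nxt.keys()))
```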

I hope this helps :)

Tangolin commented 2 years ago

I understand the dataset much better now. Thanks a lot!

Tangolin commented 2 years ago

Hi @madeleinegrunde, I realised that the questions in the .csv file and the .txt files are vastly different. It seems that the .txt files only contain the binary questions. In this case, since the .csv file was used for training, can I check where the extra information on the questions in the .csv file (e.g. answer type, semantic, structural) is located?

madeleinegrunde commented 2 years ago

Responded on #7.