rohitgirdhar / CATER

CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning
https://rohitgirdhar.github.io/CATER/
Apache License 2.0

Actions per frame #26

Open Rajawat23 opened 3 years ago

Rajawat23 commented 3 years ago

Hi Authors, thanks for your work. I generated 3 videos to test the dataset. While `actions_order_dataset` seems to return frame, label, and classes, the output file (train.txt) under the folder `action_order_uniq` contains no such information.

It contains entries like /images/CLEVR_new_000002.avi 53,54,60,69,70,71,72,74,77,78,81,83,129,138,144,153,155,156,157,161,162,165,167,173,179,187,188,195,197,198,200,203,204,207,209,257,263,264,265,270,272,279,281,282,284,287,288,291,292,293,381,382,383,387,389,390,392,396,398,405,407,408,410,411,412,413,414,415,417,419,423,425,430,431,432,434,438,440,447,449,450,452,455,456,459,460,461,465,471,474,480,489,490,491,492,495,497,498,501,502,509,515,518,524,532,533,536,539,545,549,551,555,557,558,560,564,565,566,573,575,576,577,578,580,581,582,585,586,587
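For what it's worth, each such line appears to be a video path followed by a comma-separated list of class indices, which can be split apart with a minimal parser like this (a sketch assuming that whitespace-separated format):

```python
# Parse one line of action_order_uniq/train.txt: a video path, then a
# comma-separated list of active class indices (format as shown above).
def parse_label_line(line):
    path, idx_str = line.strip().split()
    indices = [int(i) for i in idx_str.split(",")]
    return path, indices

path, indices = parse_label_line("/images/CLEVR_new_000002.avi 53,54,60,69")
```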

How can I get frame-by-frame actions and classes?

rohitgirdhar commented 3 years ago

Hi, thanks for your interest. The actions_order task is a multi-label classification task: we pre-define a set of action-order classes, and the list you see contains the indices of the classes that are active at some point in the video.

To get actions active at any given frame, you should be able to use the movements metadata, like this.

Ramtin-Nouri commented 1 year ago

Am I right in assuming that the whole 10s video is the input and the whole list of classes is the label? I.e., the output is a 301-length vector describing whether each class was present at any time in the 10s video.

rohitgirdhar commented 1 year ago

Yes that is correct.
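So, concretely, the index list from train.txt can be turned into the multi-hot target described above. A minimal sketch (the 301 here comes from the class count mentioned in this thread):

```python
NUM_CLASSES = 301  # number of action-order classes, per the discussion above

def to_multi_hot(indices, num_classes=NUM_CLASSES):
    """Convert a list of active class indices into a multi-hot label vector."""
    vec = [0.0] * num_classes
    for i in indices:
        vec[i] = 1.0
    return vec

label = to_multi_hot([53, 54, 60])
```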