yrcong / STTran

Spatial-Temporal Transformer for Dynamic Scene Graph Generation, ICCV2021
MIT License

customized inputs #29

Closed. Mingyuan1997 closed this issue 2 years ago.

Mingyuan1997 commented 2 years ago

Are there any suggestions on how to run the model on custom input videos?

Thank you!

yrcong commented 2 years ago

Hi, I think it is not difficult to use the model for inference on custom videos. Lines 167-178 in https://github.com/yrcong/STTran/blob/main/dataloader/action_genome.py may help you :) Best
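For reference, a minimal sketch of what those lines amount to, assuming custom frames are pre-extracted to a directory laid out as `frames_dir/<video_id>/<frame>.png` (the path and layout here are assumptions for illustration, not the repo's exact code):

```python
import os

# Hypothetical layout: frames_dir/<video_id>/<frame>.png, mirroring how
# dataloader/action_genome.py groups frame paths per video.
frames_dir = 'data/custom/frames'

# One entry per video: the sorted list of its frame paths, analogous to
# self.video_list in the AG class.
video_list = []
for video_id in sorted(os.listdir(frames_dir)):
    frame_files = sorted(os.listdir(os.path.join(frames_dir, video_id)))
    video_list.append([os.path.join(frames_dir, video_id, f) for f in frame_files])
```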

xiaodanhu commented 2 years ago

If we test on a custom video, it seems that information like attention_relationship and bboxes is required?

Mingyuan1997 commented 2 years ago

I think so.

yrcong commented 2 years ago

> If we test on a custom video, it seems that information like attention_relationship and bboxes is required?

Why? The attention relationships should be predicted by the model, and the bboxes should be inferred by the object detector.
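To illustrate the second point: at test time the boxes come from a detector, not from annotation files. A minimal sketch, using torchvision's off-the-shelf Faster R-CNN as a stand-in for the repo's own detector (STTran actually uses a Faster R-CNN finetuned on Action Genome, not this COCO-pretrained one):

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Stand-in detector; an assumption for illustration only.
detector = fasterrcnn_resnet50_fpn(pretrained=True).eval()

frame = torch.rand(3, 480, 640)  # one RGB frame, values in [0, 1]
with torch.no_grad():
    (pred,) = detector([frame])

# Boxes, labels and scores are inferred, not read from .pkl annotations.
print(pred['boxes'].shape, pred['labels'], pred['scores'])
```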

xiaodanhu commented 2 years ago

> If we test on a custom video, it seems that information like attention_relationship and bboxes is required?
>
> Why? The attention relationships should be predicted by the model, and the bboxes should be inferred by the object detector.

Thanks for replying! When I checked the dataloader in action_genome.py, person_bbox and object_bbox were loaded from person_bbox.pkl and object_bbox_and_relationship.pkl, respectively. So I was guessing that, when loading a custom video, we would also need this information in order to load it properly. Can you indicate where the model automatically predicts the relationships and bboxes? Thank you very much!

yrcong commented 2 years ago

If you just want to test on your custom video dataset, only the video frames (self.video_list in the class AG) are necessary (for the SGDET setting). person_bbox and object_bbox are sometimes used at test time because there are two other settings, PredCLS/SGCLS, which are widely used in image scene graph generation.
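Putting this conclusion into code: a minimal frames-only dataset sketch for SGDET inference. The class name, directory layout, and preprocessing are assumptions; the repo's AG class does more (e.g. resizing and normalization for its detector):

```python
import os

import torch
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms


class CustomVideoFrames(Dataset):
    """Frames-only dataset sketch for the SGDET setting (hypothetical class,
    not from the repo): no person_bbox.pkl or object_bbox_and_relationship.pkl
    is loaded, since the detector produces the boxes at test time."""

    def __init__(self, frames_dir):
        # Assumed layout: frames_dir/<video_id>/<frame>.png
        self.video_list = [
            [os.path.join(frames_dir, vid, f)
             for f in sorted(os.listdir(os.path.join(frames_dir, vid)))]
            for vid in sorted(os.listdir(frames_dir))
        ]
        self.to_tensor = transforms.ToTensor()

    def __len__(self):
        return len(self.video_list)

    def __getitem__(self, idx):
        # All frames of one video as a (T, C, H, W) float tensor.
        frames = [self.to_tensor(Image.open(p).convert('RGB'))
                  for p in self.video_list[idx]]
        return torch.stack(frames)
```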