Preprocessing for DiDeMo

jayleicn / singularity

[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"

https://arxiv.org/abs/2206.03428

MIT License


Preprocessing for DiDeMo #26

Closed: Andy1621 closed this issue 1 year ago

Andy1621 commented 1 year ago

Hi! I noticed that the name of your DiDeMo file is didemo_2fps_360_trimed30, while the name of the MSRVTT file is msrvtt_2fps_224. These names seem a little different from those in DATA. I think 360 means the shorter-side size, but what does trimed30 mean? Is there any other preprocessing?

jayleicn commented 1 year ago

Hi @Andy1621,

We have downsized and downsampled the videos from DATA. For example, for MSRVTT, we downsample the videos to 2 FPS and resize them so that the shorter side is 224 pixels. The trimed30 suffix means we only keep the first 30 seconds of each video. This follows the DiDeMo dataset paper, whose authors only use the first 30 seconds.

Best, Jie
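The three preprocessing steps described in the reply (downsample to 2 FPS, resize so the shorter side is a fixed size, keep only the first 30 seconds) can be sketched as follows. This is a minimal illustration assuming Python and ffmpeg; it is not the preprocessing script used for the repository, and the function names (shorter_side_resize, sampled_timestamps, ffmpeg_args) are hypothetical. The ffmpeg command uses the standard fps and scale filters, with -2 letting ffmpeg pick an even length for the longer side that preserves the aspect ratio.

```python
import math
from typing import List, Tuple


def shorter_side_resize(width: int, height: int, short_side: int) -> Tuple[int, int]:
    """Scale (width, height) so the shorter side equals short_side, keeping aspect ratio."""
    scale = short_side / min(width, height)
    # The shorter side is set exactly; the longer side is rounded to the nearest pixel.
    if width <= height:
        return short_side, round(height * scale)
    return round(width * scale), short_side


def sampled_timestamps(duration_s: float, fps: float = 2.0, max_seconds: float = 30.0) -> List[float]:
    """Timestamps (seconds) kept when sampling at `fps` from the first `max_seconds` only."""
    end = min(duration_s, max_seconds)
    # Frames at t = 0, 1/fps, 2/fps, ... strictly before `end`.
    n_frames = math.ceil(end * fps)
    return [i / fps for i in range(n_frames)]


def ffmpeg_args(src: str, dst: str, fps: int = 2, short_side: int = 224,
                max_seconds: int = 30) -> List[str]:
    """Hypothetical ffmpeg invocation: trim, resample the frame rate, resize the shorter side."""
    # If the video is landscape (iw > ih), fix the height; otherwise fix the width.
    scale = (f"scale='if(gt(iw,ih),-2,{short_side})'"
             f":'if(gt(iw,ih),{short_side},-2)'")
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-t", str(max_seconds),     # keep only the first max_seconds of output
        "-vf", f"fps={fps},{scale}",
        "-an",                      # drop the audio track
        dst,
    ]
```

For example, a 1280x720 video resized with a shorter side of 224 becomes 398x224, and a 45-second video sampled at 2 FPS and trimmed to 30 seconds keeps 60 frames (timestamps 0.0 through 29.5). The returned argument list can be passed to subprocess.run without going through a shell.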

Andy1621 commented 1 year ago

Got it! Thanks for your help!