jayleicn / singularity

[ACL 2023] Official PyTorch code for Singularity model in "Revealing Single Frame Bias for Video-and-Language Learning"
https://arxiv.org/abs/2206.03428
MIT License

Data pre-processing #10

yuanze-lin closed this issue 2 years ago

yuanze-lin commented 2 years ago

Hi, dear authors, how do you pre-process the DiDeMo dataset?

jayleicn commented 2 years ago

Hi @yzleroy, please see Section 4.1 of the paper, where we mention:

For DiDeMo and ActivityNet Captions, we evaluate paragraph-to-video retrieval [43, 31, 47], where the text captions in the same video are concatenated as a single paragraph-level text for retrieval.
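As a rough illustration of the concatenation step quoted above, here is a minimal sketch. It assumes the annotations are a list of records with "video" and "caption" keys; that layout and the function name build_paragraphs are hypothetical, chosen for the example, and may not match the files or code in this repo.

```python
def build_paragraphs(annotations, sep=" "):
    """Merge all captions of the same video into one paragraph-level text.

    `annotations` is assumed (hypothetically) to be a list of dicts with
    "video" and "caption" keys, one dict per caption.
    """
    grouped = {}  # dicts keep insertion order, so video order is preserved
    for ann in annotations:
        grouped.setdefault(ann["video"], []).append(ann["caption"].strip())
    # One record per video, captions joined in their original order.
    return [
        {"video": video, "caption": sep.join(captions)}
        for video, captions in grouped.items()
    ]


if __name__ == "__main__":
    sample = [
        {"video": "a.mp4", "caption": "A dog runs."},
        {"video": "b.mp4", "caption": "A man sings."},
        {"video": "a.mp4", "caption": "It jumps over a fence."},
    ]
    for record in build_paragraphs(sample):
        print(record["video"], "->", record["caption"])
```

Each video then contributes a single text query, so retrieval is scored per paragraph rather than per sentence.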

jayleicn commented 2 years ago

trimed30 means we keep only the first 30 seconds of each video, following the setup in the original DiDeMo paper.
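One possible way to produce such trimmed videos is with the ffmpeg command-line tool, sketched below. This is an assumption for illustration, not necessarily how the authors generated their files; the helper names trim_command and trim_video are hypothetical, and ffmpeg must be installed separately.

```python
import subprocess


def trim_command(src, dst, max_seconds=30):
    """Build an ffmpeg command that keeps only the first `max_seconds` of `src`."""
    return [
        "ffmpeg",
        "-y",                    # overwrite the output file if it exists
        "-i", str(src),          # input video
        "-t", str(max_seconds),  # stop writing after this many seconds
        "-c", "copy",            # copy streams without re-encoding
        str(dst),
    ]


def trim_video(src, dst, max_seconds=30):
    """Run the trim; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(trim_command(src, dst, max_seconds), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch works without ffmpeg.
    print(" ".join(trim_command("input.mp4", "input_trimed30.mp4")))
```

Stream copy is fast, but the cut may not land exactly on the 30-second mark; dropping "-c copy" makes ffmpeg re-encode, which is slower and cuts more precisely.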

yuanze-lin commented 2 years ago

trimed30 means we keep only the first 30 seconds of each video, following the setup in the original DiDeMo paper.

Thank you for your kind response!