yuanze-lin closed this issue 2 years ago.
Hi @yzleroy, please see Section 4.1, where we mention:
> For DiDeMo and ActivityNet Captions, we evaluate paragraph-to-video retrieval [43, 31, 47], where the text captions in the same video are concatenated as a single paragraph-level text for retrieval.
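A minimal sketch of the paragraph-building step described in the quote is below. It is not the authors' code. It assumes a hypothetical annotation format (a list of dicts with `"video"` and `"caption"` keys, with each video's captions already in temporal order), and joining the captions with a single space is also an assumption.

```python
from collections import OrderedDict


def build_paragraphs(annotations):
    """Group captions by video and join them into one paragraph per video.

    Hypothetical input format: a list of dicts with "video" and "caption"
    keys, where captions of the same video appear in temporal order.
    """
    grouped = OrderedDict()
    for ann in annotations:
        grouped.setdefault(ann["video"], []).append(ann["caption"].strip())
    # One paragraph-level text query per video; the space separator is an assumption.
    return {vid: " ".join(caps) for vid, caps in grouped.items()}
```

For example, two captions of `v1` and one caption of `v2` yield two retrieval queries, one per video, so each video is matched against a single paragraph rather than against individual sentences.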
`trimed30` means we keep only the first 30 seconds of each video, following the setup in the original DiDeMo paper.
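One way to reproduce a 30-second trim is sketched below with ffmpeg. The thread does not say which tool was used, so ffmpeg and the helper names `trim_command` and `trim_video` are assumptions for illustration. `-t 30` placed after `-i` is an output option that limits the output duration, and ffmpeg re-encodes by default.

```python
import subprocess


def trim_command(src, dst, seconds=30):
    """Build an ffmpeg command that keeps only the first `seconds` of `src`.

    Hypothetical helper: the tool and options are assumptions, not the
    authors' confirmed pipeline.
    """
    # "-y" overwrites dst if it exists; "-t" after "-i" caps the output duration.
    return ["ffmpeg", "-y", "-i", src, "-t", str(seconds), dst]


def trim_video(src, dst, seconds=30):
    # Requires an ffmpeg binary on PATH; raises CalledProcessError on failure.
    subprocess.run(trim_command(src, dst, seconds), check=True)
```

Usage would look like `trim_video("raw/video.mp4", "trimed30/video.mp4")`, where both paths are placeholders.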
Thank you for your kind response!
Hi, dear authors, how do you preprocess the DiDeMo dataset?