Closed Andy1621 closed 1 year ago
Hi @Andy1621,
We have downsized and downsampled the videos from DATA. For example, for msrvtt, we downsample the videos to 2 FPS and resize them so that the shorter side is 224. The trimed30 means we keep only the first 30 seconds of each video. This follows the DiDeMo dataset paper, where the authors use only the first 30 seconds.
Best, Jie
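For readers who want to reproduce similar preprocessing, here is a minimal sketch of the three steps described above (resample to 2 FPS, resize the shorter side, keep the first 30 seconds). It is not the authors' actual script: it assumes ffmpeg is used, the function name `build_ffmpeg_cmd` and the file paths are hypothetical, and treating 360 as the shorter side for DiDeMo is only the reading suggested by the file name.

```python
from typing import List, Optional


def build_ffmpeg_cmd(src: str, dst: str, fps: int = 2, short_side: int = 224,
                     trim_seconds: Optional[int] = None) -> List[str]:
    """Build an ffmpeg command that resamples, resizes, and optionally trims a video.

    Hypothetical helper for illustration; not taken from the repository.
    """
    # Scale so the shorter side equals `short_side`. The -2 keeps the aspect
    # ratio and rounds the other dimension to an even number.
    scale = (f"scale='if(gt(iw,ih),-2,{short_side})':"
             f"'if(gt(iw,ih),{short_side},-2)'")
    cmd = ["ffmpeg", "-y", "-i", src]
    if trim_seconds is not None:
        # Keep only the first `trim_seconds` seconds of the output.
        cmd += ["-t", str(trim_seconds)]
    cmd += ["-vf", f"fps={fps},{scale}", dst]
    return cmd


if __name__ == "__main__":
    # MSRVTT-style settings: 2 FPS, shorter side 224, no trimming.
    print(" ".join(build_ffmpeg_cmd("in.mp4", "msrvtt_out.mp4")))
    # DiDeMo-style settings (assumed): 2 FPS, shorter side 360, first 30 s.
    print(" ".join(build_ffmpeg_cmd("in.mp4", "didemo_out.mp4",
                                    short_side=360, trim_seconds=30)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to actually run the conversion, provided ffmpeg is installed.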
Got it! Thanks for your help!
Hi! I have noticed that the name of your DiDeMo file is didemo_2fps_360_trimed30, while the name for MSRVTT is msrvtt_2fps_224. These seem a little different from DATA. I think 360 means the shorter side, but what about the trimed30? Is there any other preprocessing?