microsoft / XPretrain

Multi-modality pre-training

Video compression/decoding methods of each dataset in CLIP-ViP #17

Closed: fadzaka12 closed this issue 1 year ago

fadzaka12 commented 1 year ago

Hi, I'm trying to reproduce the CLIP-ViP results. The README says that the data preprocessing step follows HD-VILA. However, in the configuration files of the downstream tasks, the compression/decoding method seems to differ from that. Are these video preprocessing methods correct:

HellwayXue commented 1 year ago

Hi, the preprocessing methods you listed are right. For DiDeMo, we keep 32 frames for each video, so the fps is variable.
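
To make the DiDeMo point concrete: if a fixed 32 frames are kept per video, the effective frame rate is 32 divided by the video's duration, so it changes from video to video. The sketch below is only an illustration and is not taken from the CLIP-ViP or HD-VILA code. The function names are hypothetical, and evenly spaced sampling is an assumption, since the thread does not say how the 32 frames are chosen.

```python
from typing import List


def effective_fps(duration_s: float, num_frames: int = 32) -> float:
    """Frames kept per second when a fixed number of frames covers the whole video."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return num_frames / duration_s


def uniform_frame_indices(total_frames: int, num_frames: int = 32) -> List[int]:
    """Pick num_frames indices spread evenly over [0, total_frames).

    ASSUMPTION: uniform spacing, taking the middle frame of each segment.
    The actual selection rule used for DiDeMo is not stated in this thread.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    step = total_frames / num_frames
    # Clamp to the last valid index to guard against rounding at the end.
    return [min(total_frames - 1, int(step * i + step / 2)) for i in range(num_frames)]


if __name__ == "__main__":
    # A 16-second video and a 64-second video, both reduced to 32 frames.
    print(effective_fps(16.0))  # 2.0
    print(effective_fps(64.0))  # 0.5
    # Indices for a 30-second source decoded at 30 fps (900 frames).
    print(uniform_frame_indices(900)[:4])
```

Under these assumptions, a shorter video ends up with a higher effective fps than a longer one, which is why a single fps value cannot be given for DiDeMo.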