Hi, I'm trying to reproduce the CLIP-ViP results. The README says that the data preprocessing step follows HD-VILA. However, the configuration files for the downstream tasks seem to use compression/decoding methods that differ from it. Are the following video preprocessing methods correct?
MSR-VTT: compression, 6 FPS
LSMDC: no compression/decoding, use raw video as is
ActivityNet: decoding lr
DiDeMo: compression, X FPS (What is X? Is it also 6?)