Closed (yusufani closed 1 year ago)
Thanks for your interest. In the original paper, frames are sampled with a sparse strategy, i.e., they are sampled uniformly across the video to capture global information. In your code snippet, you can control the interval between sampled frames with frame_sample_rate. Hope this helps.
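To make the two options concrete, here is a minimal sketch in plain Python. It is not the code from the paper or from the Hugging Face example: the function names uniform_frame_indices and strided_frame_indices are hypothetical, and frame_sample_rate is assumed here to mean "keep one frame every frame_sample_rate frames". The 30 fps figure in the usage lines is also only an assumption for illustration.

```python
def uniform_frame_indices(num_frames_total, clip_len=8):
    """Sparse sampling sketch: one index from the middle of each of
    clip_len equal segments, so the clip spans the whole video."""
    if num_frames_total <= 0 or clip_len <= 0:
        raise ValueError("num_frames_total and clip_len must be positive")
    seg = num_frames_total / clip_len  # segment length, may be fractional
    return [min(num_frames_total - 1, int(seg * i + seg / 2))
            for i in range(clip_len)]


def strided_frame_indices(num_frames_total, clip_len=8,
                          frame_sample_rate=1, start=0):
    """Strided sampling sketch: clip_len indices spaced frame_sample_rate
    frames apart, clamped to the last frame of the video."""
    if num_frames_total <= 0 or clip_len <= 0 or frame_sample_rate <= 0:
        raise ValueError("all arguments except start must be positive")
    return [min(num_frames_total - 1, start + i * frame_sample_rate)
            for i in range(clip_len)]


# Hypothetical 80-frame video: 8 frames spread over the full length.
print(uniform_frame_indices(80, clip_len=8))

# Hypothetical 300-frame video at an assumed 30 fps:
# frame_sample_rate=30 gives roughly one frame per second.
print(strided_frame_indices(300, clip_len=8, frame_sample_rate=30))
```

With frame_sample_rate=1 the strided variant returns 8 consecutive frames, which covers well under a second of a typical video; a larger stride or the uniform variant gives the model a wider view of the clip.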
Thank you for your answer. I tried with a frame for each second and it works quite well 🥳
Please feel free to ping me if you have further questions.
Hi,
Thank you for the X-CLIP project. While running the code from Hugging Face to try it out, I noticed that 8 frames are sampled sequentially. Do these frames have to be sequential, or would it make sense to randomly pick 8 frames within 1 second?