v-iashin / video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
https://v-iashin.github.io/video_features
MIT License
493 stars 93 forks

Comparison between frame-wise and clip-wise feature extraction in terms of computation time #103

Open nogaini opened 1 year ago

nogaini commented 1 year ago

Hi Vladimir!

Thank you so much for your efforts on this project! This has been really helpful for my research. :)

I have a question that is not about this repo itself, but is closely related to the project. I apologize in advance if this is the wrong place to ask.

For a paper I'm currently working on, I would like to make a comparative statement about frame-wise (CLIP, ResNet) vs. clip-wise (C3D, S3D) feature extraction in terms of training and inference time. My intuition and some quick experiments suggest that frame-wise feature extraction is faster for both training and inference, but so far I haven't found any references to support this, so I thought I'd check with you as well. Have you come across any references that compare the computation time of frame-wise and clip-wise feature extraction?

Best, Noga

v-iashin commented 1 year ago

Hi, I am glad you like it.

Good question. No, I haven’t seen any.
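
In the absence of a reference, measuring it directly may be the most defensible option. Below is a minimal sketch of a timing harness, not anything that ships with this repo. The names `benchmark`, `per_frame_seconds`, `fake_frame_wise`, and `fake_clip_wise` are hypothetical, and the two `fake_*` functions are placeholder workloads that would need to be swapped for real model forward passes. The sketch assumes that time per input frame is the quantity to compare; for GPU models you would also need to synchronize the device (for example with `torch.cuda.synchronize()`) before reading the clock.

```python
import statistics
import time


def benchmark(fn, n_warmup=3, n_runs=10):
    """Return the median wall-clock time in seconds of one call to fn()."""
    for _ in range(n_warmup):
        fn()  # warm-up calls are executed but not timed
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def per_frame_seconds(fn, n_frames, **kwargs):
    """Normalize the median time by the number of input frames fn() consumes."""
    return benchmark(fn, **kwargs) / n_frames


# Hypothetical stand-ins: replace with real extractor calls on the same video.
def fake_frame_wise(n_frames=64):
    # One unit of work per frame (assumption: mimics a 2D model per frame).
    return [sum(range(1000)) for _ in range(n_frames)]


def fake_clip_wise(n_frames=64, clip_len=16):
    # One larger unit of work per clip (assumption: mimics a 3D model per clip).
    return [sum(range(1000 * clip_len)) for _ in range(n_frames // clip_len)]


if __name__ == "__main__":
    n = 64
    print("frame-wise s/frame:", per_frame_seconds(fake_frame_wise, n))
    print("clip-wise  s/frame:", per_frame_seconds(fake_clip_wise, n))
```

The median is used instead of the mean so that a single slow run (for example, caused by caching or background load) does not dominate the result. Reporting the hardware, batch size, input resolution, and clip length alongside the numbers makes the comparison reproducible.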