v-iashin / video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
https://v-iashin.github.io/video_features
MIT License

Add the new feature of being able to easily import this repo as a submodule #71

Open Kamino666 opened 2 years ago

Kamino666 commented 2 years ago

Right now, this repo makes it easy and efficient to extract features. Users can first extract features for every video in a dataset and then train a model on those features. At inference time, though, it would be nice to import this repo as a submodule, extract the features of a single video, and run inference on them directly.

For example, a bi-modal video captioning task needs both visual and audio features. At inference time I have to run main.py from this repo first and then run my own task code as a separate step, which is cumbersome.

What I want is something like your Colab demo: the user clones this repo as a submodule and writes from video_features.models.r21d.extract_r21d import ExtractR21D to extract the features. The main obstacle is that the current implementation uses absolute imports instead of relative imports, so its internal imports only resolve when the repo root itself is on the import path.
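To make the import problem concrete, here is a minimal, self-contained sketch. It does not touch the real repo: it builds a toy package in a temporary directory, and the names toy_video_features, toy_utils, helpers.py and describe() are all hypothetical stand-ins chosen for illustration. The toy extractor uses an absolute import, so importing it through the package name fails until the package's own root is also added to sys.path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Dotted path a user of the submodule would like to write.
MODULE = "toy_video_features.models.r21d.extract_r21d"


def make_toy_repo(parent: Path) -> Path:
    """Create a toy stand-in for the cloned repo (hypothetical layout)."""
    repo = parent / "toy_video_features"
    packages = [repo, repo / "toy_utils", repo / "models", repo / "models" / "r21d"]
    for pkg in packages:
        pkg.mkdir(parents=True, exist_ok=True)
        (pkg / "__init__.py").write_text("")
    (repo / "toy_utils" / "helpers.py").write_text(
        "def describe():\n    return 'helper loaded'\n"
    )
    # Absolute import: resolves only if the repo root itself is on sys.path.
    (repo / "models" / "r21d" / "extract_r21d.py").write_text(
        "from toy_utils.helpers import describe\n\n"
        "class ExtractR21D:\n"
        "    def info(self):\n"
        "        return describe()\n"
    )
    return repo


with tempfile.TemporaryDirectory() as tmp:
    parent = Path(tmp)
    repo = make_toy_repo(parent)
    importlib.invalidate_caches()

    # Attempt 1: only the parent directory is importable, as with a submodule.
    sys.path.insert(0, str(parent))
    try:
        importlib.import_module(MODULE)
        first_attempt = "imported"
    except ModuleNotFoundError as err:
        first_attempt = f"failed: no module named {err.name}"

    # Attempt 2 (workaround): also expose the repo root so that the
    # absolute import inside extract_r21d.py can be resolved.
    sys.path.insert(0, str(repo))
    module = importlib.import_module(MODULE)
    second_attempt = module.ExtractR21D().info()

    # Undo the sys.path changes made for this demo.
    sys.path.remove(str(repo))
    sys.path.remove(str(parent))

print(first_attempt)
print(second_attempt)
```

The sys.path workaround is only a stopgap: it puts the repo's top-level package names into the caller's import namespace, where they can clash with the caller's own modules. Switching the internal imports to relative ones (for example from ...toy_utils.helpers import describe in the toy layout) would avoid that, which is why I am asking for the change in the repo itself.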

I think this would be a very useful improvement. Maybe it could even be released on PyPI at some point.

v-iashin commented 2 years ago

Yes, this is very interesting. I will try to do it myself when I have time.