v-iashin / video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
https://v-iashin.github.io/video_features
MIT License
525 stars, 97 forks

I3D feature extraction change of size #125

Open KarolyneFarfan opened 6 months ago

KarolyneFarfan commented 6 months ago

Hello, thank you for your work.

I have a question: could the RGB and flow outputs, which are extracted separately, be concatenated to form a vector of size 2048, as many action classification models use? Are the outputs already processed for this operation, or is there a process you suggest I follow?

Thank you in advance.

v-iashin commented 6 months ago

Hi, thanks!

I don't know whether the features need any extra preprocessing before they are concatenated.

The library simply outputs two tensors: one from the flow stream and one from the RGB stream. Feel free to experiment with them.
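As an illustration of the kind of experiment suggested above, here is a minimal sketch of joining the two tensors along the feature axis. It assumes each stream yields a 2-D array of shape (num_segments, 1024), which would give the 2048-dimensional vector mentioned in the question. The helper name `concat_streams` and the random placeholder arrays are hypothetical and not part of the library. Whether the result suits a particular downstream model has to be checked against that model's own expectations.

```python
import numpy as np


def concat_streams(rgb, flow):
    """Concatenate per-segment RGB and flow features along the feature axis.

    Hypothetical helper: assumes both inputs are shaped
    (num_segments, feature_dim).
    """
    rgb = np.asarray(rgb)
    flow = np.asarray(flow)
    if rgb.ndim != 2 or flow.ndim != 2:
        raise ValueError("expected 2-D arrays of shape (num_segments, feature_dim)")
    # Defensive trim in case the two streams differ in length
    # (an assumption; they may always match in practice).
    n = min(len(rgb), len(flow))
    return np.concatenate([rgb[:n], flow[:n]], axis=1)


# Placeholder data standing in for the two extracted tensors.
rng = np.random.default_rng(0)
rgb_feats = rng.standard_normal((5, 1024)).astype(np.float32)
flow_feats = rng.standard_normal((5, 1024)).astype(np.float32)

fused = concat_streams(rgb_feats, flow_feats)
print(fused.shape)  # (5, 2048)
```

The first 1024 columns of `fused` hold the RGB features and the last 1024 hold the flow features, so the two streams can still be separated later if needed.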

KarolyneFarfan commented 6 months ago

Thank you for your fast answer, but I do have another question: from which layer are the features extracted, and what is the process you follow afterwards? Sorry for asking so many questions.

v-iashin commented 6 months ago

https://v-iashin.github.io/video_features/models/i3d/

No worries. It is all there.

Alternatively, you may take a look at the source code: https://github.com/v-iashin/video_features/blob/1b67c9f8cfb44b61f6fae5fa1a89d34b7fe7a579/models/i3d/i3d_src/i3d_net.py#L259