v-iashin / video_features

Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, and TIMM models.
https://v-iashin.github.io/video_features
MIT License

How to resize the output features? #87

Closed misayllk closed 1 year ago

misayllk commented 1 year ago

Thanks for your great code! In my recent work, I have to mix features from different networks, but the output feature sizes do not match. I want to mix the features from resnet50 and RAFT (or I3D), and I don't know how to deal with that. Could someone help me? 😥

v-iashin commented 1 year ago

Hi,

I think you could mix features from frame-wise extractors such as resnet50 and CLIP, since they produce one feature vector per frame. I3D features rely on video clips (64 frames processed at once) and thus have a different temporal dimension.
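If you still want to combine frame-wise and clip-wise features, one option is to pool the frame-wise features over the same windows the clip model sees. Below is a minimal sketch of that idea, not something the library does for you. The helper `pool_to_clips`, the `window` and `step` values, the feature sizes (2048 and 1024), and the random arrays are all assumptions for illustration; replace them with your real extracted features and the settings you used.

```python
import numpy as np

def pool_to_clips(frame_feats, window=64, step=64):
    """Average frame-wise features (T, D) over windows of `window` frames.

    Hypothetical helper: returns an array of shape (num_clips, D).
    """
    num_frames = frame_feats.shape[0]
    starts = range(0, num_frames - window + 1, step)
    pooled = [frame_feats[s:s + window].mean(axis=0) for s in starts]
    if not pooled:
        # Video is shorter than one window: no clips at all.
        return np.empty((0, frame_feats.shape[1]), dtype=frame_feats.dtype)
    return np.stack(pooled)

# Random stand-ins for real features (shapes are assumptions).
rng = np.random.default_rng(0)
resnet_feats = rng.standard_normal((200, 2048)).astype(np.float32)  # one row per frame
i3d_feats = rng.standard_normal((3, 1024)).astype(np.float32)       # one row per clip

resnet_clips = pool_to_clips(resnet_feats, window=64, step=64)

# The two extractors may still disagree by a clip or so, so trim to the shorter one.
n = min(len(resnet_clips), len(i3d_feats))
mixed = np.concatenate([resnet_clips[:n], i3d_feats[:n]], axis=1)
print(mixed.shape)  # (3, 3072)
```

Averaging is only one choice; max pooling or taking the middle frame of each window would align the temporal dimension in the same way.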

Also, note that RAFT outputs a full-resolution frame with optical flow directions for every pixel, rather than a compact feature vector.
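If you need a fixed-size vector from the flow frames, you could pool them spatially yourself. This is only a sketch under assumptions: I assume the flow is stored as an array of shape `(T, 2, H, W)`, and `flow_to_vector` with its `grid` argument is a hypothetical helper, not part of the library. Whether such a coarse descriptor is useful for your task is something you would have to check.

```python
import numpy as np

def flow_to_vector(flow, grid=(4, 4)):
    """Average-pool flow frames (T, 2, H, W) onto a coarse grid.

    Hypothetical helper: returns an array of shape (T, 2 * gh * gw).
    """
    t, c, h, w = flow.shape
    gh, gw = grid
    # Crop so that H and W divide evenly into grid cells.
    h_crop, w_crop = (h // gh) * gh, (w // gw) * gw
    cropped = flow[:, :, :h_crop, :w_crop]
    # Split each spatial axis into (cells, pixels per cell), then average per cell.
    blocks = cropped.reshape(t, c, gh, h_crop // gh, gw, w_crop // gw)
    pooled = blocks.mean(axis=(3, 5))  # (T, 2, gh, gw)
    return pooled.reshape(t, -1)

# Dummy flow stack standing in for real RAFT output (shape is an assumption).
flow = np.ones((5, 2, 30, 45), dtype=np.float32)
vec = flow_to_vector(flow)
print(vec.shape)  # (5, 32)
```

The result has one row per flow frame, so it can be concatenated with other frame-wise features once the number of rows matches.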

I will convert it to a discussion as it is not an issue.